Any biologically plausible learning rule also needs to abide by the limitation that neurons can access information only from neighboring neurons; backprop may require information from more remote neurons. So “if you take backprop to the letter, it seems impossible for brains to compute,” said Bengio.
Nonetheless, Hinton and a few others immediately took up the challenge of working on biologically plausible variations of backpropagation. “The first paper arguing that brains do [something like] backpropagation is about as old as backpropagation,” said Konrad Kording, a computational neuroscientist at the University of Pennsylvania. Over the past decade or so, as the successes of artificial neural networks have led them to dominate artificial intelligence research, the efforts to find a biological equivalent for backprop have intensified.
Staying More Lifelike
Take, for example, one of the strangest solutions to the weight transport problem, courtesy of Timothy Lillicrap of Google DeepMind in London and his colleagues in 2016. Their algorithm, instead of relying on a matrix of weights recorded from the forward pass, used a matrix initialized with random values for the backward pass. Once assigned, these values never change, so no weights need to be transported for each backward pass.
To almost everyone’s surprise, the network learned. Because the forward weights used for inference are updated with each backward pass, the network still descends the gradient of the loss function, but by a different path. The forward weights slowly align themselves with the randomly selected backward weights to eventually yield the correct answers, giving the algorithm its name: feedback alignment.
“It turns out that, actually, that doesn’t work as bad as you might think it does,” said Yamins — at least for simple problems. For large-scale problems and for deeper networks with more hidden layers, feedback alignment doesn’t do as well as backprop: Because the updates to the forward weights are less accurate on each pass than they would be from truly backpropagated information, it takes much more data to train the network.
Researchers have also explored ways of matching the performance of backprop while maintaining the classic Hebbian learning requirement that neurons respond only to their local neighbors. Backprop can be thought of as one set of neurons doing the inference and another set of neurons doing the computations for updating the synaptic weights. Hinton’s idea was to work on algorithms in which each neuron was doing both sets of computations. “That was basically what Geoff’s talk was [about] in 2007,” said Bengio.
Building on Hinton’s work, Bengio’s team proposed a learning rule in 2017 that requires a neural network with recurrent connections (that is, if neuron A activates neuron B, then neuron B in turn activates neuron A). If such a network is given some input, it sets the network reverberating, as each neuron responds to the push and pull of its immediate neighbors.
Eventually, the network reaches a state in which the neurons are in equilibrium with the input and each other, and it produces an output, which can be erroneous. The algorithm then nudges the output neurons toward the desired result. This sets another signal propagating backward through the network, setting off similar dynamics. The network finds a new equilibrium.
“The beauty of the math is that if you compare these two configurations, before the nudging and after nudging, you’ve got all the information you need to find the gradient,” said Bengio. Training the network involves simply repeating this process of “equilibrium propagation” iteratively over lots of labeled data.
The constraint that neurons can learn only by reacting to their local environment also finds expression in new theories of how the brain perceives. Beren Millidge, a doctoral student at the University of Edinburgh and a visiting fellow at the University of Sussex, and his colleagues have been reconciling this new view of perception — called predictive coding — with the requirements of backpropagation. “Predictive coding, if it’s set up in a certain way, will give you a biologically plausible learning rule,” said Millidge.
Predictive coding posits that the brain is constantly making predictions about the causes of sensory inputs. The process involves hierarchical layers of neural processing. To produce a certain output, each layer has to predict the neural activity of the layer below. If the highest layer expects to see a face, it predicts the activity of the layer below that can justify this perception. The layer below makes similar predictions about what to expect from the one beneath it, and so on. The lowest layer makes predictions about actual sensory input — say, the photons falling on the retina. In this way, predictions flow from the higher layers down to the lower layers.
But errors can occur at each level of the hierarchy: differences between the prediction that a layer makes about the input it expects and the actual input. The bottommost layer adjusts its synaptic weights to minimize its error, based on the sensory information it receives. This adjustment results in an error between the newly updated lowest layer and the one above, so the higher layer has to readjust its synaptic weights to minimize its prediction error. These error signals ripple upward. The network goes back and forth, until each layer has minimized its prediction error.
Millidge has shown that, with the proper setup, predictive coding networks can converge on much the same learning gradients as backprop. “You can get really, really, really close to the backprop gradients,” he said.
However, for every backward pass that a traditional backprop algorithm makes in a deep neural network, a predictive coding network has to iterate multiple times. Whether or not this is biologically plausible depends on exactly how long this might take in a real brain. Crucially, the network has to converge on a solution before the inputs from the world outside change.
“It can’t be like, ‘I’ve got a tiger leaping at me, let me do 100 iterations back and forth, up and down my brain,’” said Millidge. Still, if some inaccuracy is acceptable, predictive coding can arrive at generally useful answers quickly, he said.
Some scientists have taken on the nitty-gritty task of building backprop-like models based on the known properties of individual neurons. Standard neurons have dendrites that collect information from the axons of other neurons. The dendrites transmit signals to the neuron’s cell body, where the signals are integrated. That may or may not result in a spike, or action potential, going out on the neuron’s axon to the dendrites of post-synaptic neurons.
But not all neurons have exactly this structure. In particular, pyramidal neurons — the most abundant type of neuron in the cortex — are distinctly different. Pyramidal neurons have a treelike structure with two distinct sets of dendrites. The trunk reaches up and branches into what are called apical dendrites. The root reaches down and branches into basal dendrites.