Breaking the shackles of reward: dopamine’s role in exploration

~7min read

The world is full of options. So how do we know which goal to pursue and how much effort to put in? And how can we keep track of our progress towards these goals? All these questions point to the action of one neurotransmitter in the brain: dopamine. Yes, dopamine has long been linked to pleasure or “reward”, and it is still widely believed to do all sorts of things, such as getting people hooked on drugs or even cell phones. By now, however, it is well established that altering dopamine in the brain does not change the pleasure of receiving rewards. Instead, dopamine comes into play whenever motivated behavior must be adjusted to pursue a goal.

Although the role of dopamine may sound straightforward, everyone knows that we often have more than one goal in mind. Sometimes, these goals may not be compatible, forcing us to decide which goal is more important to pursue. Sure, I’d love to get more cake now, but what about my wish to keep weight off and perhaps lose a few more pounds by the end of the month? “Bad” habits seem particularly hard to break at times, even in light of the shiniest long-term prospects. But we have habits for a reason: they spare us from deliberating every single action, and its prospects, in advance, and they help us handle the number of options in our small world more effectively. And we would not have formed them in the first place if they entailed no “reward” whatsoever. So this, too, is a matter of conflicting goals, and our brain needs a way to resolve such conflicts.

Goal #1: understanding which modes support action control

As hard as it is for us to make up our minds at times, it is not much easier to understand how we manage to deal with this mess of entangled plans via dopamine signaling. What is known is that we use two basic control modes for actions. The first mode supports habitual actions, whose execution is reflexive: we are confronted with a situation and repeat an action that previously led to a good outcome, such as a delicious cake, without even considering the current prospects. In contrast, the second mode supports goal-directed actions, whose execution is deliberative: we are confronted with a situation and plan our actions in advance by considering the outcomes of potential alternatives, such as the net value of eating the cake versus successfully keeping weight off. This example also illustrates that the two modes can point to conflicting options. Imagine that you enter a fine café on a Sunday afternoon. Is this situation more likely to trigger your appetite for cake or your long-term hunger for staying in shape? And which is likely the stonier road to happiness?


Although the two action control modes are linked to separable computational mechanisms in the brain, both depend on dopamine. The habitual mode is called “model-free” because successful actions in a given situation are directly reinforced via so-called temporal-difference reward prediction errors. These prediction errors are broadcast by dopaminergic neurons and refine value expectations, providing cached values for the efficient control of behavior. That said, recent evidence suggests a broader role for these signals in indicating violations of predictions beyond signaling cached values [1]. In contrast, the goal-directed mode is called “model-based” because it operates on a learned model of the environment. This model is learned via state prediction errors [2] and can tune reward expectations based on prior knowledge about the chances of obtaining the desired rewards after acting in a particular state. Previous research has shown that model-based control is associated with dopamine as well [3, 4]. But since both modes depend on dopamine, it is not clear what happens when they point to conflicting courses of action and we boost dopamine in the brain with a drug.
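The “cached value” idea behind model-free control can be made concrete with a few lines of code. This is a minimal sketch of a temporal-difference update, not the model from our study; the learning rate and reward values are assumptions chosen for illustration.

```python
# Minimal sketch of a "model-free" cached-value update: the value estimate
# is nudged toward the received reward by a reward prediction error (RPE),
# the kind of teaching signal carried by dopaminergic neurons.
ALPHA = 0.1  # learning rate (assumed for illustration)

def update_cached_value(value, reward):
    """Return (new_value, rpe) after one temporal-difference update."""
    rpe = reward - value            # dopamine-like reward prediction error
    return value + ALPHA * rpe, rpe

v = 0.0
for _ in range(100):
    v, rpe = update_cached_value(v, reward=1.0)
# The cached value converges toward the true reward value of 1.0, and the
# prediction error shrinks as the outcome becomes expected.
```

Note how the agent never needs a model of its environment: the cached value alone is enough to pick the action that paid off in the past, which is exactly what makes this mode fast but inflexible.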

Goal #2: understanding what dopamine really does to keep us on track

To study how a dopamine challenge alters action control, we conducted a pharmacological imaging study that was recently published in NeuroImage [5]. The study included 65 healthy participants whose task was to identify the best option using a combination of model-free and model-based control. They completed the task twice: once after receiving L-DOPA, a dopamine precursor commonly used to treat Parkinson’s disease, and once after receiving a placebo.

In the task, the two control modes can be separated because, much like in real life, they sometimes make conflicting predictions. The task consists of two layers of boxes. First, participants pick one of two gray boxes. Each gray box leads to one set of colored boxes with a high probability and to the other set with a low probability. Only after selecting one of the two colored boxes at the second level can participants make money, and only if the draw is successful. Their task, then, is to monitor which colored box currently has the highest chance to win, as the probability of winning changes over time.

Summary of the task used in our study (developed by Daw et al., 2011, Neuron): First, participants have to select one of two gray boxes. Each gray box is associated with a fixed transition probability (common vs. rare) leading to two sets of colored boxes (green or yellow). Second, participants have to pick one of the two colored boxes for a chance to win money. To maximize wins, they have to track changes in reward probabilities over time (trials).

Now, the habitual mode would dictate repeating a choice that has led to reward, regardless of the probabilities that lead from the first, unrewarded layer of gray boxes to the second, rewarded layer of colored boxes. In contrast, the goal-directed mode could make strategic use of these probabilities by telling the participant: Look, you have been rewarded now. But you were lucky to get to these colored boxes in the first place. If you want to get back to them, make sure to switch to the other gray box on the next trial.
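This model-based strategy can be sketched in code. The transition probabilities (0.7 common, 0.3 rare) and the box labels below are illustrative assumptions in the spirit of the Daw et al. (2011) task, not the exact values from our study.

```python
# Illustrative sketch of model-based valuation in the two-step task:
# the agent weights its current estimates of the colored sets by the
# fixed transition probabilities of each gray first-stage box.
P_TRANSITION = {                       # gray box -> colored set probabilities
    "gray_A": {"green": 0.7, "yellow": 0.3},
    "gray_B": {"green": 0.3, "yellow": 0.7},
}

def model_based_value(gray_box, set_values):
    """Expected value of a first-stage choice under the learned model."""
    return sum(p * set_values[s] for s, p in P_TRANSITION[gray_box].items())

# Suppose the agent was just (luckily) rewarded in the yellow set and
# therefore values it more highly at the moment:
set_values = {"green": 0.3, "yellow": 0.6}
best = max(P_TRANSITION, key=lambda g: model_based_value(g, set_values))
# A model-based agent picks the gray box that COMMONLY leads to yellow,
# even if the reward was reached via the other gray box on a rare transition.
```

A purely model-free agent, in contrast, would simply repeat whichever gray box it chose on the rewarded trial; the divergence between the two policies after rare rewarded transitions is what lets the task tell the modes apart.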

Result #1: More dopamine reduces the reliance on the habitual control mode

So, what happens when we boost dopamine levels in the brain by giving L-DOPA? We observed that participants were less likely to repeat the same action at the first stage after being rewarded. This suggests a reduced reliance on the habitual, model-free control mode. In contrast, there was no effect on the goal-directed, model-based control mode.

Likewise, when we sought to explain choices using a computational model of behavior, we found that boosting dopamine made it harder to predict which of the two colored boxes participants would pick. Since this choice comes after the first layer of unrewarded gray boxes, it is primarily guided by directly comparing the expected values of the colored boxes. Thus, boosting dopamine seemed to render decisions based on this value comparison less predictable. Notably, we observed no changes in reward learning or decision-making other than this increased “unpredictability” at the second stage of the task.
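One common way such models express choice predictability is a softmax rule: an inverse-temperature parameter scales how reliably the higher-valued box is chosen. The sketch below is a generic illustration of that idea (parameter values assumed), not the fitted model from our paper.

```python
# Hedged sketch: under a softmax choice rule, a lower inverse temperature
# (beta) makes choices between the two colored boxes less predictable,
# which is one way a model can capture the L-DOPA effect described above.
import math

def p_choose_better(v_better, v_worse, beta):
    """Softmax probability of picking the higher-valued colored box."""
    return 1.0 / (1.0 + math.exp(-beta * (v_better - v_worse)))

sharp = p_choose_better(0.7, 0.3, beta=10.0)  # nearly deterministic choice
flat  = p_choose_better(0.7, 0.3, beta=1.0)   # choice close to chance
# With identical value estimates, only the noisiness of the decision differs.
```

The point of the sketch: the same value comparison can yield behavior anywhere between deterministic and random, so “less predictable” does not have to mean “learned less”.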

Result #2: Brain imaging corroborates the behavioral effect of dopamine

But what happens in the brain after boosting dopamine with L-DOPA? We neuroscientists often pose this question as if brain and behavior were fully independent domains, which is obviously not true if the brain regulates behavior. Accordingly, in our study, brain imaging corroborated the results of the behavioral analyses. We found no effect of boosting dopamine on the reward prediction error signals linked to model-free or model-based control. Instead, we found that reward signals were weakened after boosting dopamine levels in the brain: when the reward was delivered, the signal was less strong, and when participants moved on to the next trial, the previous reward did not boost brain responses as much.

Boosting dopamine reduces the effect of reward? This may sound counterintuitive at first, but consider what happens when we increase dopamine levels throughout the brain. If you win unexpectedly, the brain will release dopamine, but this release now comes on top of the elevated dopamine levels produced by L-DOPA. The signal therefore loses drive because it rides on more noise (that is, dopamine unrelated to the win). Taken together, this may weaken the model-free control of behavior by direct reinforcement of successful actions after L-DOPA.
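The signal-on-baseline argument boils down to simple arithmetic. The toy numbers below are our assumption for illustration, not a biophysical model of dopamine transmission.

```python
# Toy illustration of why the same phasic dopamine burst carries less
# relative weight when the tonic baseline is elevated (e.g., after L-DOPA).
def relative_signal(burst, baseline):
    """Phasic burst expressed relative to the ongoing baseline level."""
    return burst / baseline

placebo = relative_signal(burst=1.0, baseline=1.0)
l_dopa  = relative_signal(burst=1.0, baseline=2.0)
# The identical burst is "diluted" on the higher baseline, so the win
# provides less relative drive for direct reinforcement.
```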

Result #3: Less reflexive control by reward = more freedom to explore

If more dopamine reduces the direct reinforcement of behavior by reward, is that a good or a bad thing? Considering “bad” habits, it might be beneficial to loosen their grip if we want to overcome them. In other words, higher levels of dopamine may allow you to reduce the thriftiness of behavior [6, 7]. Since high levels of dopamine over a longer period usually indicate that things are going well (i.e., a high average reward rate), you might be more willing to take a risk and explore new options.

Although our task was not optimized to study this trade-off, we analyzed response times as a function of the number of wins or losses in the past five rounds. If you have won five times in a row, you should feel confident that you know what to do next. This is what we observed during placebo sessions: the better the recent stream of success, the quicker participants were to make up their minds. After receiving L-DOPA, in contrast, participants deliberated longer when things were going well, as if they had started to consider exploring other options. These results support a general role of dopamine levels in the control of actions beyond the two modes. They also suggest that other demands, such as exploration versus exploitation and thriftiness, need to be incorporated into dopamine signaling to resolve the challenges of directing actions towards multiple competing goals [8].

Breaking free of reward

Collectively, our results demonstrate that dopamine is vital for the regulation of goal-directed behavior. By increasing dopamine levels in the brain over a longer period, we could reduce the direct reinforcement of successful actions by rewards. This blunted drive elicited by reward was evident in choices, response times, and brain responses after rewards were received. Nevertheless, pursuing the right goal is not trivial, and many open questions in the literature remain to be resolved in future research. Whereas our study supports the idea that dopamine encodes aspects of the value and thriftiness of actions, competing theories emphasize dopamine’s role in reinforcement learning as the prime mechanism for behavioral differences.


To conclude, breaking free of the shackles of reward is easier when dopamine levels in the brain are high. High dopamine may indicate that momentum is on your side and that you can risk considering alternatives and exploring new avenues. Although repeating choices that paid off in the past is usually a good strategy, better insight into the mechanisms that break “bad” habits may help us improve goal-directed behavior, such as maintaining a healthy body weight despite the manifold temptations of today.

Further reading

1. Gardner, M.P.H., Schoenbaum, G. & Gershman, S.J. Rethinking dopamine as generalized prediction error. Proc Biol Sci 285 (2018).

2. Gläscher, J., Daw, N., Dayan, P. & O’Doherty, J.P. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585-595 (2010).

3. Wunderlich, K., Smittenaar, P. & Dolan, R.J. Dopamine enhances model-based over model-free choice behavior. Neuron 75, 418-424 (2012).

4. Deserno, L., et al. Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making. Proc Natl Acad Sci U S A 112, 1595-1600 (2015).

5. Kroemer, N.B., Lee, Y., Shakoor, P., Eppinger, B., Goschke, T. & Smolka, M.N. L-DOPA reduces model-free control of behavior by attenuating the transfer of value to action. Neuroimage 186, 113-125 (2019). View on ScienceDirect or bioRxiv.

6. Beeler, J.A., Frazier, C.R. & Zhuang, X. Putting desire on a budget: Dopamine and energy expenditure, reconciling reward and resources. Front Integr Neurosci 6, 49 (2012).

7. Beeler, J.A. Thorndike’s law 2.0: Dopamine and the regulation of thrift. Front Neurosci 6, 116 (2012).

8. Kroemer, N.B., Burrasch, C. & Hellrung, L. To work or not to work: Neural representation of cost and benefit of instrumental action. Prog Brain Res 229, 125-157 (2016). View on ScienceDirect.

Do you want to know more about our research? Get in touch with us


For updates: Follow neuroMADLAB on WordPress.com or twitter

Would you like to contribute to our research? Please sign up

