Task and typical behaviour. A: Task. Each trial had the following structure: 1) 500 ms presentation of a central fixation cross; 2) 500 ms presentation of a face without a mouth; 3) 100 ms presentation of a long (13 mm) or short (11.5 mm) mouth inside the face; 4) participants reported whether the mouth was long or short by key-press (‘Z’ or ‘/’ on a US keyboard, counterbalanced); 5) the face without a mouth remained on screen until the participant responded. Short and long stimuli were each presented 50 times per block in a pseudorandom sequence that avoided more than three repetitions in a row. Adapted from . B: Reward schedule. One response (counterbalanced across participants) had a higher reward expectation. Correct identification of that “rich” stimulus was more likely to be rewarded (75% probability) than correct identification of the other, “lean”, stimulus (30% probability). There was no punishment. When in doubt, it was therefore beneficial to choose the more frequently rewarded response. C: Surrogate simulated data showing the prototypical evolution of responses. The dark bars show a hypothetical control group, which develops a strong response bias towards the more rewarded response over the three blocks of 100 trials. The light bars show a prototypical treatment group with a reduced response bias. D-E: Surrogate simulated data generated from a simple reinforcement learning (‘Stimulus-action’) model. Both a reduction in reward sensitivity (D) and a reduction in learning rate (E) can roughly reproduce the pattern in the data (C). F: Percentage of correct responses for each of the 392 experimental sessions. Each black point represents one experimental session. Vertical bars demarcate datasets. The red horizontal line marks chance performance for each session. Four participants performed below chance (red). Sixty-three of the 392 experimental sessions were not fitted better than chance by model ‘Belief’ (binomial test; blue). Of these 63 sessions, 58 were in the Stress dataset, in which performance was generally worst.
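The trial sequence in panel A and the reward schedule in panel B can be illustrated with a minimal sketch. The caption does not say how the pseudorandom sequence was generated, so the constructive sampling with restarts used here is an assumption, as are the hypothetical names `make_block`, `longest_run` and `reward_outcome`. Only the counts (50 short and 50 long stimuli per block), the limit of three repetitions in a row, and the reward probabilities (75% and 30% for correct responses, no punishment) are taken from the caption.

```python
import random


def make_block(n_per_stimulus=50, max_run=3, rng=None):
    """Return one block of 'short'/'long' trials with no run longer than max_run.

    Assumed procedure (not from the source): draw trials one at a time,
    weighted by how many of each stimulus remain, and restart on a dead end.
    """
    rng = rng or random.Random()
    n_trials = 2 * n_per_stimulus
    while True:
        remaining = {"short": n_per_stimulus, "long": n_per_stimulus}
        seq = []
        while len(seq) < n_trials:
            # Stimuli that are still available and would not extend a run past max_run.
            options = [
                s for s, n in remaining.items()
                if n > 0 and not (
                    len(seq) >= max_run and all(x == s for x in seq[-max_run:])
                )
            ]
            if not options:
                break  # dead end: discard this attempt and start again
            weights = [remaining[s] for s in options]
            choice = rng.choices(options, weights=weights)[0]
            seq.append(choice)
            remaining[choice] -= 1
        if len(seq) == n_trials:
            return seq


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    best = run = 0
    prev = None
    for s in seq:
        run = run + 1 if s == prev else 1
        best = max(best, run)
        prev = s
    return best


def reward_outcome(stimulus, response, rich, rng, p_rich=0.75, p_lean=0.30):
    """Return 1 if the trial is rewarded, else 0.

    Only correct responses can be rewarded; errors are never punished.
    """
    if response != stimulus:
        return 0
    p = p_rich if stimulus == rich else p_lean
    return int(rng.random() < p)


if __name__ == "__main__":
    rng = random.Random(0)
    block = make_block(rng=rng)
    print(len(block), block.count("short"), block.count("long"), longest_run(block))
```

Because correct "rich" responses pay off more than twice as often as correct "lean" responses, a participant who is unsure about the stimulus gains by defaulting to the rich response, which is the response bias shown in panel C.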