Design FAQ

Frequently Asked Questions - Experimental Design

The major tradeoff in planning your experimental design, from a statistical standpoint, is the fundamental one between efficiency and power. In the context of fMRI, power is the conventional statistical concept of how likely it is that your experiment will correctly reject the null hypothesis of no activation; it might be thought of as how good your experiment is at detecting any sort of activation at all. Efficiency, by contrast, is the ability to accurately estimate the shape of the hemodynamic response to stimuli - in other words, how low the variability of your estimated hemodynamic response is. This is clearly important if you're interested in looking at response timecourse or HRF shape, but it also matters for finding activations at all - if variability in your modeled response is high, it's more difficult to distinguish one condition from another.

The tradeoff between power and efficiency is, unfortunately, inescapable (as shown by Liu et al. in DesignPapers) - you can't be optimal at both. Things that increase power include block design experiments, very high numbers of trials per condition, and increased numbers of subjects. Things that increase efficiency include designs with randomized inter-trial intervals (also called inter-stimulus intervals or ISIs) and analyzing your design with an event-related model (whether the design was blocked or not). Semi-random designs can give you a good dollop of both power and efficiency, at the cost of increased experimental length. Where you fall in designing your experiment will depend on what measures you're interested in looking at - but within the given constraints of a particular number of subjects, a reasonable experimental length, and a guess at how big an effect you'll have, there are good steps you can take to optimize your design.
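A common way to make efficiency concrete is the standard summary for a contrast c under a GLM design matrix X: efficiency = 1 / trace(c'(X'X)^-1 c), where higher values mean lower variance in the contrast estimate. Here's a minimal Python sketch of that formula; the toy design matrix and onsets are purely hypothetical, and in a real analysis the task columns would come from your analysis package (typically HRF-convolved for detection, or FIR-style for shape estimation):

    import numpy as np

    def design_efficiency(X, c):
        # Standard GLM efficiency summary: 1 / trace(c' (X'X)^-1 c).
        # pinv guards against rank-deficient toy designs.
        XtX_inv = np.linalg.pinv(X.T @ X)
        return 1.0 / np.trace(c.T @ XtX_inv @ c)

    # Hypothetical toy design: two conditions plus an intercept.
    n_scans = 200
    X = np.zeros((n_scans, 3))
    X[:, 2] = 1.0                         # intercept column
    X[0:200:20, 0] = 1.0                  # condition A onsets, every 20 scans
    X[10:200:20, 1] = 1.0                 # condition B onsets, offset by 10 scans
    c = np.array([[1.0], [-1.0], [0.0]])  # A-minus-B contrast
    print(design_efficiency(X, c))

The number is most useful comparatively - scoring candidate designs against each other - rather than in absolute terms.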

Experimental design is heavily intertwined with setting your scanning parameters and jittering your trial sequence, so be sure to check out the other design-related pages, like JitterFaq and DesignHowTos.

1. What are some pros and cons of block designs?

Pros: High power, lower number of trials and subjects needed. Cons: Low efficiency, high predictability (which may be a problem for certain tasks from a psychological perspective).

2. What are some pros and cons of event-related designs?

Pros: High efficiency even at lower trial numbers, can have randomized stimulus types and ISIs. Cons: Low power, more trials/subjects needed, more difficult to design - efficiency advantages require jitter (see JitterFaq) or randomized ISIs.

3. What's the difference between long and rapid event-related designs? What's good and bad about each?

Long event-related designs have long ISIs - usually long enough to allow the theoretical HRF to return to baseline (i.e., 20 or 30 sec). Rapid event-related designs have short ISIs, on the order of a few seconds. Long event-related designs have generally fallen out of favor in the last few years, as proper randomization of ISIs allows rapid designs to have much greater efficiency and power than long ones. Until the very late 1990s, it wasn't entirely clear that rapid event-related designs would work from a physiological perspective - that the HRFs for different trials would add roughly linearly. Since those assumptions have been (more or less) vetted, the only advantage offered by long event-related designs is that they're much more straightforward to analyze, and that rarely outweighs the tremendous advantages in efficiency and power offered by the increased trial numbers of rapid designs.
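That linearity assumption is easy to state in model terms: the predicted response to a train of events is just one shifted HRF per event, summed - which is exactly a convolution of the event train with the HRF. A minimal Python sketch, using an illustrative double-gamma HRF (the parameters below are generic textbook-style assumptions, not anyone's canonical values):

    import numpy as np
    from scipy.stats import gamma

    TR = 1.0                                         # sampling interval in seconds
    t = np.arange(0, 32, TR)
    hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)  # peak ~5 s, undershoot ~15 s
    hrf /= hrf.max()

    events = np.zeros(120)
    events[[5, 9, 12, 20, 23, 31, 40]] = 1           # closely spaced event onsets

    # Convolving the event train with the HRF...
    predicted = np.convolve(events, hrf)[:len(events)]

    # ...is identical to summing one shifted HRF per event.
    summed = np.zeros(len(events))
    for onset in np.flatnonzero(events):
        end = min(onset + len(hrf), len(events))
        summed[onset:end] += hrf[:end - onset]

    print(np.allclose(predicted, summed))            # True: superposition holds in the model

Whether real BOLD responses actually add this way was the open physiological question; the model simply assumes it, and the vetting mentioned above is what made rapid designs respectable.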

4. What purpose would a mixed (block and event-related) design serve? Under what circumstances would I want to use one? How do I best design it?

Mixed designs, which can include both block and event-related periods, or semi-random designs, which have blocks of relatively higher and lower odds of getting a particular trial type, can give you good power and efficiency, but at the cost of longer experiments (i.e., more trials). They're more predictable than fully randomized experiments, which may be a problem for certain tasks. AFNI, SPM and Tom Liu's toolbox all have good utilities to design semi-random stimulus trains - see DesignHowTos.
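Here's a minimal sketch of what a semi-random trial train can look like - blocks that merely raise or lower the odds of one trial type rather than presenting it exclusively. The block lengths and probabilities below are placeholder assumptions for illustration; the toolboxes just mentioned will search for genuinely optimized trains:

    import random

    def semi_random_train(n_blocks=6, trials_per_block=12,
                          p_high=0.8, p_low=0.2, seed=0):
        # Alternate between blocks where trial type 'A' is likely
        # and blocks where it's unlikely.
        rng = random.Random(seed)
        train = []
        for block in range(n_blocks):
            p_target = p_high if block % 2 == 0 else p_low
            for _ in range(trials_per_block):
                train.append('A' if rng.random() < p_target else 'B')
        return train

    print(''.join(semi_random_train()))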

5. How long should a block be?

From a purely theoretical standpoint, as described by Liu and others, blocks should be as long as possible in order to maximize power. The power advantage of a block comes from summing the HRFs into as large a response as possible, and so the highest-power experiment would be a one-block design - all the trials of one condition in a row, followed by all the trials of the next condition. The noise profile of fMRI, however, means that such designs are terribly impractical - at least one and probably two alternations are needed to effectively differentiate noise like low-frequency drifts from the signal of your response. So from a theoretical standpoint, Liu recommends a two- or three-block design (with two conditions and two blocks each: on/off/on/off; with three conditions: A/B/C/A/B/C; etc.). With few conditions, this can mean blocks can be quite long.

In practice, real fMRI noise means that two- or three-block designs may have blocks that are too long to be optimal. Skudlarski et al. (see DesignPapers), using real fMRI noise and simulated signal, recommend about 18 seconds for complex cognitive tasks where the response time (and time of initial hemodynamic response onset) is somewhat uncertain (on the order of a couple of seconds). For simple sensory or motor tasks with less uncertainty in that response, shorter blocks (12 seconds or so) may be appropriate. Of course, you should always take into account the psychological load of your blocks; with especially long blocks, the qualitative experience may change due to fatigue or other factors, which would influence your results.

Bledowski et al. (2006, HrfPapers), using empirically derived estimates of the HRF, mathematically derive a 7-sec-on, 7-sec-off block pattern as being optimal for maximizing BOLD response, suggesting it's a bit like a "swing" - pushing for the first half, then letting go, maximizes your amplitude.
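You can get a feel for these numbers by convolving boxcars of different on/off lengths with an HRF and comparing the predicted peak-to-trough signal - essentially the "swing" argument in code. This is a rough sketch under a linear model with the same illustrative double-gamma HRF as above; Bledowski et al.'s actual derivation used empirically estimated HRFs:

    import numpy as np
    from scipy.stats import gamma

    dt = 0.5
    t = np.arange(0, 32, dt)
    hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)  # illustrative double-gamma
    hrf /= hrf.max()

    def peak_to_trough(on_s, off_s, total_s=240):
        # Build an on/off boxcar, convolve with the HRF, and measure
        # the swing of the predicted signal.
        n = int(total_s / dt)
        boxcar = np.zeros(n)
        period = int((on_s + off_s) / dt)
        for start in range(0, n, period):
            boxcar[start:start + int(on_s / dt)] = 1.0
        resp = np.convolve(boxcar, hrf)[:n]
        return resp.max() - resp.min()

    for block_s in (7, 12, 18, 30):
        print(block_s, round(peak_to_trough(block_s, block_s), 2))

Very long blocks saturate the response while very short ones never let it fully develop; where the sweet spot lands depends entirely on the HRF you assume, which is why empirical HRF estimates matter here.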

6. How many trials should one block have?

As many as you can fit into that time. The more trials the better.

7. How many trials per condition are enough?

In terms of power, you can't have too many (probably). The power benefits of increasing the number of trials per condition continue to accrue until at least 100 or 150 trials per condition (see Desmond & Glover and Huettel & McCarthy in DesignPapers). In terms of efficiency, 25 or more is probably enough to get a good estimate of your HRF shape.

8. How can I estimate the power of my study before I run it?

Several of the papers in DesignPapers have detailed mathematical models for trying to figure that sort of thing out; if you can make an educated guess from the literature at how large (in % signal change) your effect might be, Desmond & Glover in particular can give you a decent range of estimation.
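For a back-of-the-envelope version, here's a normal-approximation sample-size calculation for a one-sample test on a % signal change effect across subjects. Every number below is a placeholder assumption - swap in an effect size and noise estimate from the literature - and it's a sanity check, not a substitute for the full Desmond & Glover model:

    from scipy.stats import norm

    effect = 0.5    # hypothesized effect, % signal change (assumed)
    sigma = 0.6     # combined within+between subject SD, % signal change (assumed)
    alpha = 0.001   # roughly a whole-brain voxelwise threshold
    power = 0.80    # target chance of detecting a real effect

    z_a = norm.ppf(1 - alpha)   # one-tailed detection threshold
    z_b = norm.ppf(power)
    n = ((z_a + z_b) * sigma / effect) ** 2
    print(f"~{n:.0f} subjects")  # lands in the low 20s with these numbers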

9. What's the deal with jitter? What does it mean? Should I be doing it?

Jitter probably deserves its own FAQ, so check out JitterFaq for more info about it...

10. Do I have to have the same number of trials in all my conditions?

This question comes up especially for subsequent memory analyses, or things like them, where subjects might have remembered only a fraction of the trials they've looked at but have forgotten a whole lot. If you're trying to compare remembered against forgotten in that case, is that okay? It depends on the exact ratio. First and foremost, if a given condition has too few trials in general, you'll lose a lot of ability to detect activation in it - as above, if you don't have at least 25 trials in a condition in an event-related study (over the whole experiment), you're probably starting to get on thin ice in terms of drawing inferences between conditions. But the ratio of trial numbers between conditions can also have an influence. Generally, neuroimaging programs assume that the different columns of the design matrix you're comparing have equal variance, and a vast difference in trial numbers between them will invalidate that assumption. In practice, this probably isn't a huge concern until you're dealing with ratios of 5 or 10 to 1. If you have 35 trials in one condition and 100 in another, it's not ideal, but you probably won't be too fouled up. If you have 30 in one and 300 in another... it's probably cause for some concern.
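The intuition behind that equal-variance concern is straightforward: the variance of a condition's estimate scales roughly as sigma^2 / n, so lopsided trial counts mean lopsided variances in the columns you're comparing. A toy illustration (the noise level is arbitrary, since only the ratio matters):

    def est_variance(n_trials, sigma=1.0):
        # Variance of a condition-mean estimate from n_trials trials.
        return sigma ** 2 / n_trials

    for n_a, n_b in [(100, 100), (100, 35), (300, 30)]:
        ratio = est_variance(n_b) / est_variance(n_a)
        print(f"{n_a} vs {n_b} trials: variance ratio {ratio:.1f}")

At 100 vs 35 the ratio is about 3; at 300 vs 30 it's 10, which is where the equal-variance assumption starts to look untenable.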

11. How many subjects should I run? How many do I need?

Short answer: 20-25 subjects is a good rule of thumb. Long answer: Obviously this is affected to some degree by practical constraints like funding. But from a statistical perspective, this question boils down to a power analysis, given the levels of noise in fMRI: how many subjects do you need to detect a reasonably-sized effect a reasonable fraction of the time? Using a moderate estimate of effect size (0.5% signal change) and estimating within- and between-subject noise from real data, Desmond & Glover (2002; DesignPapers) calculated that 20-25 subjects were needed for 80% power (i.e., an 80% chance of detecting a real effect) with about the loosest reasonable whole-brain p-threshold. Smaller effect sizes would require more subjects for the same power, and looser p-thresholds (e.g., for an a priori anatomical hypothesis) might require fewer subjects. But in general, the 20-25 subject threshold is a pretty good rule of thumb. You aren't ever hurt by more subjects than that (although very large sample sizes can start tongues wagging about how small your effect size is, and you don't want to get into a fight about size - we're adults, after all). But unless you're very sure your effect size is much bigger than average, having fewer than 20-25 subjects means you're likely to be missing real effects. Check out Desmond & Glover (DesignPapers) for a detailed analysis.


CategoryFaq