Basic Statistical Modeling FAQ

Frequently Asked Questions - Basic Statistical Modeling

1. What is "estimating a model?" How do the various programs do it?

Once you've performed all the spatial preprocessing you like on your functional data, you're ready to test your hypotheses about the data. Most standard analysis pathways in fMRI proceed in a hypothesis-driven fashion based on the general linear model (GLM), in which the researcher sets up a model of what she believes the brain may be doing in response to the variations in some parameter of the experiment, and then tests the truth of that hypothesis. (This contrasts with non-model-driven approaches like principal components analysis (PCA), which we'll talk about later). Estimating your model is the core statistical step of this process: the researcher describes some model of brain activity, and the program calculates a giant multiple regression of some kind to find out the extent to which that model correctly accounts for the real data, at every voxel in the brain.

Different programs have different methods of setting up a design matrix, but they all share certain elements: the user describes a set of different experimental conditions (or effects), and describes, for each of them, start times and end times (onset times and offset times, or durations). Experiments can have massively varying designs, from the simplest on-off block design to multi-condition randomly-timed event-related design - check out the sections on experimental design for more on this. The basic hypothesis is that some voxels in the brain had their intensity values covary, to a statistically significant degree, with some combination of the experimental conditions and parameters. The design matrix consists of a matrix with a row for each timepoint in the experiment (each functional image), and a column for each modeled experimental effect.

Usually, the user will then modify the design matrix to make it a more accurate model of what brain activity might be. Oftentimes, a constant term is added to the matrix, to account for the mean value of the session; sometimes linear or polynomial drifts are added to the matrix as well. Sometimes the columns of the matrix are convolved with some model of the hemodynamic response function, to reflect the blurring in signal the HRF applies to neural activity. (Another option is to simply separate the various timepoints for the response to a given condition into different columns, estimating each separately - a finite impulse response (FIR) model that effectively deconvolves the contribution of the HRF.) (See HrfFaq for more info.)

Once the design matrix is set up, the program uses the methods of GLM theory - essentially multiple regression - to calculate how accurately the model described by the design matrix accounts for the real data. The standard GLM equation is Y = BX + E, where Y is the time-varying intensities from one voxel, X is the design matrix, E is an error term, and B is the "parameters" or "beta weights" - a vector of values, one for each experimental condition, that tells the researcher how big the effect of the corresponding condition was in explaining the values at that voxel. If condition A's beta weight is significantly greater than condition B's beta weight at a given voxel, the hypothesis that A had a greater effect than B at that voxel is confirmed. Generally, programs create some voxel-by-voxel image of the beta weights - a beta image or parameter image.

Once the parameters are estimated, the program has both a measure of effect size and of error in the model for each voxel. Generally, the program then normalizes each effect size by the error to calculate some measure of statistical significance for effects - a contrast image (see ContrastsFaq for more info). The estimation, depending on the program used, the complexity of the model, and the number of images, can take a few seconds or several hours. Every major program, though, uses essentially the same methods of regression to estimate the betas, usually based on taking a pseudoinverse of the design matrix (see the Holmes et. al below for more details).

2. When should global mean scaling be used? What does it do?

Nutshell answer - global mean scaling should be used for PET, but not for fMRI.

Longer answer: One problem in neuroimaging experiments is that you're generally trying to pick out some signal from a noisy timeseries at every voxel. One form that noise can take is a global shift in intensities across the whole brain, which can be caused by scanner thermal noise, subject movement, physiological effects, etc. One way to get rid of a whole bunch of those noise sources at once, then, would be to look for timepoints where every voxel in the brain shows the same sudden shift and infer that that's a change in global response, not a regional change, and therefore not of interest to you. A simple way of doing that is just by finding the global mean of every timepoint - a global mean timeseries - and dividing every voxel's timeseries by the global mean timeseries.

The obvious problem with this is that removing the effect of the global mean from your model means you also remove signal that covaries with the global mean. In PET, this wasn't a big deal - the changes in the global mean could be on a different order of magnitude from task-related changes, and so regional activations weren't likely to bias the global mean particularly. In fMRI, though, it's a problem. Global mean intensity shifts are often about the same size, or at least comparable, to the size of task-induced activations. So large activations can seriously bias the global mean calculation, such that the global mean will go significantly up with large activations. Removing the effect of the global mean will then remove those large activations as well.

Using global scaling has been tested fairly extensively now in fMRI, and it almost always seems to negatively affect the sensitivity of the analysis. Generally, it's a bad idea.

3. What is autocorrelation correction? Should I do it?

The GLM approach suggested above and used for many types of experiments (not just neuroimaging) has a problem when applied to fMRI. It assumes that each observation is independent of the next, and any noise present at one timepoint is uncorrelated from the noise at the next timepoint. On that assumption, calculating the degrees of freedom in the data is easy - it's just the number of rows in the design matrix (number of TRs) minus the number of columns (number of effects), which makes calculating statistical significance for any beta value easy as well.

The trouble with this assumption is that it's wrong for fMRI. The large bulk of the noise present in the fMRI signal is low-frequency noise, which is highly correlated from one timepoint to the next. From a spectral analysis point of view, the power spectrum of the noise isn't flat by a long shot - it's highly skewed to the low frequency. In other words, there is a high degree of autocorrelation in fMRI data - the value at each time point is significantly explained by the value at the timepoints before or after. This is a problem for estimating statistical significance, because it means that our naive calculation of degrees of freedom is wrong - there are fewer degrees of freedom in real life than if every timepoint were independent, because of the high level of correlation between time points. Timepoints don't vary completely freely - they are explained by the previous timepoints. So our effective degrees of freedom is smaller than our earlier guess - but in order to calculate how significant any beta value is, we need to know how much smaller. How can we do that?

Friston and Worsley made an early attempt at this in the papers below. They argued that one way to account for the unknown autocorrelation was to essentially wash it out by applying their own, known, autocorrelation - temporally smoothing (or low-pass filtering) the data. The papers below extend the GLM framework to incorporate a known autocorrelation function and correctly calculate effective degrees of freedom for temporally smoothed data. This approach is sometimes called "coloring" the data - since uncorrelated noise is called "white" noise, this smoothing essentially "colors" the noise by rendering it less white. The idea is that after coloring, you know what color you've imposed, and so you can figure out exactly how to account for the color.

SPM99 (and earlier) offer two forms of accounting for the autocorrelation - low-pass filtering and autocorrelation estimation (AR(1) model). The autocorrelation estimation corresponds more with pre-whitening (see below), although it's implemented badly in SPM99 and probably shouldn't be used. In practice, however, low-pass filtering seems to be a failure. Tests of real data have repeatedly shown that temporal smoothing of the data seems to hurt analysis sensitivity more than it helps, and harm false-positive rates more than it helps. The bias in fMRI noise is simply so significant that it can't be swamped without accounting for it. In real life, the proper theoretical approach seems to be pre-whitening, and low-pass filtering has been removed from SPM2 and continues to not be available in other major packages. (See TemporalFilteringFaq for more info.)

4. What is pre-whitening? How does it help?

The other approach to dealing with autocorrelation in the fMRI noise power spectrum (see above), instead of 'coloring' the noise, is to 'whiten' it. If the GLM assumes white noise, the argument runs, let's make the noise we really have into white noise. This is generally how correlated noise is dealt with in the GLM literature, and it can be shown whitening the noise gives the most unbiased parameter estimates possible. The way to do this is simply by running a regreession on your data to find the extent of the autocorrelation. If you can figure out how much each timepoint's value is biased by the one before it, you can remove the effect of that previous timepoint, and that way only leave the 'white' part of the noise.

In theory, this can be very tricky, because one doesn't actually know how many previous timepoints are influencing the current timepoint's value. Essentially, one is trying to model the noise, without having precise estimates of where the noise is coming from. In practice, however, enough work has been done on figuring out the sources of fMRI noise to have a fairly good model of what it looks like, and an AR(1) + w model, where each noise timepoint is some white noise plus a scaling of the noise timepoint before it, seems to be a good fit (it's also described as a 1/f model). This pre-whitening is available in SPM2 and BrainVoyager natively and can be applied to AFNI (I think). This procedure essentially estimates the level of autocorrelation (or 'color') in the noise, and removes it from the timeseries ('whitening' the noise).

Theoretically, it should work well, but as its adoption is relatively new to the field, few rigorous tests of the effectiveness of pre-whitening have been done. We'll keep you posted as more info arrives...

5. How does parametric modulation work? When would I use it?

As described above, there are all kind of modifications the researcher can make to her design matrix once she's described the basics of when her conditions are happening. One important one is parametric modulation, which can be used in a case where an experimental condition is not just ON or OFF, but can happen at a variety of levels during the experiment. An example might be an n-back memory task, where on each trial the subject is asked to remember what letter happened n trials before, where n is varied from trial to trial. One hypothesis the research might have is that activity in the brain varies as a function of n - remembering back 3 trials is harder than remembering 1, so you might expect activity on a 3-back trial to be higher than on a 1-back. In this case, a parametric modulation of the design matrix would be perfect.

Generally, a parametric modulation is useful if you have some numerical value for each trial that you'd like to model. This contrasts with having a numerical value to model at each timepoint, which would be a time for a user-specified regressor (see below). In the parametric case, the user specifies onset times for the condition, and then specifies a parameter value for each trial in the condition - if there are 10 n-back trials, the user specifies 10 parameter values. The design matrix then modulates the activity in that column for each trial by some function of the parameter - linear, exponential, polynomial, etc. - set by the user. If the hypothesis is correct, that modulated column will fit the activity significantly better than an unmodulated effect. In SPM (and possibly others), the program splits the effect into two columns - an unmodulated effect and a parametric column - so that the researcher can separately estimate the effect of the condition itself and of the parametric modulation of that effect. This would be a way of separating out, in the example, the effect of doing any kind of retrieval from the load effect of varying the parameter.

6. What's the best way to include reaction times in my model? If you have events for which participants' response times vary widely (or even a little), your model will be improved by accounting for this variation (rather than assuming all events take identical time, as in the normal model). A common way of including reaction times is to use a parametric modulator, with the reaction time for each trial included as the parameter. In the most common way of doing this, the height of the HRF will be thus modulated by the reaction time. Grinband et al. (HBM06) showed this method actually doesn't work as well as a different kind of parametric regression - in which each event is modeled as an epoch (i.e., a boxcar) of variable duration, convolved with a standard HRF.

In other words, rather than assuming that neural events all take the same time, and the HRF they're convolved by varies in height with reaction time (not very plausible, or, it turns out, efficient), the best way is to assume the underlying neural events vary in reaction time, and convolve those boxcars (rather than "stick functions") with the same HRF.

In either case, as with most parametric modulation, the regressor including reaction time effects can be separate from the "trial regressor" that models the reaction-time-invariant effect of the trial. This corresponds to having one column in the design matrix for the condition itself (which doesn't have any reaction time effects) and a second, parametrically modulated one, which includes reaction times. If your goal is merely to get the best model possible, these don't need to be separated (only the second of the two, which includes RTs, could go in the model), but this will not allow you to separate the effect of "just being in the trial" from neural activations that vary with reaction time. To separate those effects, you need separate design matrix columns to model them. That choice depends on how interested you are in the reaction-time effect itself.

7. What kinds of user-specified regressors might I use? How do I include them?

Another modification you can make to the design matrix is simply to add columns or effects that don't correspond to some condition you want convolved with an HRF. A user-specified regressor is just some vector of numbers, one for each timepoint/functional image, that you'd like to include in the model because you believe it has some effect. If you have a numerical value for each timepoint (TR/functional image) that you'd like to model, a user-specified regressor is the way to go. This contrasts with the case of having a numerical value for each trial you'd like to model, in which case you'd use a parametric modulation (see above).

An example of a user-specified regressor might be if you have continuous self-reports of positive affect from each subject, and you'd like to see where there are voxels in the brain whose activity co-varied with that affect. You could include the positive affect regressor in your model and have a beta value estimated separately for it. Depending on what your hypothesis is about that effect, you may want to lag its values to account for the hemodynamic delay.

The user-specified regressor is a powerful tool for many types of modifications to the design matrix, but note that in many obvious cases in which you might want to separate out the contribution of a given effect of no interest - things like movement parameters, physiological variation, low-frequency confounds, etc. - programs may already have ways to deal with those things built in, in a more efficient fashion. At the very least, in any case when you include a user-specified regressor than you plan to simply ignore, you should try to ensure it doesn't covary significantly with your task and hence remove task-induced signal.