Percent Signal Change FAQ

Frequently Asked Questions - Percent Signal Change

Check out RoisFaq for more info about region-of-interest analysis in general...

1. What’s the point of looking at percent signal change? When is it helpful to do that?

The original statistical analyses of functional MRI data, going way back to '93 or so, were based exclusively on intensity changes. It was clear from the beginning of fMRI studies that raw intensity numbers wouldn't be directly comparable across scanners or subjects or even sessions - average means of each of those things varies widely and arbitrarily. But simply looking at how much the intensity in a given voxel or region jumped in one condition relative to some baseline seemed like a good way to look at how big the effect of the condition was. So early block experiments relied on averaging intensity values for a given voxel in the experimental blocks, doing the same for the baseline block, and comparing the two of 'em. Relatively quickly, fancier forms of analysis became available, and it seemed obvious that correcting that effect size by its variance was a more sensitive analysis than looking at it raw - and so t-statistics came into use, and the general linear model, and so forth.

So why go back to percent signal change? For block experiments, there are a couple reasons, but basically percent signal change serves the same function as beta weights might (see RoisFaq for more on them): a numerical measure of the effect size. Percent signal change is a lot more intuitive a concept than parameter weights are, which is nice, and many people feel that looking at a raw percent signal change can get you closer to the data than looking at some statistical measure filtered through many layers of temporal preprocessing and statistical evaluation.

For event-related experiments, though, there's a more obvious advantage: time-locked averaging. Analyzing data in terms of single events allows you to create the timecourse of the average response to a single event in a given voxel over the whole experiment - and timecourses can potentially tell you something completely different than beta weights or contrasts can. The standard general linear model approach to activation assumes a shape for the hemodynamic response, and tests to see how well the data fit that model, but using percent signal change as a measure lets you actually go and see the shape of the HRF for given conditions. This can potentially give you all kinds of new information. Two voxels might both be identified as "active" by the GLM analysis, but one might have an onset two seconds before the next. Or one might have a tall, skinny HRF and one might have a short but wide HRF. That sort of information may shed new light on what sort of processing different areas are engaging in. Percent signal change timecourses in general also allow you to validate your assumptions about the HRF, correlate timecourses from one region with those from another, etc. And, of course, the same argument about percent signal change being somehow "closer" to the data still applies.

Timecourses are rarely calculated for block-related experiments, as it's not always clear what you'd expect to see, but for event-related experiments, they're fast becoming an essential element of a study.

2. How do I find it?

Good question, and very platform dependent. In AFNI and BrainVoyager, whole-experiment timecourses are easily found by clicking around, and in the Gablab the same is available for SPM with the Timeseries Explorer. Peristimulus timecourses, though, ususally require some calculation. In SPM, you can get fitted responses through the usual results panel, using the plot command, but those are in arbitrary units and often heavily smoothed relative to the real data. The simplest way these days for SPM99 is to use the Gablab Toolbox's roi_percent code. Check out RoiPercent for info about that function. That creates timecourses averaged over an ROI for every condition in your experiment, with a variety of temporal preprocessing and baseline options. In SPM2, the new Gablab roi_deconvolve is sort of working, although it's going to be heavily updated in coming months. It's based off AFNI's 3dDeconvolve function, which is the newest way to get peristimulus timecourses in AFNI. That's based on a finite impulse response (FIR) model (more on those below). BrainVoyager's ROI calculations will also automatically run an FIR model across the ROI for you.

3. How do those timecourse programs work?

The simplest way to find percent signal change is perfectly good for some types of experiments. The basic steps are as follows:

  • Extract a timecourse for the whole experiment for your given voxel (or extract the average timecourse for a region).
  • Choose a baseline (more on that below) that you'll be measuring percent signal change from. Popular choices are "the mean of the whole timecourse" or "the mean of the baseline condition."
  • Divide every timepoint's intensity value by the baseline, multiply by 100, and subtract 100, to give you a whole-experiment timecourse in percent signal change.
  • For each condition C, start at the onset of each C trial. Average the percent signal change values for all the onsets of C trials together.
  • Do the same thing for the timepoint after the onset of each C trial, e.g., average together the onset + 1 timepoint for all C trials.
  • Repeat for each timepoint out from the onset of the trial, out to around 30 seconds or however long an HRF you want to look at.

You'll end up with an average peristimulus timecourse for each condition, and even a timecourse of standard deviations/confidence intervals if you like - enough to put confidence bars on your average timecourse estimate. This is the basic method, and it's perfect for long event-related experiments - where the inter-trial interval is at least as long as the HRF you want to estimate, so every experimental timepoint is included in one and only one average timecourse.

This method breaks down, though, with short ISIs - and those are most experiments these days, since rapid event-related designs are hugely more efficient than long event-related designs. If one trial onsets before the response of the last one has faded away, then how do you know how much of the timepoint's intensity is due to the previous trial and how much due to the current trial? The simple method will result in timecourses that have the contributions of several trials (probably of different trial types) averaged in, and that's not what you want. Ideally, you'd like to be able to run trials with very short ISIs, but come up with peristimulus timecourses showing what a particular trial's response would have been had it happened in isolation. You need to be able to deconvolve the various contributions of the different trial types and separate them into their component pieces.

Fortunately, that's just what AFNI's 3dDeconvolve, BrainVoyager QX, and the Gablab's roi_deconvolve all do. SPM2 also allows it directly in model estimation, and Russ Poldrack's toolbox allows it to some degree, I believe. They all use basically the same tool - the finite impulse response model.

4. What's a finite impulse response model?

Funny you should ask. The FIR model is a modification of the standard GLM which is designed precisely to deconvolve different conditions' peristimulus timecourses from each other. The main modification from the standard GLM is that instead of having one column for each effect, you have as many columns as you want timepoints in your peristimulus timecourse. If you want a 30-second timecourse and have a 3-second TR, you'd have 10 columns for each condition. Instead of having a single model of activity over time in one column, such as a boxcar convolved with a canonical HRF, or a canonical HRF by itself, each column represents one timepoint in the peristimulus timecourse. So the first column for each condition codes for the onset of each trial; it has a single 1 at each TR that condition has a trial onset, and zeros elsewhere. The second column for each condition codes for the onset + 1 point for each trial; it has a single 1 at each TR that's right after a trial onset, and zeros elsewhere. The third column codes in the same way for the onset + 2 timepoint for each trial; it has a single 1 at each TR that's two after a trial onset, and zeros elsewhere. Each column is filled out appropriately in the same fashion.

With this very wide design matrix, one then runs a standard GLM in the multiple regression style. Given enough timepoints and a properly randomized design, the design matrix then assigns beta weights to each column in the standard way - but these beta weights each represent activity at a certain temporal point following a trial onset. So for each condition, the first column tells you the effect size at the onset of a trial, the second column tells you the effect size one TR after the onset, the third columns tells you the effect size two TRs after the onset, and so on. This clearly translates directly into a peristimulus timecourse - simply plot each column's beta weight against time for a given condition, and voila! A nice-looking timecourse.

FIR models rely crucially on the assumption that overlapping HRFs add up in linear fashion, an assumption which seems valid for most tested areas and for most inter-trial intervals down to about 1 sec or so. These timecourses can have arbitrary units if they're used to regress on regular intensity data, but if you convert your voxel timecourses into percent signal change before they're input to the FIR model, then the peristimulus timecourses you get out will be in percent signal change units. That's the tack taken by the Gablab new roi_percent. Some researchers have chosen to ignore the issue and simply report the arbitrary intensity units for their timecourses.

By default, FIR models include some kind of baseline model - usually just a constant for a given session and a linear trend. That corresponds to choosing a baseline for the percent signal change of simply the session mean (and removing any linear trend). Most deconvolution programs include the option, though, to add other columns to the baseline model, so you could choose the mean of a given condition as your baseline.

There are a lot of other issues in FIR model creation - check out the AFNI 3dDeconvolve model for the basics and more.

5. What are temporal basis function models? How do they fit in?

Basis function models are a sort of transition step, representing the continuum between the standard, canonical-HRF, GLM analysis, and the unconstrained FIR model analysis. The standard analysis assumes an exact form for the HRF you're looking for; the FIR places no constraints at all on the HRF you get. But sometimes it's nice to have some kinds of constraints, because it's possible (and often happens) that the unconstrained FIR will converge on a solution that doesn't "look" anything like an HRF. So maybe you'd like to introduce certain constraints on the type of HRFs you'll accept. You can do that by collapsing the design matrix from the FIR a little bit, so each column models a certain constrained fragment of the HRF you'd like to look for - say, a particular upslope, or a particular frequency signature. Then the beta weight from the basis function model represents the effect size of that part of the HRF, and you can multiply the fragment by the beta weight and sum all the fragments from one condition to make a nice smooth-looking (hopefully) HRF.

Basis function models are pretty endlessly complicated, and the interested reader is referred to the papers by Friston, Poline, etc. on the topic - check out the Friston et. al, "Event-related fMRI," here: ContrastsPapers.

6. How do you select a baseline for your timecourse? What are pros and cons of possible options? Do some choices make particular comparisons easier or harder?

Good question. Choosing a particular baseline places a variety of constraints on the shape of possible HRFs you'll see. The most popular option is usually to simply take the mean intensity of the whole timecourse - the session mean. The problem with that as a baseline is that you're necessitating that there'll be as much percent signal change under the baseline as over it. If activity is at its lowest point during the inter-trial interval or just before trial onset, then, that may lead to some funny effects, like the onset of a trial starting below baseline, and dramatic undershoots. As well, if you've insufficiently accounted for drifts or slow noise across your timecourse, you may overweight some parts of the session at the expense of others, depending on what shape the drift has. Alternatively, you could choose to have the mean intensity during a certain condition be the baseline. This is great if you're quite confident there's not much response happening during that condition, but if you're not, be careful. Choosing another condition as the baseline essentially calculates what the peristimulus timecourse of change is between the two conditions, and if there's more response at some voxels than you thought in the baseline condition, you may seriously underestimate real activations. Even if you pick up a real difference between them, the difference may not look anything like an HRF - it may be constant, or gradually increase over the whole 30 seconds of timecourse. If you're interested in a particular difference between two conditions, this is a great option; if you're interested in seeing the shape of one condition's HRF in isolation, it's iffier.

With long event-related experiments, one natural choice is the mean intensity in the few seconds before a trial onset - to evaluate each trial against its own local baseline. With short ISIs, though, the response from the previous trial may not have decayed enough to show a good clean HRF.

7. What kind of filtering should I do on my timecourses?

Generally, percent signal analysis is subject to the same constraints in fMRI noise as the standard GLM, and so it makes sense to apply much of the same temporal filtering to percent signal analysis. At the very least, for multi-session experiments, scaling each session to the same mean is a must, to allow different sessions to be averaged together. Linear detrending (or the inclusion of a first-order polynomial in the baseline model, for the AFNI users) is also uncontroversial and highly recommended. Above that, high-pass filtering can help remove the low-frequency noise endemic to fMRI and is highly-recommended - this would correspond to higher-order polynomials in the baseline model for AFNI, although studies have shown anything above a quadratic isn't super useful (Skudlarski et. al, TemporalFilteringPapers). Low-pass filtering can smooth out your peristimulus timecourses, but can also severely flatten out their peaks, and has fallen out of favor in standard GLM modeling; it's not recommended. Depending on your timecourse, outlier removal may make sense - trimming the extreme outliers in your timecourse that might be due to movement artifacts.

8. How can you compare time courses across ROIs? Across conditions? Across subjects? (peak amplitude? time to peak? time to baseline? area under curve?) How do I tell whether two timecourses are significantly different? How can you combine several subjects’ ROI timecourses into an average? What’s the best way?

All of these are great questions, and unfortunately, they're generally open in the literature. FIR models generally allow contrasts to be built just as in standard GLM analysis, so you can easily do t- or F-tests between particular aspects of an HRF or combinations thereof. But what aspects make sense to test? The peak value? The width? The area under the curve? Most of these questions aren't super clear, although Miezin et. al (PercentSignalChangePapers) and others have offered interesting commentary on which parameters might be the most appropriate to test. Peak amplitude is the de facto standard, but faced with questions like whether the tall/skinny HRF is "more" active than the short/fat HRF, we'll need a more sophisticated understanding to make sense of the tests.

As for group analysis of timecourses, that's another area where the literature hasn't pushed very far. A simple average of all subjects' condition A, for example, vs. all subjects' condition B may well miss a subject-by-subject effect because of differing peaks and shapes of HRFs. That simple average is certainly the most widely used method, however, and so fancier methods may need some justification. One fairly uncontroversial method might be simply analogous to the standard group analysis for regular design matrices - simply testing the distribution across subjects of the beta weight of a given peristimulus timepoint, for example, or testing a given contrast of beta weights across subjects.