This review discusses a relatively recent meta-analysis by Means, Toyama, Murphy, and Baki. Here is the citation:
Means, B., Toyama, Y., Murphy, R., & Baki, M. (2013). The effectiveness of online and blended learning: A meta-analysis of the empirical literature. Teachers College Record, 115(3), 1-47.
This is a meta-analysis of the effectiveness of online and blended education compared to face-to-face education, as measured by learning outcomes. The researchers searched five online databases for studies meeting the following criteria: the study used a quasi-experimental design (QED) or randomized controlled trial (RCT), the study examined online learning (not audio or other delivery alone), and the study measured learning outcomes rather than perceptions or opinions. Candidate studies were then screened for whether an effect size could be computed. The researchers focused on effect sizes because a similar study from 2004 found effect sizes at or near zero, indicating no difference between online and face-to-face education. They hypothesized that more sophisticated technology and pedagogy for online and blended education would now yield a statistically significant effect size.
For each study, the researchers collected an effect size, the upper and lower limits of a 95 percent confidence interval, a z-value (a two-tailed test of the null hypothesis), and the number of units (individual participants or clusters of participants, such as a section or classroom of students). Unfortunately, not all studies reported enough data to calculate an effect size, so the analysts estimated some quantities. For example, the pre-test/post-test correlation r was not reported in many studies, so the researchers imputed a conservative r = 0.70 where the pre-test and post-test measures were similar and r = 0.50 where the measures differed. Many other adjustments were made in order to calculate a z-score and make the studies comparable (see pages 17-18). Only studies that provided enough data to calculate an effect size were retained for the meta-analysis.
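To make the role of the imputed correlation concrete, here is a minimal Python sketch of the standard gain-score formulas involved. This is an illustration of the general technique, not the authors' actual procedure (their full set of adjustments appears on pages 17-18), and the function and variable names are hypothetical.

```python
import math

def gain_score_sd(sd_pre: float, sd_post: float, r: float) -> float:
    """Standard deviation of pre-to-post gain scores, given an assumed
    pre/post correlation r (e.g., the imputed r = 0.70 for similar
    measures or r = 0.50 for differing measures)."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)

def effect_size_from_gains(mean_gain_treat: float, mean_gain_ctrl: float,
                           sd_gain_treat: float, sd_gain_ctrl: float,
                           n_treat: int, n_ctrl: int) -> float:
    """Standardized mean difference (Cohen's d) on gain scores,
    using the pooled standard deviation of the gains."""
    pooled_sd = math.sqrt(
        ((n_treat - 1) * sd_gain_treat**2 + (n_ctrl - 1) * sd_gain_ctrl**2)
        / (n_treat + n_ctrl - 2)
    )
    return (mean_gain_treat - mean_gain_ctrl) / pooled_sd
```

The assumed r feeds directly into the gain-score standard deviation, and hence into the resulting effect size, which is why the choice of imputed value matters.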
The mean of the effect sizes was used to compare the online and blended conditions against face-to-face instruction. The effect sizes were weighted to prevent smaller studies from having undue influence, and the researchers then computed a measure of effect-size heterogeneity (the Q statistic) to determine “the extent to which variation in effect sizes was not explained by sampling error alone” (p. 16).
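For readers unfamiliar with this machinery, the following Python sketch shows the textbook fixed-effect versions of these computations: inverse-variance weights keep small studies from dominating the mean, and Q accumulates weighted deviations from that mean. This is standard meta-analysis, not code from the study itself.

```python
import math

def fixed_effect_summary(effect_sizes: list, variances: list):
    """Inverse-variance weighted mean effect size, its z-test,
    and the Q statistic for heterogeneity."""
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / w_sum
    se = math.sqrt(1.0 / w_sum)   # standard error of the weighted mean
    z = mean_es / se              # two-tailed z-test of H0: mean ES = 0
    q = sum(w * (es - mean_es) ** 2
            for w, es in zip(weights, effect_sizes))
    return mean_es, se, z, q
```

Under the null hypothesis of homogeneity, Q follows a chi-square distribution with k − 1 degrees of freedom (k = number of studies), which is how variation beyond sampling error is judged.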
A statistically significant Q suggested that something beyond the online/blended vs. face-to-face contrast was influencing effect sizes, so the studies were further analyzed for features that might explain the variation. Guided by a Sloan-C framework, three categories of variables were identified as potential influences on effect size: (a) online instruction practices, (b) conditions under which the study was conducted, and (c) aspects of the study methodology itself. Moderator variables for the meta-analysis were derived from these categories.
Results indicated that online learning was more effective than face-to-face learning, with a mean effect size of +0.20 (p < .001), but only when fully online and blended studies were considered as one group. When fully online programs were considered separately, the comparison was not statistically significant, whereas the blended programs remained significant. In addition, of the 50 moderator variables derived from the Sloan-C framework, three were found to be statistically significant, confirming that factors other than whether the learning was online/blended or face-to-face influenced study outcomes.
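One common way to test a categorical moderator such as fully online vs. blended is to partition Q into between- and within-subgroup components, an analog-to-ANOVA approach under fixed-effect assumptions. The sketch below illustrates that general technique; it is not necessarily the authors' exact procedure, and the function names are hypothetical.

```python
from collections import defaultdict
from scipy.stats import chi2

def moderator_q_test(effect_sizes: list, variances: list, groups: list):
    """Partition heterogeneity: Q_between = Q_total - sum of
    within-subgroup Qs, referred to chi-square with (#subgroups - 1) df."""
    def q_stat(es_list, v_list):
        w = [1.0 / v for v in v_list]
        mean = sum(wi * e for wi, e in zip(w, es_list)) / sum(w)
        return sum(wi * (e - mean) ** 2 for wi, e in zip(w, es_list))

    q_total = q_stat(effect_sizes, variances)
    by_group = defaultdict(lambda: ([], []))
    for es, v, g in zip(effect_sizes, variances, groups):
        by_group[g][0].append(es)
        by_group[g][1].append(v)
    q_within = sum(q_stat(es, v) for es, v in by_group.values())
    q_between = q_total - q_within
    df = len(by_group) - 1
    return q_between, df, chi2.sf(q_between, df)  # p-value for moderator
```

Q_between asks whether the subgroup mean effects differ by more than sampling error allows, which is one way to substantiate a claim that blended and fully online programs behaved differently.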
Most researchers would probably agree with the decision to include moderating variables. Even without knowing the Q statistics for this dataset, most researchers of online education agree that blended learning varies tremendously. In fact, most would contend that no study can fairly compare fully online and blended learning without moderating variables, if only because the field writ large does not yet agree on a definition of blended learning.
Readers should note the extensive adjustments required to make these studies comparable, so the findings should be taken with caution. If future meta-analyses applied the same adjustments uniformly across all studies, the included studies would at least be treated equally. Statistics exist to compare disparate data validly, and this tactic might indeed make the data more comparable. On the other hand, it could serve only to reduce the sample size to the point where statistical power is too low to detect any difference.