The layman’s guide to the new pro-mask research study

Do we now have conclusive proof that masking works?  No.  Do we have data that strongly suggests this to be the case?  Yes.  Is it all wrapped up in questions of how to interpret such studies, and the inherent difficulty of studies that can only attempt to approximate an experiment rather than truly being one?  Also yes.

Here’s the background:  a group of researchers and aid workers from Innovations for Poverty Action, funded by the charity GiveWell, undertook a massive study in Bangladesh to test mask-wearing.  But you can’t simply force one group of people to wear masks and prohibit another group from doing so, particularly at the village-by-village level, so what they undertook was a series of actions designed to promote mask-wearing.

To begin with, they designated certain villages “control” and others “treatment” in the same way as, with a test of a medication, a certain group would get the placebo and others, the real medicine.  The control villages received nothing, but the “treatment” villages, through a process of randomization, were given either cloth masks or surgical masks for the duration of a 10-week period and, with further randomization, were given further inducements to wear masks, such as encouragement from imams and other “village elders,” or texts from experts encouraging mask-wearing as an altruistic action or for one’s own benefit, or other such encouragements.  They then measured the degree to which these inducements resulted in more mask-wearing, by having observers count the number of people wearing masks in public places, and found that they were able to triple the rate at which people wore their masks in public.

Now, to be clear, the entire “package” was implemented for the main test group:  free mask distribution alongside encouragement to wear the masks and role-modeling by public officials and community leaders, which included a video with the head imam and a national cricket star shown when the masks were distributed, as well as promotion by local imams during Friday prayers using a scripted speech and further unspecified “mask promotion in public spaces.”  Some villages also received monetary or non-monetary incentives for village-wide compliance, a program of asking households to commit to mask-wearing with a pledge and a front-door sign, and a set of text messages; and villages were also randomly assigned to receive either cloth or surgical masks.  After 8 weeks, they stopped the “intervention” but kept tallying mask-wearing for a further two weeks.

The result was that mask-wearing in the treatment villages increased by 29 percentage points or, using a different method of analysis, 28.1 points, relative to a baseline of 13.3%.  Surprisingly, the additional boosting efforts had no effect: none of the by-village monetary incentive, text-encouragement, or public front-door sign program made a difference – in fact, these most likely reduced levels of mask-wearing.  The only factors that were associated with greater rates of mask-wearing were being given a surgical (rather than cloth) mask and being given a mask that was blue rather than green or purple rather than red.
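A quick arithmetic check, using only the two figures quoted above (the 13.3% baseline and the 29-point increase), shows how a rise in percentage points translates into the "triple" described earlier.  The variable names are just illustrative.

```python
# Figures quoted in the text above
baseline_pct = 13.3   # mask-wearing rate at baseline, in percent
increase_pp = 29.0    # increase in treatment villages, in percentage points

# A percentage-point increase is added to the baseline, not multiplied
treatment_pct = baseline_pct + increase_pp

# Ratio of the treatment rate to the baseline rate
ratio = treatment_pct / baseline_pct

print(f"Treatment-village mask-wearing: {treatment_pct:.1f}%")
print(f"Ratio to baseline: {ratio:.2f}x")
```

In other words, a 29-point increase on a 13.3% baseline puts mask-wearing a bit above 42%, which is where the "triple" comes from, and still well short of half the people observed.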

(Yes, really – the effect on mask-wearing of having a purple cloth mask was quite substantial.  The mask colors were used to identify people who had received different sorts of “private” nagging in terms of the text messages, but the meaning of the color used was varied between villages.  Green symbolizes Islam; was this color seen as sacrilegious?  Was there a different, negative connotation to red, or positive connotation to purple?  This unexpected difference is a bit disconcerting, because it suggests that the researchers did not have the understanding of local culture that they should have to conduct research, even if there’s nothing else fishy about it.)

But this was only the first step in their study.  Their larger objective was to measure the degree to which the free-mask/mask-encouragement-induced greater mask-wearing reduced covid cases – and, indeed, they found substantially lower rates for the treatment than the control groups, based on testing those who reported covid symptoms during the study period.  (Two complications here:  first, they only tested those who reported symptoms, and second, only about 40% of those reporting symptoms agreed to be tested.)

The results here are fairly dramatic, or are so at first glance, at least: relative to the control villages, those villages given surgical masks had an 11% reduction in (symptomatic) covid prevalence over the 10-week period.  For those above 60, those at highest risk, the results are even more dramatic, a decrease in infection of 35%.  Considering that, even with these interventions, fewer than 50% of people observed wore masks, this suggests that consistent mask-wearing by everyone would have an even greater effect.  And, in fact, the authors do the math of how much it cost them to provide the masks, the mask-promotion, and the mask-wearer counting, to conclude that it is entirely feasible, in terms of lives saved, to expand these efforts.

The study also looked at the impact of mask-wearing on physical distancing – not so much because it was their goal to push Bangladeshis into more distancing but because one theory was that mask-wearing would, due to risk-compensation, result in people distancing less.  Instead, within mosques, people distanced as much as before, but in other circumstances, distancing increased.

But the study left a number of questions unanswered – or, at least, I didn’t see the answers.

We know that treatment villages were given masks and non-treatment villages were not – but the latter villages were still surveyed by phone and asked about symptoms, then those reporting covid symptoms were asked to test, which about 40% consented to.  The study did not indicate what percent of villagers responded to the survey, or how they perceived the study, or whether they resented being called and asked questions when only the neighboring village, not they themselves, received masks.

The study also did not report on any issues of variation within the treatment villages, except to the extent that standard errors are reported for mask-wearing (and, honestly, I’m not good enough at the stats part to get a sense of interpretation here).  Was there an (inverse) correlation between village-wide mask-wearing and covid prevalence?  That would make the relationship between masks and covid-reduction clearer.  Is there a reason why this statistical calculation/test would be invalid?  The villages are all also presented as simply generic “one no different than the next” villages, and maybe that’s indeed true, or the randomization process makes differences irrelevant, but I would imagine that there are still real differences, whether they be a matter of some regions of the country being richer or poorer than others, or having different age pyramids (different fertility rates, different rates of out-migration to the city).

Also, all observations were conducted outside except for mosques, because there simply weren’t non-mosque indoor spaces.  But it is generally not considered particularly risky to go without a mask outdoors, and the paper doesn’t state whether villagers were told to wear masks any time they were outside their own homes, or what instructions in particular they were given regarding times and circumstances in which it was necessary to wear a mask, and when the risk was low enough not to.  Or is Bangladeshi public/outdoor life as crowded as indoor American life?

Another surprising element is the two pilot studies that informed their ultimate large-scale study.  In the first study, they had free masks and an educational campaign, and boosted mask-wearing rates by 10.9 percentage points.  In the second pilot, they added the presence of workers whose role was to “remind” villagers to wear their mask, and they boosted the rate to a level matched in the final study, 28.4 percentage points.  Honestly, I have trouble making sense of this – isn’t a village in Bangladesh exactly the sort of place where outsiders would be very visibly “outside” and not able to persuade much?  Or were “locals” hired in this role?  This seems to be another “cultural” issue.  As it happens, one of the criticisms of the study is that symptoms were self-reported rather than based on objective testing, so that if the villagers in test villages believed that there was a particular reason to minimize symptoms (to prove they were compliant, to avoid dishonoring the village, to show loyalty to village elders, etc.), this would cause problems with the study, and their surprising degree of responsiveness to individual “persuaders” suggests to me that this is possible.

Another issue is the differentiation between surgical and cloth masks.  The key data element is, again, that control villages had a prevalence of covid of 0.76%, cloth mask villages had a prevalence of 0.74%, and surgical mask villages, 0.67%.  There was therefore no statistically significant effect from cloth masks – which of course should raise concerns for places such as the US where “even a bandana will do” has been the operative approach.  But in any case, there was a higher rate of mask-wearing for surgical mask villages, even though the difference wasn’t statistically significant.  It does nonetheless raise the question of whether the surgical mask was what made the difference, or the greater likelihood of mask-wearing in surgical-mask villages.
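To put those three prevalence figures side by side, here is a minimal sketch that turns them into relative reductions and into absolute differences per 10,000 people.  It uses only the rounded percentages quoted above, so the surgical-mask result lands near, but not exactly on, the 11% figure cited earlier, presumably because the study's own number comes from unrounded or adjusted data.  The function and variable names are illustrative.

```python
# Rounded symptomatic-covid prevalence figures quoted in the text, in percent
control_pct = 0.76
prevalence_pct = {"cloth": 0.74, "surgical": 0.67}

def relative_reduction(control, treated):
    """Percent reduction in prevalence relative to the control villages."""
    return (control - treated) / control * 100

reductions = {
    arm: relative_reduction(control_pct, treated)
    for arm, treated in prevalence_pct.items()
}

for arm, reduction in reductions.items():
    # 1 percentage point of prevalence is 100 cases per 10,000 people
    fewer_per_10k = (control_pct - prevalence_pct[arm]) * 100
    print(f"{arm}: {reduction:.1f}% relative reduction, "
          f"about {fewer_per_10k:.0f} fewer cases per 10,000 people")
```

The relative figures sound large, but the absolute differences are a handful of cases per 10,000 people, which is why the width of the confidence intervals matters so much here.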

Another issue:  age group differences.  For surgical mask villages only, they split out covid rates by age.  For those younger than age 50, there was no difference in covid infections between these villages and the control villages.  For those 50 – 60 years old, there was a decrease of 23%.  For those over age 60, there was a decrease of 35%.  What would account for this difference?  The study does not identify different mask-wearing rates for different ages (presumably they did not attempt to guess the age of the mask-wearers or non-mask-wearers they saw), and, in theory, this shouldn’t matter, as the theory of mask-wearing is that it protects others, so that the entire community should see declines.  However, the study (to prove risk compensation was not happening) showed that there were greater degrees of physical distancing in treatment villages.  Did the project of mask-wearing result in overall greater degrees of caution, especially among older Bangladeshis?

This is a point of contention among critics, as well as the general element of the increased physical distancing.  If physical distancing could be the cause of reduced spread, or if other elements explain the reduction only among the old, then did the intervention “work”?  Or, rather, what does it mean to say the intervention “worked” if it was plausibly the knock-on effects of mask-wearing and what we want to demonstrate is that masking can substitute for undesirable alternate interventions like distancing or lockdowns?

Here are some other criticisms I’m seeing.

First, from an anonymous commenter on Twitter:  the difference between cloth and surgical masks isn’t statistically significant when measured with something called an “intervention prevalence ratio,” which is more or less the ratio of the treatment rate to the control rate provided above.  On this basis, the confidence interval for the two mask types taken together shows a decrease in covid prevalence.  But because the smaller sample sizes for each group individually produce larger standard errors, the confidence intervals for cloth and surgical masks taken separately are wider and overlap, and neither mask type is definitively shown to be effective, with only the surgical mask being significant at the 10% level.  Even with the large number of villages recruited into the study, the overall prevalence rates were low enough so as to not definitively establish the desired conclusions.  Given the uncertainties in the study in general, you’d really like to see some slam-dunk numbers here.

Second, a Substack site, “bad cattitude,” levels a number of criticisms.  Some of them are, I think, too nit-picky, for example, leaning very heavily into complaints that the authors did not definitively establish that the villages were truly sufficiently identical to each other for the randomization to be effective.  In particular, they did not have a starting value for covid-prevalence, just the ending point.  It seems unlikely to me that there would have been such a difference as to have invalidated the study but he says “this is a tiny signal (7 in 10,000) [so] we need a very high precision in start state” and “even miniscule variance in prior exposure would swamp this.”  It would be helpful to have seen some math demonstrating the possible effect of different levels of variance that are within statistical possibility.
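As a rough sketch of the kind of math that would help here, the snippet below estimates how large a gap in baseline prevalence between the two arms could arise purely by chance from random assignment.  Everything except the "7 in 10,000" signal is a hypothetical assumption and not taken from the study:  the number of villages per arm, the candidate values for village-to-village variation, simple (unpaired) randomization, and equal-sized villages.  It also ignores sampling noise within villages, so it is an illustration of the reasoning, not an analysis of the actual data.

```python
import math

# The signal size quoted in the criticism: 7 in 10,000 = 0.07 percentage points
signal_pp = 0.07

# HYPOTHETICAL inputs, chosen only for illustration (not from the study)
villages_per_arm = 300
between_village_sd_pp = [0.25, 0.5, 1.0]  # SD of baseline prevalence across villages

def chance_gap_se(sd_pp, n_per_arm):
    """Standard error of the difference between two arm means when each
    arm is a simple random sample of n_per_arm equally weighted villages."""
    return sd_pp * math.sqrt(2 / n_per_arm)

for sd in between_village_sd_pp:
    se = chance_gap_se(sd, villages_per_arm)
    print(f"village SD {sd:.2f} pp -> chance gap SE {se:.3f} pp, "
          f"signal/SE = {signal_pp / se:.1f}")
```

Under these made-up inputs, whether the signal stands clear of chance imbalance depends heavily on how much villages differed at the start, which is exactly the number neither side of the argument provides.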

This author’s larger criticism is of the self-reported nature of symptoms that I observed earlier.  Now, we already know that there are two elements of Bangladeshi village culture that are “non-WEIRD”:  the fact that mask color has a statistically significant effect on whether villagers choose to wear them, and that mask-reminders have a dramatic impact on use.  (Just try to imagine that happening in small town USA!)  The Substack author also points out that there was a very wide discrepancy between self-reports of mask-wearing (80%) in their own prior survey and actual use.  It seems to me likely that Westerners cannot necessarily predict how Bangladeshi villagers would respond to being given masks, then being called and asked to self-report whether they have any of a set of symptoms, but it also seems to me that there’s a good chance that their response would not be the same as that of Americans, in one direction or another.  That site also quotes Twitter account @Emily_Burns_V, who says, “Is it possible that highly moralistic framing and monetary incentives given to village elders for compliance might dissuade a person from reporting symptoms representing individual and collective moral failure — one that could cost the village money?  Maybe?”  And, indeed, the study’s authors say that the text-nagging and the incentives had no effect on mask-wearing, but do not report whether symptom-reporting differed between these groups.

Finally, Bad Cattitude has an interpretation of the age-differences which seems more plausible than “masking had a greater effect on the old”:  “the odds on bet here is that old people were more inclined to please the researchers than young people and that they failed to report symptoms as a result.”

One last set of comments on the study, from researcher Lyman Stone, again via Twitter.  He defends the study authors against the accusation that they failed to pre-test to establish a baseline, by saying that the study authors themselves acknowledged that this was still underway, and this was, after all, a working paper, not the final product, and reports that it is the norm to provide preliminary reports even when the data analysis is not yet complete.

Stone also observes that the difference in results between cloth and surgical masks is an indicator that there was a sort of unplanned “blindness” to the study, in that both the cloth and surgical mask recipients were aware they were a part of a study, so if the effects we see are a result of their response to this, we’d see the same effects for both cloth and surgical, but we don’t.  (Of course, Stats with Cats observes that the difference between the two groups is not a slam dunk because of confidence intervals, and, as tempting as it may be to do otherwise, it is important to take the confidence intervals seriously.)  (For what it’s worth, Bad Cattitude rebuts the rebuttal in a follow-up piece.)

The bottom line:  when this study first came across my Twitter feed, I enthusiastically retweeted it.  Now I’m disappointed:  I would have really liked to see more answers and be left with fewer questions.  As it stands, the open questions make it “one data point among many” rather than the slam-dunk evidence that some of its promoters think it is, especially since the whole debate has now resulted in mask-promoters asserting that mask-wearing is always and everywhere cost-free while ignoring that for some people it creates real health issues and, for children, poses risks of developmental delay.

Finally, as a reminder for those who don’t know my background, very early in the pandemic I was not merely an enthusiastic mask-wearer, but a die-hard mask-maker, donating some 150 of them to healthcare workers and others, which means that anyone who judges these comments as those of a crazy anti-masker wholly misunderstands them.
