Red Teaming
The idea of a red team as an antidote to the shortcomings of institutional planning, challenging groupthink and unwarranted assumptions, is widely invoked when outcomes are disappointing. But how this works in practice is so rarely described that the red team proposal may really be about deflecting responsibility for stress testing plans onto someone else. Indeed, many criticisms of planning point to the unforeseen consequences of plans, which are not easy to remedy while hazards are still ephemeral and budgets for potential mitigations are limited. While poor planning has been a criticism of the SARS 2 pandemic response, some of the research response did work well, and for other parts the proposed alternatives are fantastical.
Historically, the red team was the adversarial counterpart to the blue team, a more maverick venture trying to beat the best strategy the institution could deliver. But the UK Ministry of Defence (MoD) has updated its guidance to explain a shift from specific red cell operations to a red team mindset. This is partly because the adversarial approach requires status and trust in order to have the credibility necessary to improve on plans, but it also acknowledges that packaging up testing responsibility for an identified group is unlikely to be reliably successful, especially in distributed, specialised and technical responses. What is recommended instead is a shift in attitude, to consider cognitive and other biases in framing the problem and potential solutions.
For the MoD, a plan should have roughly these stages: information gathering, sense making, decision taking and operational planning. Each of these can be limited by not considering alternatives and by prematurely focusing on specific issues or parts of a system. So red teaming involves considering the biases known to occur at each stage and practical steps to mitigate each possibility: respectively expert peer review, key assumption testing, counterfactual scenarios, and simulations of systems including interdependencies in responses. This extends to using data to calibrate planning against a reference class because, as experience with infrastructure projects shows, most plans are less unique than they appear and can be anchored in broadly similar past events.
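A minimal sketch of that kind of reference class calibration, with invented numbers: an optimistic inside-view estimate of how long a response takes is shrunk towards the distribution of broadly similar past events, with the weight reflecting how comparable the reference class is judged to be.

    # Reference class calibration sketch; the reference durations, the inside-view
    # estimate and the weighting are illustrative assumptions, not real data.
    from statistics import mean

    reference_durations_weeks = [10, 14, 18, 22, 30]  # broadly similar past events
    inside_view_weeks = 8                             # the planner's own estimate

    outside_weight = 0.6  # how much to trust the outside view over the inside view
    calibrated = (outside_weight * mean(reference_durations_weeks)
                  + (1 - outside_weight) * inside_view_weeks)
    print(f"calibrated estimate: {calibrated:.1f} weeks")  # ~14.5 weeks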
Risk Planning
The UK government takes a particular approach to risk, assessing both the likelihood of a hazard materialising and its potential impact. Investment in plans and their corresponding mitigations then depends on the scale of the combination of the two, but more recent thinking about red teams takes a different approach. If the scale of the impact is understood, then the planned response needs to be proportionate to that impact, not to the likelihood of its occurring. There is the obvious problem of aleatory reasoning, that what matters is not how likely an event is so much as whether it happens, but also a more general point: in a class of unlikely but independent events, it can be quite likely that at least one of them occurs.
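To make that last point concrete, a minimal sketch with assumed figures: ten independent hazards, each with a 1-in-50 annual probability, already give close to a one-in-five chance that at least one materialises in a given year.

    # Probability of at least one of several independent, individually unlikely
    # hazards occurring in a year; the count and probabilities are illustrative.
    from math import prod

    annual_probabilities = [0.02] * 10  # ten hazards, each ~1-in-50 per year

    p_none = prod(1 - p for p in annual_probabilities)
    print(f"P(at least one) = {1 - p_none:.2f}")  # ~0.18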
The other challenge is that frailties are exposed not just to one hazard but to many, so an unlikely impact can be common to several different hazards. Supply chains are an obvious example: they can be disrupted by extreme weather, communications failures, and other disruptions to transport infrastructure. So an approach that considers systems, and known resilience across many sectors, is indicated, as is a socio-technical perspective, because impact depends on how the public responds to a crisis, in demand for services and other uses of infrastructure. And of course duration matters: a crisis lasting more than a few days is uncommon, but has dramatically wider impacts than a scenario in which most of the population can shelter in place and offer limited support to neighbours and family members.
The process of scenario analysis is widely misconstrued as offering a deliberate selection of predictions, whereas a portfolio of projections is designed to span the range of potential outcomes. These then show what scale of mitigations needs to be planned, and while propagation to other systems is relevant, conceptualising the plausible scenarios is very difficult, not least because they have not occurred. So assumptions about the mechanisms by which impacts emerge need to be calibrated, and systems stress tested for the scale, speed and duration of impacts as they propagate through the whole of society. In that way the responses of officials delivering essential services, of emergency responders, and of the public experiencing the impacts on daily life can all be anticipated and planned for.
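As a very rough illustration of stress testing that propagation, the toy model below pushes an initial shock through an invented dependency graph of sectors over a few time steps; the sector names, dependency weights and shock size are assumptions for illustration, not calibrated values.

    # Toy propagation of a shock through interdependent sectors.
    sectors = ["power", "transport", "supply_chains", "health"]

    # dependency[a][b] = fraction of b's disruption passed on to a (invented weights)
    dependency = {
        "power":         {},
        "transport":     {"power": 0.5},
        "supply_chains": {"power": 0.3, "transport": 0.6},
        "health":        {"power": 0.4, "supply_chains": 0.5},
    }

    disruption = {s: 0.0 for s in sectors}
    disruption["power"] = 0.8  # initial shock: a severe power outage

    for step in range(1, 4):  # propagate for a few time steps
        nxt = {}
        for s in sectors:
            passed_on = sum(w * disruption[d] for d, w in dependency[s].items())
            nxt[s] = min(1.0, max(disruption[s], passed_on))
        disruption = nxt
        print(step, {s: round(v, 2) for s, v in disruption.items()})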
Legitimacy of a Red Team
External criticism can be demoralising and provoke defensive responses, and a culture of secrecy which supports dismissing objections as lacking understanding of information unavailable to outsiders. But anyone engaging with the academic peer review process, for grant applications as well as journal publications, will recognise the concern that criticism may not be well informed, never mind constructive in a way that improves the research. So a red team's composition and mandate need to be developed for the socio-technical expertise described above, along with a trusted relationship with the main responsible group. However, there is a further factor making this harder in institutional planning, which is the need for public legitimacy to speak on behalf of the citizen.
Breadth of socio-technical expertise is already necessary to cover the critical sense making stage, so legitimacy ought to depend on making it apparent that this is well covered. Similarly, information gathering can be judged more directly by whether it comprises academic expertise across the full range of relevant areas, including how these are applied by professionals in services. And of course all of these levels of service delivery experience should be represented in operational simulation exercises, using responsible professionals across the areas of impact. But there is a more difficult point in setting out how decisions are taken, and more particularly the framework used: what is considered, and how factors are valued when compromise is necessary to resolve conflicting objectives.
Explicit compensation schemes are established for industrial accidents, covering various levels of disability caused in a misadventure. The legitimacy of such a value framework is established by quasi-judicial oversight in suitably independent tribunals, but it is a simple balance of incapacity against a payment. Averages can be used to determine things like life expectancy and quality adjusted life years to make the calculations, but agreed and standardised data are required. Qualitatively different impacts, accruing to different types of actors, including unrealised future potential, are considerably more difficult. And ultimately people value their cultural, familial, social, economic and educational prospects differently, so rather than deriving a formula, an oversight forum with enough breadth of interests is more democratic.
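At the simple end of that spectrum, a quality adjusted life year figure is just a discounted sum of quality-weighted years; the quality weight, horizon and discount rate below are assumptions for illustration rather than values from any actual scheme.

    # Illustrative QALY calculation: discounted sum of quality-weighted life years.
    def qalys(quality_weight: float, years: int, discount_rate: float = 0.035) -> float:
        return sum(quality_weight / (1 + discount_rate) ** t for t in range(years))

    # e.g. 20 remaining years in an impaired state versus full health (assumed weights)
    print(round(qalys(0.7, 20), 1))
    print(round(qalys(1.0, 20), 1))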
Pandemic Practicalities
The UK government set out a simple framework early in the SARS 2 pandemic, about reducing deaths and protecting the vulnerable, as a reference for the developing response. Operationalising and evaluating this basic sort of framework is difficult in real time, and even in 2023 the results of reviews focus on intermediate stages of transmission, not the population impact. These data can be useful for benchmarking and calibration in future planning, with the caveat that they apply best to one virus in a particular socio-technical context. And in fact much of the evidence for population impacts is inconclusive, in areas such as border controls, or of low quality, as in the case of policy around face masks.
So communication in these situations is very important, as the value framework is crude and the intelligence to calibrate the immediate response is too coarse to be very agile. Indeed, everyone knows that trust is important, and this is well understood to be a problem of corporate trustworthiness, with a specific focus on the reliability of institutions under pressure. Good evidence is available that confident expressions of uncertainty will be trusted, coupled with a commitment to actions to resolve the limitations (which can represent a hostage to fortune). More than that, giving reasons for trusting institutions works better in the medium term than authoritarian edicts built on fear and coercion. Where the population needs to follow instructions for an extended period, early attempts at control served only to undermine trust.
Individual disclosures were the main mode of criticism of pandemic response policy, from planning through to operations, and although these could be well informed they were rarely effective. Many people said that the impact of policy should be evaluated, to learn in the moment, to share international experience and to provide a basis for future plans, but none of this was achieved. Some criticism was poorly framed, interpreted as political, or based on out of date information, but largely the people busy working on responses did not have the time to engage with it. Closing the loop on at least some of these matters would have focused and engaged critics in some areas and at some points, in a way akin to a red team vision. But there was no forum for that engagement, which has left a rather confused role being advocated for a red team in future responses, covering everything that was missed out this time.
Crimson Tidings
A pandemic is a particular kind of crisis, one which may involve the entire human population and run for several years, necessitating a dynamic response and correspondingly dynamic criticism. A red team can be involved in the development of plans and the exercises to test them, but its strategic role live in the field, as an unfamiliar situation evolves, is not very clear. However, that is exactly what is described for a role that ought to cover more of social science, economics, ethics and behaviour than the technically focused science advisory body advising on evidence for specific interventions. Convening such a range of specialists who understand each other, and the conceptual structure of the technical work they are asked to review, is very challenging and has not been tried, although the roadmap out of lockdown had something of that character.
To sustain its legitimacy, it would have to publish reports (which would help with follow up), introducing a further burden of communication, although perhaps also adding focus on doing this better. A red team would have given more legitimacy to the challenge of managing a first wave without being clear about the expected pattern of future waves, shaped by immunity and seasonality. It would have been possible to publish detailed evidence reviews about choices in economics, which plays out at a different speed and scale to health impacts. Fundamentally, the role of buying time through interventions to reduce transmission could have been evaluated against the value of the relevant investments, in a situation where long term control through vaccination was clouded by myriad uncertainties.
Of course the impact on vulnerable groups, as their status changed over time, ought to be identified explicitly and discussed directly with their representatives. And the ethics of necessary choices should be conceptualised more broadly, in a way that makes them clear to the general public and other stakeholders. To an extent this was the direction of travel during the SARS 2 pandemic, starting with technical reports and preprints released haphazardly and moving to regular forecasts published weekly. But all of this developed without much clarity about the overriding decision approach, and it seems likely that any red team would have been more critical of government than of science. That is not to say science was unimpeachable, but criticism was made, and the challenge, which the standing of a red team ought to have brought into focus, was whether flaws in the evidence were remedied expeditiously.
Evaluation of responses to catastrophic events sounds like something which cannot be planned for, but of course clinical trials were run precisely because they were planned (in the UK more than elsewhere). So although each crisis will be different, as with major projects the claim of uniqueness is a canard, and evaluation frameworks ought to be developed for other kinds of interventions too. This development was called for during the pandemic and did not happen, because it is much more involved than simply saying that limited supplies can be distributed in a way that facilitates research. The much vaunted RECOVERY trial of acute treatments in intensive care was based on decades of methodological and computational research into the Bayesian adaptive platform, and its governance and other protocols saw substantial investment.
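As a minimal sketch of the adaptive allocation idea behind such platforms, not of the RECOVERY protocol itself, the following uses Beta-Binomial posteriors and Thompson sampling to weight randomisation towards arms that look better as outcomes accrue; the arm names and response rates are invented.

    # Generic Bayesian adaptive allocation sketch (illustrative, not RECOVERY).
    import random

    # Each arm holds Beta posterior parameters [alpha, beta], starting uniform.
    arms = {"standard_care": [1, 1], "candidate_drug": [1, 1]}

    def choose_arm() -> str:
        # Thompson sampling: draw from each posterior and pick the best draw.
        draws = {name: random.betavariate(a, b) for name, (a, b) in arms.items()}
        return max(draws, key=draws.get)

    def record_outcome(arm: str, success: bool) -> None:
        arms[arm][0 if success else 1] += 1  # update the chosen arm's posterior

    true_rates = {"standard_care": 0.25, "candidate_drug": 0.35}  # invented
    for _ in range(500):
        arm = choose_arm()
        record_outcome(arm, random.random() < true_rates[arm])

    print({name: tuple(params) for name, params in arms.items()})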
