Simulations investigating herd immunity with heterogeneous infectiousness

- There is no reason to think that the requirement for herd immunity is defined by \((R_0 - 1) / R_0\).
- If you consider that some people tend to meet more people, they will both be infected earlier and reduce the spread of the disease by a larger amount once they become immune.
- Thus the minimum threshold possible for herd immunity could plausibly be significantly lower than the claimed *70-80%*.
- The path to immunity matters. If infection spreads fast, it is possible to overshoot the herd immunity threshold.
- Thus, the fact that some towns have *60%+* infected does not mean that herd immunity cannot be reached at much lower levels in places with lower momentum.

How does herd immunity work exactly? The basic theory is that if \(R_0 = 3\), then the herd immunity threshold is *66%*. This means that if immunity in a population is above this threshold, the disease will not spread if reintroduced, while if immunity is below this threshold, it will. The argument behind this is that if \(R_0 = 3\), then every infectious person infects *3* other people. But if *2* or more of these are immune, then each infected person will infect at most one other.
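The arithmetic behind this can be written out directly (a quick sketch; `hit` is just a name used here for illustration):

```
# The classical threshold: the immune fraction must be large enough that
# each infectious person finds, on average, fewer than one susceptible target.
hit <- function(R0) (R0 - 1) / R0
hit(3)
```

With \(R_0 = 3\) this gives \(2/3 \approx 66\%\), the number used in the standard argument.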

However, I think this argument is based on an inaccurate model, and it is possible that the real herd immunity threshold is significantly lower. I will look at this with some simulations.

I do this in the simplest way with the fewest assumptions:

- A population of 1 million

- SIR model: Each individual is either susceptible, infectious, or immune.
- Each person has an infectivity factor *iif*
- In each iteration, each infected person spreads the virus to *iif* other random people. Following this, they become immune
- The infection targets who are not immune become infected for the next iteration
- The risk of an individual being infected is proportional to that person's *iif*

I plot a number of random runs to see the variation.

[Note: These simulations are clearly simplistic. However, I think simple models can show some simple points.]

```
library(pacman)
p_load(tidyverse, magrittr, glue, zeallot, ggeasy, pander)
panderOptions('round', 2)
set.seed(0)
```

First, let us look at the standard assumption: that everyone has the same *iif*.

```
n <- 10**6
df <- tibble(id = 1:n, r = 3, infected = F, immune = F)

stat_table <- function(df){
  tribble(~variable, ~value,
          "n", df %>% nrow(),
          "mean", mean(df$r),
          "median", median(df$r),
          "min", min(df$r),
          "max", max(df$r),
          "variance", var(df$r),
          "infections by top 20% most infectious",
          df %>% arrange(-r) %>% head(0.2 * n) %$% sum(r) / sum(df$r)
  )
}
stat_table(df)
```

variable | value
---|---
n | 1e+06
mean | 3
median | 3
min | 3
max | 3
variance | 0
infections by top 20% most infectious | 0.2

Here is the development in a population with *70%* immunity, in which new infections are introduced:

```
plot_it <- function(t){
  t %>% rowid_to_column("i") %>%
    pivot_longer(-i) %>%
    mutate(value = value / n) %>%
    ggplot(aes(x = i, y = value, fill = name)) +
    geom_line(color = "turquoise4") +
    theme_light() +
    labs(y = "", x = "") +
    scale_color_brewer() +
    scale_y_continuous(labels = scales::percent, limits = c(0, 1))
}

run_one_iteration <- function(df){
  # The sum of infectiousness of the infected
  n_newinf <- round(df %>% filter(infected) %$% sum(r))
  # These new infections hit people with a chance determined by their infectiousness
  newinf <- sample(df$id, n_newinf, replace = T, prob = df$r)
  # After one iteration, the people hit by an infection become infected if they are not immune,
  # and the previously infected people become immune
  df %>% mutate(
    immune = immune | infected,
    infected = id %in% newinf & !immune
  )
}

run_n_iterations <- function(n_iter, df, l){
  for (i in 1:n_iter){
    df <- run_one_iteration(df)
    l <- c(l, df %>% filter(infected | immune) %>% nrow())
  }
  list(df, l)
}

one_run_orig <- function(df, n_iter, n_newinf = 500){
  new_inf <- sample(df %>% filter(!immune) %>% pull(id), n_newinf)
  df %<>% mutate(infected = id %in% new_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(n_iter, df, l)
  l
}

runit <- function(df){
  n_iter <- 30
  glue("r{1:2}") %>% map_dfc(~tibble(!!.x := one_run_orig(df, n_iter))) %>%
    plot_it()
}

df %>% mutate(immune = id <= 0.7 * n) %>% runit()
```

We can see that the disease does not start spreading again, and we have herd immunity, as predicted by the theory.

Now let's see what happens with *50%* immunity when infection is reintroduced:

```
df %>% mutate(immune = id < 0.5 * n) %>% runit()
```

Here the immunity level is not high enough: the reintroduced infection spreads, and continues until the immunity threshold is reached. In fact, it overshoots. The theoretical immunity threshold is *66%*, but more than *75%* end up infected. This is due to infection momentum: if infection is widespread at one point in time, the number of currently infected people can carry the total past the threshold.

We can see this even more clearly if we look at the spread in a non-immune population:

```
df %>% runit()
```

Here we reach infection levels of close to *100%*! Much higher than the immunity threshold. This is an important mechanism to remember when people say we should aim for herd immunity. The path matters.

In the model above, we assume that every person spreads the virus to *3* new people. But this assumption is clearly not correct. Some people meet a lot of other people, shake hands, go to crowded parties or use public transport, etc., while other people are at the other end of the spectrum.

Crucially, if we consider this, the herd immunity threshold will change. This is important since the public debate assumes that *70%* or more is required for herd immunity. But these simulations will show that there is good reason to think that this may not be the case.

It is of course impossible to know what the real distribution of infectiousness looks like. I will use a gamma distribution with two conditions:

- It follows the Pareto rule: the *20%* most infectious are responsible for *80%* of infections
- There is a lower bound on *iif* at \(1/10\) of the mean, so that we don't have people who are practically never at risk of being infected

```
mean_r <- 3
cv <- 2.39
r <- qgamma(seq(0.5 / n, 1 - 0.5 / n, by = 1 / n),
            shape = 1 / cv^2,
            rate = (1 / (mean_r * 0.9)) / cv^2) + mean_r / 10
df <- tibble(id = 1:n, r = r, infected = F, immune = F, r_orig = r)
stat_table(df)
```

variable | value
---|---
n | 1e+06
mean | 3
median | 0.49
min | 0.3
max | 167.1
variance | 41.64
infections by top 20% most infectious | 0.8

```
ggplot(df, aes(x = r)) +
  geom_histogram(binwidth = 1, fill = "turquoise4", color = "black") +
  theme_classic() +
  labs(x = "Number of people infected by person x")
```

This looks reasonable to me. Most people infect few others, but there is a tail of highly infectious people. The mean infectiousness is still 3, as before. In the following simulations, people's infectiousness varies, together with their risk of being infected.
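A quick sanity check that the construction above keeps the mean at 3 (using the same `mean_r` and `cv` as in the code): the gamma part has mean shape/rate, and the lower-bound shift adds the rest.

```
mean_r <- 3
cv <- 2.39
# Mean of the gamma part: (1/cv^2) / ((1/(mean_r * 0.9))/cv^2) = mean_r * 0.9 = 2.7
gamma_mean <- (1 / cv^2) / ((1 / (mean_r * 0.9)) / cv^2)
# Adding the lower-bound shift of mean_r / 10 = 0.3 restores a mean of 3
gamma_mean + mean_r / 10
```

This is why the scaling factor of 0.9 appears in the `rate` argument: the gamma part is shrunk so that the floor can be added back without inflating the mean.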

If this *iif* distribution is assumed, the disease will spread in a susceptible population like this:

```
one_run_base <- function(df, params){
  start_inf <- sample(df %>% filter(!immune) %>% pull(id), params$n_startinf)
  df %<>% mutate(infected = id %in% start_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(params$end, df, l)
  l
}

params <- tibble(end = 30, n_startinf = 20)
a <- glue("r{1:10}") %>% map_dfc(~tibble(!!.x := one_run_base(df, params)))
a %>% plot_it()
```

Here we get to the herd immunity threshold at about *55%*, which is lower than the predicted *2/3*.

But note that this is with momentum. If the momentum is slowed, herd immunity could be achieved at even lower percentages, as shown in the plot below. The shaded area is a period where the R rate is reduced to below 1, and the dashed line is the point where *500* new infections are introduced into the population.

```
one_run <- function(df, params){
  start_inf <- sample(df %>% filter(!immune) %>% pull(id), params$n_startinf)
  df %<>% mutate(infected = id %in% start_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  # Free spread until lockdown begins
  c(df, l) %<-% run_n_iterations(params$start_lockdown, df, l)
  # Lockdown: everyone's infectiousness is cut to a fifth
  df %<>% mutate(r = r_orig / 5)
  c(df, l) %<-% run_n_iterations(params$end_lockdown - params$start_lockdown, df, l)
  # Lockdown lifted: infectiousness restored
  df %<>% mutate(r = r_orig)
  c(df, l) %<-% run_n_iterations(params$reinfection - params$end_lockdown, df, l)
  # 500 new infections are introduced into the population
  new_inf <- sample(df %>% filter(!immune) %>% pull(id), 500)
  df %<>% mutate(infected = id %in% new_inf)
  c(df, l) %<-% run_n_iterations(params$end - params$reinfection, df, l)
  l
}

params <- tibble(start_lockdown = 4, end_lockdown = 8, reinfection = 20, end = 30, n_startinf = 20)
b <- glue("r{1:10}") %>% map_dfc(~tibble(!!.x := one_run(df, params)))
b %>% plot_it() +
  annotate("rect", fill = "turquoise4", alpha = 0.1,
           xmin = params$start_lockdown, xmax = params$end_lockdown,
           ymin = -Inf, ymax = Inf) +
  geom_vline(xintercept = params$reinfection, linetype = "dashed", color = "lightcoral")
```

It thus seems that there is no good reason to expect a herd immunity threshold at the level defined by \((R_0 - 1) / R_0\), when a slightly more realistic model can have a much lower threshold.

Intuitively, the reason that a model with more realistic heterogeneity has a lower herd immunity threshold is that the largest spreaders also tend, on average, to be infected earlier. Once a large portion of the people with the highest *iif* are immune, the remaining population spreads the disease much less effectively, so herd immunity is reached with fewer people immune.
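This can be sketched with a toy calculation (my own illustrative numbers, not part of the simulations above): compare the mean *iif* among the remaining susceptibles when immunity hits the biggest spreader first versus at random.

```
# A toy population of 10 people, one of them a superspreader
iif <- c(9, rep(1, 9))
mean(iif)        # overall mean iif: 1.8

# If immunity accrues at random, the susceptible pool keeps a mean iif of ~1.8.
# If the superspreader becomes immune first (as infection selects for high iif),
# the susceptible pool is left far less infectious:
mean(iif[-1])    # remaining susceptibles all have iif = 1
</iif>
```

In the simulations this selection happens automatically, because the chance of being hit by an infection is proportional to *iif*.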

Many say that the only point of lockdowns is to lower the pressure on hospitals. But these models indicate the possibility that, if lockdowns are used right, they could also significantly lower the total number of infected.

```
a %>% rowid_to_column("i") %>%
  pivot_longer(-i) %>%
  mutate(momentum = "full") %>%
  bind_rows(
    b %>% rowid_to_column("i") %>%
      pivot_longer(-i) %>%
      mutate(momentum = "lowered")
  ) %>%
  mutate(value = value / n) %>%
  ggplot(aes(x = i, y = value, fill = name, color = momentum)) +
  geom_line() +
  theme_light() +
  labs(y = "", x = "") +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1))
```

Some people argue that the threshold cannot be less than *60%*, since some Italian cities have at least that level. However, we can see here that it depends on the momentum. The same model with the same parameters can end up at *55%* vs. *30%*, depending on whether the momentum is lowered.

The timing of lockdowns (or other initiatives that reduce the mean R) is crucial. If a lockdown ends too early, it is possible to overshoot the minimum required herd immunity threshold:

```
one_run <- function(df, params){
  start_inf <- sample(df %>% filter(!immune) %>% pull(id), params$n_startinf)
  df %<>% mutate(infected = id %in% start_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig / 5)
  c(df, l) %<-% run_n_iterations(params$end_lockdown - params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig)
  c(df, l) %<-% run_n_iterations(params$end - params$end_lockdown, df, l)
  l
}

params <- tibble(start_lockdown = 2, end_lockdown = 6, end = 30, n_startinf = 20)
a <- glue("r{1:10}") %>% map_dfc(~tibble(!!.x := one_run(df, params)))
a %>% plot_it() +
  annotate("rect", fill = "turquoise4", alpha = 0.1,
           xmin = params$start_lockdown, xmax = params$end_lockdown,
           ymin = -Inf, ymax = Inf)
```

And the same is the case if the lockdown starts too late:

```
one_run <- function(df, params){
  start_inf <- sample(df %>% filter(!immune) %>% pull(id), params$n_startinf)
  df %<>% mutate(infected = id %in% start_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig / 5)
  c(df, l) %<-% run_n_iterations(params$end_lockdown - params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig)
  c(df, l) %<-% run_n_iterations(params$end - params$end_lockdown, df, l)
  l
}

params <- tibble(start_lockdown = 5, end_lockdown = 10, end = 30, n_startinf = 25)
a <- glue("r{1:10}") %>% map_dfc(~tibble(!!.x := one_run(df, params)))
a %>% plot_it() +
  annotate("rect", fill = "turquoise4", alpha = 0.1,
           xmin = params$start_lockdown, xmax = params$end_lockdown,
           ymin = -Inf, ymax = Inf)
```

In the Swine flu pandemic, it was estimated that \(R_0 = 1.5\) and *24%* were eventually infected. How does this fit with the model here?

This seems to fit the claim that the herd immunity threshold is around \((R_0 - 1) / R_0\): with \(R_0 = 1.5\) that gives \((1.5 - 1) / 1.5 = 33\%\), which is reasonably close to 24%. (And there may have been some lingering immunity left from the 1968 pandemic, which would further lower the herd immunity threshold.)

But if we assume a model where everyone has the same *iif*, momentum would actually carry us far past 24%.

```
n <- 10**6
df_std <- tibble(id = 1:n, r = 1.5, infected = F, immune = F)
```

```
df_std %>% runit()
```

In contrast, the model with heterogeneous *iif* fits slightly better:

```
mean_r <- 1.5
cv <- 2.39
r <- qgamma(seq(0.5 / n, 1 - 0.5 / n, by = 1 / n),
            shape = 1 / cv^2,
            rate = (1 / (mean_r * 0.9)) / cv^2) + mean_r / 10
df_het <- tibble(id = 1:n, r = r, infected = F, immune = F, r_orig = r)
stat_table(df_het)
stat_table(df_het)
```

variable | value
---|---
n | 1e+06
mean | 1.5
median | 0.25
min | 0.15
max | 83.52
variance | 10.41
infections by top 20% most infectious | 0.8

```
df_het %>% runit()
```

Thus, while it is true that the level ended up close to that indicated by \((R_0 - 1) / R_0\), this is not because the estimate is correct. Rather, \((R_0 - 1) / R_0\) gives too high an estimate of the necessary herd immunity threshold, while ignoring momentum gives too low an estimate, and the two errors roughly cancel out.