Analysis of regression to the mean in children of exceptional parents

What is the expectation of the distribution of a trait of offspring of parents with a trait value above the mean? We expect the child also to have higher than average, but less so than the parents, due to regression to the mean. I will look at a generic trait, with heritability of 0.6.

If the heritability is 0.6 that means that 60% of the variance is determined by heritable factors, that is, additive genetic variants. The remaining 40% of variance is explained by other factors that are not inherited. These elements are assumed to be uncorrelated and normally distributed.

The trait value is thus determined by:

\[\begin{equation} trait\_value = \sqrt{h^2} * additive + \sqrt{1 - h^2} * \mathcal{N}(\mu = 0, \sigma = 1) \end{equation}\]

The children inherit the additive portion from their two parents, while the other factors are again uncorrelated and randomly distributed with mean 0.

In this document, I use these values and formulas to simulate a population, and use it to investigate regression to the mean effects.

\[\begin{equation} child\_additive = \mathcal{N}(\mu = \frac{\textit{parent1_additive} + \textit{parent2_additive}}{2}, \sigma = \frac{1}{2}) \end{equation}\]

This maintains variance at a steady level with no assortative mating, but gives a slight variance inflation if there is assortative mating.```
library(pacman)
p_load(tidyverse, pander)
source('../../src/extra.R', echo = F, encoding="utf-8")
n <- 10**6
set.seed(10)
h <- 0.7
assortative_mating <- 0.4
gen_parent <- function(n, h, race_factor, prefix){
tibble(
additive = rnorm(n),
shared_family = rnorm(n),
other = rnorm(n),
mate_factor = rnorm(n),
trait = sqrt(h) * additive + sqrt(1 - h) * other + race_factor,
mate_order = mate_factor * sqrt(1 - assortative_mating) + trait * sqrt(assortative_mating)
) %>%
select_all(~glue("{prefix}_{.}"))
}
add_child <- function(df, n, race_factor, prefix, race = "white"){
df %>%
mutate(
additive = rnorm(n, mean = (parent1_additive + parent2_additive) / 2, sd = sqrt(1/2)),
other = rnorm(n),
trait = sqrt(h) * additive + sqrt(1 - h) * other + race_factor,
race = race
) %>% rename_at(vars(-contains("_")), ~glue("{prefix}_{.}"))
}
df <-
bind_cols(
gen_parent(n, h, 0, "parent1"),
gen_parent(n, h, 0, "parent2")
) %>%
add_child(n, 0, "child1") %>% add_child(n, 0, "child2")
dff <-
bind_cols(
gen_parent(n, h, 0, "parent1") %>% arrange(parent1_mate_order),
gen_parent(n, h, 0, "parent2") %>% arrange(parent2_mate_order)
) %>%
add_child(n, 0, "child1") %>% add_child(n, 0, "child2")
```

```
PanderOpts(round = 2)
tribble(~variable, ~value,
"parent trait mean", mean(df$parent1_trait),
"parent trait sd", sd(df$parent1_trait),
"parent additive genetic correlation", cor(df$parent1_additive, df$parent2_additive),
"sibling trait correlation", cor(df$child1_trait, df$child2_trait),
"sibling trait additive genetic correlation", cor(df$child1_additive, df$child2_additive),
"parent child trait correlation", cor(df$parent1_trait, df$child2_trait),
"parent child additive genetic correlation", cor(df$parent1_additive, df$child2_additive),
"child trait mean", mean(df$child1_trait),
"child trait sd", sd(df$child1_trait)
)
```

variable | value |
---|---|

parent trait mean | 0 |

parent trait sd | 1 |

parent additive genetic correlation | 0 |

sibling trait correlation | 0.35 |

sibling trait additive genetic correlation | 0.5 |

parent child trait correlation | 0.35 |

parent child additive genetic correlation | 0.5 |

child trait mean | 0 |

child trait sd | 1 |

Parent 1 is +2 standard deviations (SD) above the mean and parent 2 is at the mean. The midparent value is plotted with a dashed line. The X-axis is measured in standard deviations throughout.

```
close <- function(n, m) {(n >= m -0.2) & n <= m + 0.2}
plot_ridges <- function(df, midval, ylow = -2, yhigh = 3, differ_on = "child1_race"){
plot_ridges_q(df, "child1_trait", differ_on) +
scale_x_continuous(breaks = seq(ylow, yhigh, 1), limits = c(ylow, yhigh)) +
geom_vline(xintercept = midval, linetype = "dashed") +
labs(x = "Trait value", y = "")
}
one_ridge <- function(p){
p + labs(y = "") +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
scale_y_discrete(expand = c(0.2, 0))
}
df %>%
filter(close(parent1_trait, 2), close(parent2_trait, 0)) %>%
plot_ridges(1) %>% one_ridge()
```

Both parents are 2SDs above the mean

```
df %>%
filter(close(parent1_trait, 2), close(parent2_trait, 2)) %>%
plot_ridges(2) %>% one_ridge()
```

Trait values also regress upwards towards the mean, if the parents have low values.

```
df %>%
filter(close(parent1_trait, -2), close(parent2_trait, -1)) %>%
plot_ridges(-1.5, -4, +2) %>% one_ridge()
```

If the trait value of one sibling is high, the expected trait distribution of that childs siblings is significantly lower.

```
df %>%
filter(close(child2_trait, 2)) %>%
plot_ridges(2) %>% one_ridge() + labs(x = "Sibling trait value")
```

Regression to the mean only happens in the next generation, it does not go on forever. For instance, let’s assume that we select a group of people with trait value at least +2SD to settle another planet. We will call these the first generation. The second generation will then have lower average IQ due to regression to the mean. I then also simulate a third generation based on mating within the second generation. And we see that they have the same average trait value as the second generation.

```
dfh <- df %>% filter(parent1_trait > 2, parent2_trait > 2)
children <- bind_rows(
dfh %>% select(additive = child1_additive, trait_value = child1_trait, first_gen = parent1_trait),
dfh %>% select(additive = child2_additive, trait_value = child2_trait, first_gen = parent2_trait)
)
nn <- nrow(children) / 2
parent1 <- children %>%
head(nn) %>%
select(parent1_additive = additive, second_gen = trait_value, first_gen)
parent2 <- children %>%
tail(nn) %>%
select(parent2_additive = additive)
ylow <- -2; yhigh <- 4
bind_cols(parent1, parent2) %>%
add_child(nn, 0, "child") %>%
rename(third_gen = child_trait) %>%
select(first_gen, second_gen, third_gen) %>%
gather(key = "generation", value = "trait_value") %>%
plot_ridges_q("trait_value", "generation") +
scale_x_continuous(breaks = seq(ylow, yhigh, 1), limits = c(ylow, yhigh))
```

If the mean trait value of a population is lower, the regression to the mean will be larger. For example, there is frequently found a ~1SD gap in IQ scores between blacks and whites in US. Thus, if we compare the children of two black parents with high IQ (130) with the children of two white parents with equally high IQ, the distribution of the expected values of the black children will be a little lower.

```
set.seed(1)
f <- 10
whites <- bind_cols(gen_parent(n, h, 0, "parent1"), gen_parent(n, h, 0, "parent2")) %>%
add_child(n, 0, "child1", "white")
blacks <- bind_cols(gen_parent(n*f, h, -1, "parent1"), gen_parent(n*f, h, -1, "parent2")) %>%
add_child(n*f, -1, "child1", "black")
dfc <- bind_rows(whites, blacks)
dfc %>%
filter(close(parent1_trait, 1), close(parent2_trait, 1)) %>%
plot_ridges(2)
```

Height has a higher heritability than the 0.6 used above. This turns out only to have a minor effect on the RTM.

```
bind_rows(
bind_cols(gen_parent(n, 0.8, 0, "parent1"), gen_parent(n, 0.8, 0, "parent2")) %>%
add_child(n, 0, "child1") %>%
mutate(h2 = "0.8"),
df %>% mutate(h2 = "0.6")
) %>%
filter(close(parent1_trait, 2), close(parent2_trait, 1)) %>%
plot_ridges(2, differ_on = "h2")
```

People tend to marry partners that are more similar to themselves. I look here at an assortative mating strength of 0.4. This turns out not to affect the RTM results.

```
bind_rows(
dff %>% mutate(mating = "assortative_mating"),
df %>% mutate(mating = "random_mating")
) %>%
filter(close(parent1_trait, 1), close(parent2_trait, 1)) %>%
plot_ridges(2, differ_on = "mating")
```

Regression to the mean and egression from the mean. Children of parents that have high values tend to move closer to the mean. This is illustrated with red color in the figure below. Of course, variation in the population has to be maintained, and that happens when parents that are close to normal have children that are further from the mean, illustrated in blue in the figure below.

```
df %>%
sample_frac(0.1) %>%
mutate(
midparent = (parent1_trait + parent2_trait) /2,
effect = case_when(
abs(midparent - 1) > abs(child1_trait - 1) + 0.2 ~ "regression to the mean",
abs(midparent - 1) + 0.2 < abs(child1_trait - 1) ~ "egression from the mean",
T ~ "keeps around same value")
) %>%
ggplot(aes(y = child1_trait, x = midparent, color = effect)) +
geom_point(alpha = 0.1) +
ylim(-4, 4) +
xlim(-4, 4) +
guides(colour = guide_legend(override.aes = list(size=4, alpha = 1))) +
scale_color_manual(values=c("steelblue", "#D3D3D3", "brown4"))
```