Regression to the mean

Analysis of regression to the mean in children of exceptional parents

Jonatan Pallesen
10-10-2018

Table of Contents


Introduction

What is the expectation of the distribution of a trait of offspring of parents with a trait value above the mean? We expect the child also to have higher than average, but less so than the parents, due to regression to the mean. I will look at a generic trait, with heritability of 0.6.

If the heritability is 0.6 that means that 60% of the variance is determined by heritable factors, that is, additive genetic variants. The remaining 40% of variance is explained by other factors that are not inherited. These elements are assumed to be uncorrelated and normally distributed.

The trait value is thus determined by:

\[\begin{equation} trait\_value = \sqrt{h^2} * additive + \sqrt{1 - h^2} * \mathcal{N}(\mu = 0, \sigma = 1) \end{equation}\]

The children inherit the additive portion from their two parents, while the other factors are again uncorrelated and randomly distributed with mean 0.

In this document, I use these values and formulas to simulate a population, and use it to investigate regression to the mean effects.

Details The additive genetic values of offspring can’t simply be half of the parents values, since this would deplete variation. According to this paper, “Bossert (unpublished) pioneered a phenotypic version of this model, in which inheritance is represented by a fixed ‘‘segregation kernel’’ that approximates the distribution of full-sib phenotypes as a Gaussian with mean equal to the midparent value and fixed variance, independent of the midparent.” I use this same idea and sample the children’s additive values from a Gaussian distribution with variance = 1/2:

\[\begin{equation} child\_additive = \mathcal{N}(\mu = \frac{\textit{parent1_additive} + \textit{parent2_additive}}{2}, \sigma = \frac{1}{2}) \end{equation}\]

This maintains variance at a steady level with no assortative mating, but gives a slight variance inflation if there is assortative mating.

Simulations


library(pacman)

p_load(tidyverse, pander)

source('../../src/extra.R', echo = F, encoding="utf-8")

n <- 10**6

set.seed(10)



h <- 0.7

assortative_mating <- 0.4



gen_parent <- function(n, h, race_factor, prefix){
  tibble(
    additive = rnorm(n),
    shared_family = rnorm(n),
    other = rnorm(n),
    mate_factor = rnorm(n),
    trait = sqrt(h) * additive + sqrt(1 - h) * other + race_factor,
    mate_order = mate_factor * sqrt(1 - assortative_mating) + trait * sqrt(assortative_mating)
  ) %>% 
    select_all(~glue("{prefix}_{.}"))
}

add_child <- function(df, n, race_factor, prefix, race = "white"){
  df %>% 
    mutate(
      additive = rnorm(n, mean = (parent1_additive + parent2_additive) / 2, sd = sqrt(1/2)),
      other = rnorm(n),
      trait = sqrt(h) * additive + sqrt(1 - h) * other + race_factor,
      race = race
    ) %>% rename_at(vars(-contains("_")), ~glue("{prefix}_{.}"))
}

df <- 
  bind_cols(
    gen_parent(n, h, 0, "parent1"), 
    gen_parent(n, h, 0, "parent2")
  ) %>% 
  add_child(n, 0, "child1") %>% add_child(n, 0, "child2")

dff <- 
  bind_cols(
    gen_parent(n, h, 0, "parent1") %>% arrange(parent1_mate_order), 
    gen_parent(n, h, 0, "parent2") %>% arrange(parent2_mate_order)
  ) %>% 
  add_child(n, 0, "child1") %>% add_child(n, 0, "child2")

show simulated values


PanderOpts(round = 2)

tribble(~variable, ~value,
  "parent trait mean", mean(df$parent1_trait),
  "parent trait sd", sd(df$parent1_trait),
  "parent additive genetic correlation", cor(df$parent1_additive, df$parent2_additive),
  "sibling trait correlation", cor(df$child1_trait, df$child2_trait),
  "sibling trait additive genetic correlation", cor(df$child1_additive, df$child2_additive),
  "parent child trait correlation", cor(df$parent1_trait, df$child2_trait),
  "parent child additive genetic correlation", cor(df$parent1_additive, df$child2_additive),
  "child trait mean", mean(df$child1_trait),
  "child trait sd", sd(df$child1_trait)
)
variable value
parent trait mean 0
parent trait sd 1
parent additive genetic correlation 0
sibling trait correlation 0.35
sibling trait additive genetic correlation 0.5
parent child trait correlation 0.35
parent child additive genetic correlation 0.5
child trait mean 0
child trait sd 1

Illustrations

Regression to the mean

Parent 1 is +2 standard deviations (SD) above the mean and parent 2 is at the mean. The midparent value is plotted with a dashed line. The X-axis is measured in standard deviations throughout.


close <- function(n, m) {(n >= m -0.2) & n <= m + 0.2}

plot_ridges <- function(df, midval, ylow = -2, yhigh = 3, differ_on = "child1_race"){
  plot_ridges_q(df, "child1_trait", differ_on) +
  scale_x_continuous(breaks = seq(ylow, yhigh, 1), limits = c(ylow, yhigh)) +
  geom_vline(xintercept = midval, linetype = "dashed") +
  labs(x = "Trait value", y = "")
}

one_ridge <- function(p){
  p + labs(y = "") + 
    theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank()) +
    scale_y_discrete(expand = c(0.2, 0))
}

df %>%
  filter(close(parent1_trait, 2), close(parent2_trait, 0)) %>% 
  plot_ridges(1) %>% one_ridge()


Slightly larger RTM

Both parents are 2SDs above the mean


df %>%
  filter(close(parent1_trait, 2), close(parent2_trait, 2)) %>% 
  plot_ridges(2) %>% one_ridge()


Upwards RTM

Trait values also regress upwards towards the mean, if the parents have low values.


df %>%
  filter(close(parent1_trait, -2), close(parent2_trait, -1)) %>% 
  plot_ridges(-1.5, -4, +2) %>% one_ridge()


Sibling RTM

If the trait value of one sibling is high, the expected trait distribution of that childs siblings is significantly lower.


df %>%
  filter(close(child2_trait, 2)) %>% 
  plot_ridges(2) %>% one_ridge() + labs(x = "Sibling trait value")


RTM only happens in one generation

Regression to the mean only happens in the next generation, it does not go on forever. For instance, let’s assume that we select a group of people with trait value at least +2SD to settle another planet. We will call these the first generation. The second generation will then have lower average IQ due to regression to the mean. I then also simulate a third generation based on mating within the second generation. And we see that they have the same average trait value as the second generation.


dfh <- df %>% filter(parent1_trait > 2, parent2_trait > 2)

children <- bind_rows(
  dfh %>% select(additive = child1_additive, trait_value = child1_trait, first_gen = parent1_trait),
  dfh %>% select(additive = child2_additive, trait_value = child2_trait, first_gen = parent2_trait)
)

nn <- nrow(children) / 2

parent1 <- children %>% 
  head(nn) %>% 
  select(parent1_additive = additive, second_gen = trait_value, first_gen)

parent2 <- children %>% 
  tail(nn) %>% 
  select(parent2_additive = additive)

ylow <- -2; yhigh <- 4

bind_cols(parent1, parent2) %>% 
  add_child(nn, 0, "child") %>% 
  rename(third_gen = child_trait) %>% 
  select(first_gen, second_gen, third_gen) %>% 
  gather(key = "generation", value = "trait_value") %>% 
  plot_ridges_q("trait_value", "generation") +
  scale_x_continuous(breaks = seq(ylow, yhigh, 1), limits = c(ylow, yhigh))


RTM in populations with different means

If the mean trait value of a population is lower, the regression to the mean will be larger. For example, there is frequently found a ~1SD gap in IQ scores between blacks and whites in US. Thus, if we compare the children of two black parents with high IQ (130) with the children of two white parents with equally high IQ, the distribution of the expected values of the black children will be a little lower.


set.seed(1)

f <- 10

whites <- bind_cols(gen_parent(n, h, 0, "parent1"), gen_parent(n, h, 0, "parent2")) %>% 
  add_child(n, 0, "child1", "white")

blacks <- bind_cols(gen_parent(n*f, h, -1, "parent1"), gen_parent(n*f, h, -1, "parent2")) %>% 
  add_child(n*f, -1, "child1", "black")

dfc <- bind_rows(whites, blacks)

dfc %>%
  filter(close(parent1_trait, 1), close(parent2_trait, 1)) %>% 
  plot_ridges(2)


RTM with different heritabilities

Height has a higher heritability than the 0.6 used above. This turns out only to have a minor effect on the RTM.


bind_rows(
  bind_cols(gen_parent(n, 0.8, 0, "parent1"), gen_parent(n, 0.8, 0, "parent2")) %>% 
  add_child(n, 0, "child1") %>% 
  mutate(h2 = "0.8"),
  df %>% mutate(h2 = "0.6")
  ) %>% 
  filter(close(parent1_trait, 2), close(parent2_trait, 1)) %>% 
  plot_ridges(2, differ_on = "h2")


RTM with assortative mating

People tend to marry partners that are more similar to themselves. I look here at an assortative mating strength of 0.4. This turns out not to affect the RTM results.


bind_rows(
  dff %>% mutate(mating = "assortative_mating"),
  df %>% mutate(mating = "random_mating")
)  %>% 
  filter(close(parent1_trait, 1), close(parent2_trait, 1)) %>% 
  plot_ridges(2, differ_on = "mating")

Egression from the mean

Regression to the mean and egression from the mean. Children of parents that have high values tend to move closer to the mean. This is illustrated with red color in the figure below. Of course, variation in the population has to be maintained, and that happens when parents that are close to normal have children that are further from the mean, illustrated in blue in the figure below.


df %>% 
  sample_frac(0.1) %>% 
  mutate(
    midparent = (parent1_trait + parent2_trait) /2,
    effect = case_when(
      abs(midparent - 1) > abs(child1_trait - 1) + 0.2 ~ "regression to the mean", 
      abs(midparent - 1) + 0.2 < abs(child1_trait - 1) ~ "egression from the mean",
      T ~ "keeps around same value")
         ) %>% 
  ggplot(aes(y = child1_trait, x = midparent, color = effect)) +
  geom_point(alpha = 0.1) +
  ylim(-4, 4) +
  xlim(-4, 4) +
  guides(colour = guide_legend(override.aes = list(size=4, alpha = 1))) +
  scale_color_manual(values=c("steelblue", "#D3D3D3", "brown4"))