13 Reasons Why

Did the Netflix show 13 Reasons Why cause a large jump in suicides? The evidence is shaky.

Jonatan Pallesen
05-05-2019

Introduction

A recent study finds that the number of suicides increased following the release of the Netflix show 13 Reasons Why. This has gotten a lot of press, but such questions can be tricky statistically. The numbers are easy to download from CDC Wonder, so I did that to take a look myself.

Analysis

read data


library(pacman)

p_load(tidyverse, janitor, readxl, rap, magrittr, lubridate, glue, scales, jcolors)

source('../../src/extra.R', echo = F, encoding="utf-8")



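# Read a tab-separated CDC Wonder export; keep one row per year, month and gender.
read_suicides <- function(path){
  read_tsv(path) %>% 
    clean_names() %>% 
    # drop rows that carry a note (e.g. totals and the export footer)
    filter(is.na(notes)) %>%
    # month_code has the form "2017/04"
    separate(month_code, into = c("year", "month"), sep = "/") %>%
    select(year, month, suicides = deaths, gender)
}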

# Scale each gender's series so that January 2016 equals 1.
normalize <- function(df){
  df %>%
    group_by(gender) %>%
    mutate(suicides = suicides / suicides[year == "2016" & month == "01"]) %>%
    ungroup()
}

teens <- read_suicides("data/suicides_10_17.txt")

teens_norm <- teens %>% normalize()

twenties <- read_suicides("data/suicides_18_29.txt")

plot


month_names <- 
  c("01" = "Jan", "02" = "Feb", "03" = "Mar", "04" = "Apr",
    "05" = "May", "06" = "Jun", "07" = "Jul", "08" = "Aug",
    "09" = "Sep", "10" = "Oct", "11" = "Nov", "12" = "Dec",
    "Male" = "Male", "Female" = "Female")

plotit <- function(df, title){
  df %>% filter(year %in% c("2013", "2014", "2015", "2016", "2017")) %>% 
    ggplot(aes(x = year, y=suicides, fill=year)) + 
    geom_bar(stat="identity") +
    facet_grid(gender ~ month, labeller = as_labeller(month_names)) +
    theme(axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          text = element_text(size = 19)) + 
    labs(x = "month", title = title) + 
    scale_fill_jcolors(palette="pal7")  
}

plotit(teens, "Age 10-17")

plot


plotit(twenties, "Age 18-29")


Suicide rates have been going up since the mid-2000s for all groups:

code


plot_suicides <- function(df){
  df %>% 
    ggplot(aes(x = year, y = suicides, color = gender)) +
    geom_point(alpha = 0.4) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

teens %>% plot_suicides() + labs(title = "Age 10-17")

code


twenties %>% plot_suicides() + labs(title = "Age 18-29")

Discussion

On the pro side of the hypothesis, April 2017 has the highest rate of suicides of any month.
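A quick way to check a claim like this is to total the counts over genders and sort the months. The sketch below assumes a data frame with the columns that read_suicides() returns (year, month, suicides, gender). top_months is a hypothetical helper name, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: sum over genders and return the n months
# with the highest totals, largest first.
top_months <- function(df, n = 5) {
  df %>%
    group_by(year, month) %>%
    summarise(suicides = sum(suicides), .groups = "drop") %>%
    arrange(desc(suicides)) %>%
    head(n)
}
```

With the data loaded as above, top_months(teens) would list the five highest months for ages 10-17.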

On the con side we have the following arguments:

The paper fits a more sophisticated model, but in the end whether you believe the result boils down to whether you think the April 2017 value is large enough to overcome the above problems.
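To get a rough feel for how unusual a single month is, one can fit a linear trend plus month-of-year effects to all earlier months and see whether the observed value falls inside the 95% prediction interval. This is a deliberately simple stand-in, not the model used in the paper. check_month is a hypothetical helper, and it assumes the column layout produced by read_suicides().

```r
# Hypothetical helper, NOT the paper's model: linear trend + month effects,
# fitted only on months before the target month.
check_month <- function(df, target_year = 2017, target_month = 4) {
  # total over genders for each year and month
  monthly <- aggregate(suicides ~ year + month, data = df, FUN = sum)
  monthly$year <- as.integer(as.character(monthly$year))
  month_num <- as.integer(as.character(monthly$month))
  monthly$t <- monthly$year * 12 + month_num          # running month index
  monthly$month_f <- factor(month_num, levels = 1:12) # seasonal effect

  target_t <- target_year * 12 + target_month
  train <- monthly[monthly$t < target_t, ]
  target <- monthly[monthly$t == target_t, ]
  stopifnot(nrow(target) == 1)

  fit <- lm(suicides ~ t + month_f, data = train)
  pred <- predict(fit, newdata = target, interval = "prediction", level = 0.95)

  data.frame(observed = target$suicides,
             expected = pred[, "fit"],
             lower = pred[, "lwr"],
             upper = pred[, "upr"],
             row.names = NULL)
}
```

Calling check_month(teens) would return the observed April 2017 total next to the expected value and its interval. An observed value above the upper bound would count in favor of the hypothesis, though only as strongly as one trusts this simple model.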

My personal take is that, in Bayesian terms, these numbers marginally increase the probability I would put on the show having caused an increase in suicides, but not by much.
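To make the size of such an update concrete, Bayes' rule in odds form says that posterior odds equal prior odds times the likelihood ratio. The sketch below uses purely illustrative numbers: the prior and the likelihood ratio are assumptions chosen for the example, not estimates from the data, and update_probability is a hypothetical helper.

```r
# Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
update_probability <- function(prior, likelihood_ratio) {
  prior_odds <- prior / (1 - prior)
  posterior_odds <- prior_odds * likelihood_ratio
  posterior_odds / (1 + posterior_odds)
}

# Illustrative inputs only: a 10% prior and evidence that is 1.5 times
# as likely if the show had an effect as if it did not.
update_probability(prior = 0.10, likelihood_ratio = 1.5)
```

With these made-up inputs the probability moves from 10% to about 14%, which is the kind of modest shift described above.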