The version of the data that we will use in this class can be found here.

Context

There has recently been a lot of media coverage about a “crisis in science”: results of scientific studies that can’t be reproduced, and studies that make headlines only to be retracted later for a variety of reasons. FiveThirtyEight is a website that focuses on analytic issues in politics, economics, and sports. It was founded by Nate Silver, a statistician who came to fame first through his work in baseball analytics and later as a political analyst and blogger. In response to this so-called crisis, FiveThirtyEight published a rebuttal arguing that science isn’t broken, but is actually just very hard to get right. Part one of the three-part series was about the problem of p-hacking, which occurs when researchers, knowingly or not, play around with the variables included and the form of the data until they find a significant association that supports their beliefs. The post includes an interactive tool in which the reader selects a political party of interest and then makes a series of choices about which variables to consider, and in what form, in the search for a significant association between political party and economic success.

The data posted here underlies this interactive graphic, and was obtained from the post’s authors.

Data description

There are six datasets in CSV format.

The file “cpi” contains 822 observations of 2 variables related to the consumer price index, which is used as a measure of inflation:

  • DATE: date of the observation
  • VALUE: the value of the consumer price index on the associated date

The file “GDP” contains 273 observations of 2 variables related to the gross domestic product (GDP), a measure of economic production:

  • DATE: date of the observation
  • VALUE: the GDP on the associated date

The file “pols-month” contains 822 observations of 9 variables related to the number of national politicians who are Democratic or Republican at any given time:

  • mon: date of the count
  • prez_gop: indicator of whether the president was Republican on the associated date (1 = yes, 0 = no)
  • gov_gop: the number of Republican governors on the associated date
  • sen_gop: the number of Republican senators on the associated date
  • rep_gop: the number of Republican representatives on the associated date
  • prez_dem: indicator of whether the president was Democratic on the associated date (1 = yes, 0 = no)
  • gov_dem: the number of Democratic governors on the associated date
  • sen_dem: the number of Democratic senators on the associated date
  • rep_dem: the number of Democratic representatives on the associated date
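
Because prez_gop and prez_dem carry the same information in two columns, it can be convenient to collapse them into a single label. The following is a minimal sketch in Python, assuming each row has been read into a dictionary keyed by the column names above (for example with csv.DictReader). The function name president_party and the sample rows are hypothetical, and the values are made up for illustration. Rows whose indicators do not match the documented 0/1 coding are returned as None so they can be inspected by hand.

```python
def president_party(row):
    """Collapse the prez_gop / prez_dem indicators into one label.

    Returns "gop" or "dem" when exactly one indicator equals 1,
    and None for any other combination so that it can be reviewed.
    """
    gop = int(row["prez_gop"])
    dem = int(row["prez_dem"])
    if gop == 1 and dem == 0:
        return "gop"
    if dem == 1 and gop == 0:
        return "dem"
    return None


# Made-up rows in the documented layout (not real values).
sample_rows = [
    {"mon": "row-a", "prez_gop": "1", "prez_dem": "0"},
    {"mon": "row-b", "prez_gop": "0", "prez_dem": "1"},
    {"mon": "row-c", "prez_gop": "0", "prez_dem": "0"},
]

labels = [president_party(row) for row in sample_rows]
print(labels)
```

Returning None rather than guessing keeps any unexpected coding visible instead of silently assigning a party.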

The file “recessions” contains 11 observations of 2 variables, representing the dates of 11 individual recessions. Each row of the dataset has a date for the start of a recession and a date for the end of the recession:

  • start: start date of a recession
  • end: end date of a recession
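
Since each row is a start/end pair, a common use of this file is to flag whether a given date falls inside any recession. The sketch below shows one way to do that in Python, assuming the start and end columns have already been parsed into datetime.date objects and that both endpoints count as part of the recession (an assumption, not something the file states). The function name in_recession and the two intervals are hypothetical and do not correspond to real recessions.

```python
from datetime import date


def in_recession(day, recessions):
    """Return True if `day` lies within any (start, end) interval, inclusive."""
    return any(start <= day <= end for start, end in recessions)


# Hypothetical intervals for illustration only.
example_recessions = [
    (date(2001, 1, 1), date(2001, 6, 30)),
    (date(2005, 3, 1), date(2005, 9, 30)),
]

print(in_recession(date(2001, 3, 15), example_recessions))
print(in_recession(date(2003, 1, 1), example_recessions))
```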

The file “snp” contains 787 observations of 2 variables related to Standard & Poor’s stock market index (S&P), often used as a representative measure of the stock market as a whole:

  • date: the date of the observation
  • close: the closing values of the S&P stock index on the associated date

The file “unemployment” contains 68 observations of 13 variables:

  • Year: the year of the measurements on that row
  • Jan: percentage of unemployment in January of the associated year
  • Feb: percentage of unemployment in February of the associated year
  • Mar: percentage of unemployment in March of the associated year
  • Apr: percentage of unemployment in April of the associated year
  • May: percentage of unemployment in May of the associated year
  • Jun: percentage of unemployment in June of the associated year
  • Jul: percentage of unemployment in July of the associated year
  • Aug: percentage of unemployment in August of the associated year
  • Sep: percentage of unemployment in September of the associated year
  • Oct: percentage of unemployment in October of the associated year
  • Nov: percentage of unemployment in November of the associated year
  • Dec: percentage of unemployment in December of the associated year
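
Unlike the other files, “unemployment” stores one row per year with a column per month, so it usually needs to be reshaped to one row per month before it can be lined up with the monthly data. The following is a minimal sketch of that reshaping using only the Python standard library. The function name unemployment_to_long, the output keys, and the sample values are assumptions made for illustration. Empty cells are treated as missing.

```python
import csv
import io

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def unemployment_to_long(rows):
    """Turn one-row-per-year records into one-row-per-month records."""
    long_rows = []
    for row in rows:
        year = int(row["Year"])
        for month_number, month_name in enumerate(MONTHS, start=1):
            raw = row.get(month_name)
            # Treat an empty or absent cell as missing rather than as 0.
            value = float(raw) if raw not in ("", None) else None
            long_rows.append(
                {"year": year, "month": month_number, "unemployment": value}
            )
    return long_rows


# Made-up sample in the documented layout (not real values).
SAMPLE = """Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
1900,1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1
1901,2.2,2.3,,,,,,,,,,
"""

long_rows = unemployment_to_long(csv.DictReader(io.StringIO(SAMPLE)))
print(len(long_rows))
print(long_rows[0])
```

To use this on the real file, pass csv.DictReader an open handle to the downloaded CSV instead of the in-memory sample.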