Background
Disclaimer
The following coursebook is produced by the team at Algoritma for its Data Science Academy internal training. The coursebook is intended for a restricted audience only, i.e. the individuals and organizations having received this coursebook directly from the training organization. It may not be reproduced, distributed, translated or adapted in any form outside these individuals and organizations without permission.
Libraries and Setup
You will need to install the packages used for the APC analysis if they are not already installed on your machine.
library(broom)
library(tidyverse)
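If either package is missing, a check along the following lines can tell you what to install. This is only a minimal sketch: the helper name `missing_packages` is hypothetical, and the install call is left commented out because it needs an internet connection.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("broom", "tidyverse"))
to_install

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```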
Training Objectives
The primary objective of this course is to provide a comprehensive introduction to the science of APC analysis. The syllabus covers:
- APC Analysis
- Workflow
- APC in R
APC Analysis
Age-period-cohort (APC) analysis is widely used to capture how an outcome varies over time, mostly in epidemiology and demography. At its core it is an identification problem. In particular, APC analysis distinguishes three types of time-varying phenomena [1]:
- Age effects: variations linked to the biological and social processes of aging that are specific to individuals. They include physiological changes and the accumulation of social experiences linked to aging, but are unrelated to the time period or birth cohort to which an individual belongs.
- Period effects: variations that result from external factors affecting all age groups equally at a particular calendar time. They can arise from a range of environmental, social and economic factors, e.g. war, famine or economic crisis. Methodological changes in outcome definitions, classifications, or methods of data collection can also lead to period effects in the data.
- Cohort effects: variations resulting from the unique experience or exposure of a group of subjects (a cohort) as they move across time. In epidemiology, a cohort effect is conceptualized as an interaction or effect modification due to a period effect that is differentially experienced through age-specific exposure or susceptibility to that event or cause.
In contrast to this epidemiological view of the cohort effect as an effect modification, the sociological literature considers the cohort effect a structural factor representing the sum of all unique exposures experienced by the cohort from birth. In this case, age and period effects are conceived as confounders of the cohort effect, and APC analysis aims to disentangle the independent effects of age, period and cohort. Most APC analysis strategies are based on the sociological model, which conceptualizes age, period and cohort as independent effects.
Identification problem in APC: APC analysis aims to describe and estimate the independent effects of age, period and cohort on the outcome under study. The different strategies aim to partition variance into the unique components attributable to age, period, and cohort effects. However, there is a major impediment to estimating these effects independently, known as the “identification problem” in APC. It is due to the exact linear dependency among age, period, and cohort: \(\text{Period} - \text{Age} = \text{Cohort}\); that is, given the calendar year and age, one can determine the cohort (birth year). Perfectly collinear predictors (age, period and cohort) in a regression model produce a singular, non-identifiable design matrix, from which it is statistically impossible to obtain unique estimates of the three effects.
Age, period and cohort effects always occur together; we separate them to see which one matters most in driving societal change. This can be difficult because the three are collinear.
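The collinearity can be seen on a small made-up data set. The sketch below uses base R only; the toy ages, periods and random outcome are assumptions for illustration and are not taken from the suicides data.

```r
# Toy data: three ages observed in three periods (values are made up).
toy <- expand.grid(age = c(20, 30, 40), period = c(1990, 2000, 2010))
toy$cohort <- toy$period - toy$age   # the exact linear dependency

# Design matrix with an intercept and the three time variables.
X <- model.matrix(~ age + period + cohort, data = toy)
ncol(X)      # number of columns in the design matrix
qr(X)$rank   # number of linearly independent columns (one fewer)

# lm() copes by dropping the redundant term and reporting NA for it.
set.seed(1)
toy$y <- rnorm(nrow(toy))
fit <- lm(y ~ age + period + cohort, data = toy)
coef(fit)
```

Because `cohort` carries no information beyond `age` and `period`, its coefficient comes back as `NA`; reordering the terms in the formula changes which of the three is dropped.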
Conventional solutions to APC
Conventional solutions recast APC analysis as a regression problem in order to estimate parameters (trends and deviations). The parameters are then combined to produce functions that describe the relationships between the observed outcome (here, the number of suicides) and age, calendar period, and birth cohort.
A popular approach to resolving the identification problem is constraint-based regression analysis, the Constrained Coefficients GLIM estimator (CGLIM). In this strategy, additional constraints are placed on the categories of at least one predictor so that the age, period and cohort effects can be estimated simultaneously. Thus, by assuming that some age groups, cohorts or time periods have identical effects on the dependent variable, it becomes possible to estimate the independent effects of age, period and cohort [2].
However, the results of this analysis depend on the constraints chosen by the investigator based on external information. The validity of the chosen constraints rests on theoretical preconceptions about which categories have identical parameters; this is often subjective, and there is no empirical way to confirm it [3].
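The equality-constraint idea can be illustrated on toy data. The sketch below is not the CGLIM estimator itself; it only assumes, for illustration, that the two oldest cohorts share one effect, and imposes that by merging their factor levels. All variable names and values are made up.

```r
# Toy grid: 3 age groups x 3 periods, each cell observed 4 times.
toy <- expand.grid(age = 1:3, period = 1:3)
toy$cohort <- toy$period - toy$age + 3          # cohort index 1..5
toy <- toy[rep(seq_len(nrow(toy)), each = 4), ]
set.seed(1)
toy$y <- rnorm(nrow(toy))

toy$age_f    <- factor(toy$age)
toy$period_f <- factor(toy$period)
toy$cohort_f <- factor(toy$cohort)

# Unconstrained model: one coefficient cannot be estimated.
full <- lm(y ~ age_f + period_f + cohort_f, data = toy)
sum(is.na(coef(full)))

# Constraint (an assumption): cohorts 1 and 2 have the same effect.
toy$cohort_c <- toy$cohort_f
levels(toy$cohort_c)[levels(toy$cohort_c) %in% c("1", "2")] <- "1-2"

constrained <- lm(y ~ age_f + period_f + cohort_c, data = toy)
sum(is.na(coef(constrained)))
```

With the merged level every remaining coefficient is estimable, but the estimates are only as credible as the assumption that those two cohorts really behave alike.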
APC with lm
I’ve prepared the following data, originally made available on Kaggle. We’ll read the data into our workspace:
# load data
suicides <- read_csv("data_input/suicides.csv")
# quick check
glimpse(suicides)
#> Observations: 27,820
#> Variables: 6
#> $ suicides <dbl> 21, 16, 14, 1, 9, 1, 6, 4, 1, 0, 0, 0, 2, 17, 1, 14, 4, 8,...
#> $ age <dbl> 20, 44, 20, 75, 30, 75, 44, 30, 64, 10, 64, 10, 75, 20, 75...
#> $ period <dbl> 1987, 1987, 1987, 1987, 1987, 1987, 1987, 1987, 1987, 1987...
#> $ cohort <dbl> 1967, 1943, 1967, 1912, 1957, 1912, 1943, 1957, 1923, 1977...
#> $ sex <chr> "male", "male", "female", "male", "male", "female", "femal...
#> $ country <chr> "Albania", "Albania", "Albania", "Albania", "Albania", "Al...
The data have 27,820 observations and 6 variables. Here is a short description of each variable:
- suicides: number of suicides
- age: self explanatory
- period: the year in which the data was collected
- cohort: birth year of each age group during that period
- sex: gender indicator
- country: country indicator
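As a quick sanity check, the relation between the three time variables can be verified on the first few rows printed by `glimpse()` above. The sketch uses base R and a hand-typed copy of those rows; on the full data the same comparison could be run on the `suicides` columns directly.

```r
# First six rows of age, period and cohort, copied from the glimpse() output.
first_rows <- data.frame(
  age    = c(20, 44, 20, 75, 30, 75),
  period = rep(1987, 6),
  cohort = c(1967, 1943, 1967, 1912, 1957, 1912)
)

# cohort should equal period minus age in every row.
cohort_matches <- first_rows$period - first_rows$age == first_rows$cohort
all(cohort_matches)
```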
To start, we need to adjust the APC variables:
# readjust apc variable
suicides <- suicides %>%
  mutate(age_squared = age ^ 2) %>%
  # convert period and cohort to factors (funs() is deprecated in recent dplyr)
  mutate_at(vars(period, cohort), as.factor) %>%
  select(suicides, age, age_squared, period, cohort, everything())
suicides
#> # A tibble: 27,820 x 7
#> suicides age age_squared period cohort sex country
#> <dbl> <dbl> <dbl> <fct> <fct> <chr> <chr>
#> 1 21 20 400 1987 1967 male Albania
#> 2 16 44 1936 1987 1943 male Albania
#> 3 14 20 400 1987 1967 female Albania
#> 4 1 75 5625 1987 1912 male Albania
#> 5 9 30 900 1987 1957 male Albania
#> 6 1 75 5625 1987 1912 female Albania
#> 7 6 44 1936 1987 1943 female Albania
#> 8 4 30 900 1987 1957 female Albania
#> 9 1 64 4096 1987 1923 male Albania
#> 10 0 10 100 1987 1977 female Albania
#> # ... with 27,810 more rows
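Converting `period` and `cohort` to factors means a regression will estimate one coefficient per level rather than a single slope. A small base R sketch, using a made-up vector of years, shows how a factor expands into dummy columns and why the first level does not get a column of its own:

```r
# A made-up period variable with three distinct years.
period <- as.factor(c(1985, 1986, 1987, 1986))
levels(period)

# One dummy column per level except the first, which acts as the reference.
dummies <- model.matrix(~ period)
colnames(dummies)
dummies
```

The column names follow the pattern variable name plus level, which is the same pattern used for the coefficient names of a fitted model.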
Now our data contains the APC variables. Next, we apply linear regression to extract the APC coefficients, using the equation:
$$y_{ij} = \alpha_{ij} + \beta_{1}Age + \beta_{2}Age^2 + e_{ij}$$
where \(i\) indexes the period and \(j\) the cohort, so that the intercept \(\alpha_{ij}\) absorbs the period and cohort effects, while \(\beta_{1}\) and \(\beta_{2}\) are the coefficients of age and age squared.
apc_lm <- lm(suicides ~ age + age_squared + period + cohort, suicides)
summary(apc_lm)
#>
#> Call:
#> lm(formula = suicides ~ age + age_squared + period + cohort,
#> data = suicides)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -579.9 -278.9 -145.3 0.8 21762.0
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -339.22638 86.84338 -3.906 9.4e-05 ***
#> age 30.40751 2.32352 13.087 < 2e-16 ***
#> age_squared -0.32299 0.02171 -14.878 < 2e-16 ***
#> period1986 -15.26465 76.16068 -0.200 0.8411
#> period1987 -38.53047 78.43160 -0.491 0.6232
#> period1988 -6.43966 77.80014 -0.083 0.9340
#> period1989 66.94468 73.26358 0.914 0.3609
#> period1990 60.13425 74.21306 0.810 0.4178
#> period1991 67.48220 71.88165 0.939 0.3478
#> period1992 117.89746 75.17127 1.568 0.1168
#> period1993 147.71811 76.42130 1.933 0.0533 .
#> period1994 138.08179 73.26564 1.885 0.0595 .
#> period1995 66.26804 58.33994 1.136 0.2560
#> period1996 31.27579 65.47891 0.478 0.6329
#> period1997 4.76593 74.90887 0.064 0.9493
#> period1998 19.28488 73.47197 0.262 0.7930
#> period1999 61.33906 63.39151 0.968 0.3332
#> period2000 44.19804 69.85120 0.633 0.5269
#> period2001 36.71037 69.35817 0.529 0.5966
#> period2002 89.40211 72.32410 1.236 0.2164
#> period2003 114.10428 73.45058 1.553 0.1203
#> period2004 103.85482 71.23562 1.458 0.1449
#> period2005 53.98893 57.77302 0.935 0.3501
#> period2006 12.47963 70.09375 0.178 0.8587
#> period2007 -11.43421 76.46974 -0.150 0.8811
#> period2008 0.25884 77.20994 0.003 0.9973
#> period2009 6.49784 67.83942 0.096 0.9237
#> period2010 -0.99290 72.96721 -0.014 0.9891
#> period2011 -1.10585 74.49438 -0.015 0.9882
#> period2012 37.47225 76.92286 0.487 0.6262
#> period2013 60.65137 77.25632 0.785 0.4324
#> period2014 64.04916 76.78528 0.834 0.4042
#> period2015 81.00665 61.20244 1.324 0.1857
#> period2016 -176.96659 95.17267 -1.859 0.0630 .
#> cohort1911 22.61882 148.70351 0.152 0.8791
#> cohort1912 44.08371 146.19982 0.302 0.7630
#> cohort1913 26.93923 147.81696 0.182 0.8554
#> cohort1914 -24.12898 143.26606 -0.168 0.8663
#> cohort1915 -19.81134 138.07605 -0.143 0.8859
#> cohort1916 -28.98740 136.20152 -0.213 0.8315
#> cohort1917 -82.84906 136.98890 -0.605 0.5453
#> cohort1918 -114.37741 137.04714 -0.835 0.4040
#> cohort1919 -111.87468 133.66927 -0.837 0.4026
#> cohort1920 -56.80810 122.40780 -0.464 0.6426
#> cohort1921 -7.46818 105.86601 -0.071 0.9438
#> cohort1922 19.71152 117.59522 0.168 0.8669
#> cohort1923 18.66899 116.22700 0.161 0.8724
#> cohort1924 -22.24166 111.82124 -0.199 0.8423
#> cohort1925 -13.71874 112.07531 -0.122 0.9026
#> cohort1926 -13.01545 110.40762 -0.118 0.9062
#> cohort1927 -36.82096 109.98751 -0.335 0.7378
#> cohort1928 -62.24102 110.61311 -0.563 0.5736
#> cohort1929 -59.50693 109.51221 -0.543 0.5869
#> cohort1930 -27.45295 103.07426 -0.266 0.7900
#> cohort1931 12.35088 101.39101 0.122 0.9030
#> cohort1932 47.60027 105.34566 0.452 0.6514
#> cohort1933 51.11593 107.56988 0.475 0.6347
#> cohort1934 41.37847 102.76022 0.403 0.6872
#> cohort1935 21.78125 101.07994 0.215 0.8294
#> cohort1936 21.87097 102.64412 0.213 0.8313
#> cohort1937 2.80057 102.45819 0.027 0.9782
#> cohort1938 -28.18272 103.14675 -0.273 0.7847
#> cohort1939 -36.82257 102.73980 -0.358 0.7200
#> cohort1940 -35.64866 97.49467 -0.366 0.7146
#> cohort1941 10.99553 90.49327 0.122 0.9033
#> cohort1942 35.36708 105.19654 0.336 0.7367
#> cohort1943 51.70791 107.13860 0.483 0.6294
#> cohort1944 45.48532 107.37955 0.424 0.6719
#> cohort1945 56.59088 101.43700 0.558 0.5769
#> cohort1946 64.81355 102.18740 0.634 0.5259
#> cohort1947 74.93599 101.21464 0.740 0.4591
#> cohort1948 60.15492 103.81245 0.579 0.5623
#> cohort1949 53.42971 103.83011 0.515 0.6068
#> cohort1950 64.52937 101.63502 0.635 0.5255
#> cohort1951 105.37591 90.35037 1.166 0.2435
#> cohort1952 162.65568 109.16427 1.490 0.1362
#> cohort1953 193.34415 119.11312 1.623 0.1046
#> cohort1954 187.23643 117.66962 1.591 0.1116
#> cohort1955 67.65765 93.95205 0.720 0.4715
#> cohort1956 82.22594 101.94472 0.807 0.4199
#> cohort1957 73.24009 100.42288 0.729 0.4658
#> cohort1958 49.52042 102.82867 0.482 0.6301
#> cohort1959 28.79550 101.93464 0.282 0.7776
#> cohort1960 12.82476 98.64709 0.130 0.8966
#> cohort1961 25.46981 92.78207 0.275 0.7837
#> cohort1962 23.88102 99.43981 0.240 0.8102
#> cohort1963 21.96044 101.38432 0.217 0.8285
#> cohort1964 22.59980 99.85004 0.226 0.8209
#> cohort1965 35.39070 84.69657 0.418 0.6761
#> cohort1966 52.19842 93.33271 0.559 0.5760
#> cohort1967 54.75169 94.88410 0.577 0.5639
#> cohort1968 29.30687 95.96987 0.305 0.7601
#> cohort1969 -16.64207 93.55918 -0.178 0.8588
#> cohort1970 -13.15350 93.01375 -0.141 0.8875
#> cohort1971 -9.63131 85.70810 -0.112 0.9105
#> cohort1972 -94.85714 99.17764 -0.956 0.3389
#> cohort1973 -129.92048 102.45307 -1.268 0.2048
#> cohort1974 -122.17291 98.75499 -1.237 0.2160
#> cohort1975 -33.97887 84.09864 -0.404 0.6862
#> cohort1976 0.05253 95.53073 0.001 0.9996
#> cohort1977 26.80063 98.50027 0.272 0.7856
#> cohort1978 8.19420 98.95543 0.083 0.9340
#> cohort1979 -28.14974 94.99078 -0.296 0.7670
#> cohort1980 -18.22349 95.74013 -0.190 0.8490
#> cohort1981 -19.22080 93.68167 -0.205 0.8374
#> cohort1982 -61.15087 98.26652 -0.622 0.5338
#> cohort1983 -89.94060 99.31063 -0.906 0.3651
#> cohort1984 -84.16653 96.09542 -0.876 0.3811
#> cohort1985 -35.15202 83.97337 -0.419 0.6755
#> cohort1986 23.02419 101.33499 0.227 0.8203
#> cohort1987 52.56698 106.34666 0.494 0.6211
#> cohort1988 42.80289 106.57370 0.402 0.6880
#> cohort1989 16.05302 102.94645 0.156 0.8761
#> cohort1990 26.81317 104.40628 0.257 0.7973
#> cohort1991 33.33100 102.69174 0.325 0.7455
#> cohort1992 -12.38976 107.04939 -0.116 0.9079
#> cohort1993 -42.05786 108.45276 -0.388 0.6982
#> cohort1994 -37.15501 105.55691 -0.352 0.7248
#> cohort1995 -5.11853 93.50591 -0.055 0.9563
#> cohort1996 66.76549 116.98898 0.571 0.5682
#> cohort1997 88.59953 124.53294 0.711 0.4768
#> cohort1998 76.83842 125.30561 0.613 0.5397
#> cohort1999 71.26135 121.94267 0.584 0.5590
#> cohort2000 78.44310 123.67906 0.634 0.5259
#> cohort2001 79.48047 123.21747 0.645 0.5189
#> cohort2002 40.79277 127.30436 0.320 0.7486
#> cohort2003 18.08633 128.90684 0.140 0.8884
#> cohort2004 15.46515 127.77105 0.121 0.9037
#> cohort2005 NA NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 889.6 on 27692 degrees of freedom
#> Multiple R-squared: 0.03193, Adjusted R-squared: 0.02749
#> F-statistic: 7.193 on 127 and 27692 DF, p-value: < 2.2e-16
Use the broom package to construct a data frame that summarizes the model’s statistical findings:
apc_tidy <- tidy(apc_lm)
apc_tidy
#> # A tibble: 128 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -339. 86.8 -3.91 9.40e- 5
#> 2 age 30.4 2.32 13.1 5.11e-39
#> 3 age_squared -0.323 0.0217 -14.9 7.18e-50
#> 4 period1986 -15.3 76.2 -0.200 8.41e- 1
#> 5 period1987 -38.5 78.4 -0.491 6.23e- 1
#> 6 period1988 -6.44 77.8 -0.0828 9.34e- 1
#> 7 period1989 66.9 73.3 0.914 3.61e- 1
#> 8 period1990 60.1 74.2 0.810 4.18e- 1
#> 9 period1991 67.5 71.9 0.939 3.48e- 1
#> 10 period1992 118. 75.2 1.57 1.17e- 1
#> # ... with 118 more rows
From the tidied coefficients, we need to reshape our results, starting with period and cohort:
pc_tidy <- apc_tidy %>%
  mutate(
    key = str_replace_all(term, "[:digit:]", ""),
    value = str_replace_all(term, "[:alpha:]", "") %>% as.numeric()
  ) %>% # separate the digits and the letters of each term
  select(key, value, estimate) %>%
  filter(key == "period" | key == "cohort") %>%
  spread(key, value)
pc_tidy
#> # A tibble: 125 x 3
#> estimate cohort period
#> <dbl> <dbl> <dbl>
#> 1 -177. NA 2016
#> 2 -130. 1973 NA
#> 3 -122. 1974 NA
#> 4 -114. 1918 NA
#> 5 -112. 1919 NA
#> 6 -94.9 1972 NA
#> 7 -89.9 1983 NA
#> 8 -84.2 1984 NA
#> 9 -82.8 1917 NA
#> 10 -62.2 1928 NA
#> # ... with 115 more rows
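The splitting of each term into a name part and a year part can be tried on a handful of term names. The sketch below is a base R equivalent of the two `str_replace_all()` calls, not the code used above; the `terms` vector is a hand-picked subset of the coefficient names.

```r
# A few coefficient names as they appear in the tidied model output.
terms <- c("(Intercept)", "age", "age_squared", "period1986", "cohort1911")

# Remove the digits to get the variable name ...
key <- gsub("[[:digit:]]", "", terms)
# ... and remove everything except digits to get the year (NA when none).
value <- as.numeric(gsub("[^[:digit:]]", "", terms))

parsed <- data.frame(term = terms, key = key, value = value)

# Keep only the period and cohort terms, as in the pipeline above.
pc_only <- parsed[parsed$key %in% c("period", "cohort"), ]
pc_only
```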
Then we can extract and tidy each coefficient using the code in the chunk below:
age_coef <- apc_tidy %>%
  filter(term == "age") %>%
  pull(estimate)
age_squared_coef <- apc_tidy %>%
  filter(term == "age_squared") %>%
  pull(estimate)
period_coef <- pc_tidy %>%
  select(x = period, y = estimate) %>%
  drop_na()
cohort_coef <- pc_tidy %>%
  select(x = cohort, y = estimate) %>%
  drop_na()
Now we can calculate the age effect, then combine all the coefficients together:
apc_all <- suicides %>%
  mutate(age_effect = (age * age_coef) + (age_squared * age_squared_coef)) %>%
  select(x = age, y = age_effect) %>%
  distinct() %>%
  mutate(group = "Age Effect") %>%
  bind_rows(mutate(period_coef, group = "Period Effect")) %>%
  bind_rows(mutate(cohort_coef, group = "Cohort Effect")) %>%
  select(group, x, y) %>%
  arrange(x)
apc_all
#> # A tibble: 131 x 3
#> group x y
#> <chr> <dbl> <dbl>
#> 1 Age Effect 10 272.
#> 2 Age Effect 20 479.
#> 3 Age Effect 30 622.
#> 4 Age Effect 44 713.
#> 5 Age Effect 64 623.
#> 6 Age Effect 75 464.
#> 7 Cohort Effect 1911 22.6
#> 8 Cohort Effect 1912 44.1
#> 9 Cohort Effect 1913 26.9
#> 10 Cohort Effect 1914 -24.1
#> # ... with 121 more rows
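The age effect rows can be reproduced by hand from the two age coefficients reported in the model summary. The sketch below hard-codes those two estimates; the turning-point calculation is an addition for illustration and follows from the quadratic form of the age effect.

```r
# Estimates copied from the summary of the full model above.
age_coef         <- 30.40751
age_squared_coef <- -0.32299

# Age effect as used in the pipeline: linear term plus quadratic term.
age_effect <- function(age) age * age_coef + age^2 * age_squared_coef

ages <- c(10, 20, 30, 44, 64, 75)
effects <- round(age_effect(ages))
effects

# A quadratic with a negative squared term peaks at -b1 / (2 * b2).
peak_age <- -age_coef / (2 * age_squared_coef)
peak_age
```

The rounded values agree with the Age Effect rows in the table, and the estimated curve peaks at roughly age 47.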
Last but not least, we can visualize the estimated coefficients to get some insight into our APC decomposition:
ggplot(apc_all, aes(x, y)) +
  geom_line() +
  facet_wrap(~ group, scales = "free", ncol = 1)
From the plot above we can identify the age, period and cohort effects on suicides. However, it does not capture all of the dynamics in our data, so let’s try a more specific case. We filter the data by sex and country:
suicides_male_us <- suicides %>%
  filter(
    sex == "male",
    country == "United States"
  )
suicides_male_us
#> # A tibble: 186 x 7
#> suicides age age_squared period cohort sex country
#> <dbl> <dbl> <dbl> <fct> <fct> <chr> <chr>
#> 1 2177 75 5625 1985 1910 male United States
#> 2 5302 64 4096 1985 1921 male United States
#> 3 5134 30 900 1985 1955 male United States
#> 4 6053 44 1936 1985 1941 male United States
#> 5 4267 20 400 1985 1965 male United States
#> 6 205 10 100 1985 1975 male United States
#> 7 2340 75 5625 1986 1911 male United States
#> 8 5500 64 4096 1986 1922 male United States
#> 9 5450 30 900 1986 1956 male United States
#> 10 6450 44 1936 1986 1942 male United States
#> # ... with 176 more rows
Now we have data containing the APC variables for males in the United States. Next, we apply linear regression to extract the APC coefficients from this subset:
apc_lm <- lm(suicides ~ age + age_squared + period + cohort, suicides_male_us)
summary(apc_lm)
#>
#> Call:
#> lm(formula = suicides ~ age + age_squared + period + cohort,
#> data = suicides_male_us)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2474.8 -378.9 0.0 439.4 2474.8
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -5925.2423 1716.8546 -3.451 0.00104 **
#> age 433.4174 47.4850 9.127 7.01e-13 ***
#> age_squared -4.3385 0.4684 -9.262 4.19e-13 ***
#> period1986 -234.8724 1515.8754 -0.155 0.87740
#> period1987 -377.2245 1590.8747 -0.237 0.81339
#> period1988 -114.9018 1562.8255 -0.074 0.94164
#> period1989 332.2516 1468.4095 0.226 0.82178
#> period1990 399.7556 1534.8937 0.260 0.79543
#> period1991 329.7095 1471.5206 0.224 0.82348
#> period1992 912.2598 1591.6599 0.573 0.56872
#> period1993 1023.7514 1593.4089 0.642 0.52304
#> period1994 787.2440 1522.2825 0.517 0.60699
#> period1995 478.5536 1170.8616 0.409 0.68423
#> period1996 -471.4333 1372.8457 -0.343 0.73252
#> period1997 -606.9015 1555.5320 -0.390 0.69783
#> period1998 -413.7255 1535.9278 -0.269 0.78859
#> period1999 220.0764 1299.5798 0.169 0.86611
#> period2000 238.8850 1451.0640 0.165 0.86980
#> period2001 435.6227 1441.6270 0.302 0.76358
#> period2002 1223.3076 1531.3070 0.799 0.42757
#> period2003 1498.8270 1533.8367 0.977 0.33247
#> period2004 1457.0321 1471.2956 0.990 0.32607
#> period2005 827.0042 1150.5815 0.719 0.47512
#> period2006 331.3115 1484.2611 0.223 0.82414
#> period2007 260.8998 1593.2016 0.164 0.87048
#> period2008 565.8727 1603.4460 0.353 0.72541
#> period2009 1053.6846 1402.4886 0.751 0.45546
#> period2010 1262.4640 1529.5100 0.825 0.41247
#> period2011 1440.0666 1551.2895 0.928 0.35703
#> period2012 1983.7952 1616.2291 1.227 0.22454
#> period2013 2322.9728 1611.8220 1.441 0.15481
#> period2014 2446.7770 1590.2011 1.539 0.12923
#> period2015 2279.9186 1191.5563 1.913 0.06055 .
#> cohort1911 397.8724 2687.2179 0.148 0.88280
#> cohort1912 732.2245 2720.6192 0.269 0.78876
#> cohort1913 485.9018 2694.6122 0.180 0.85752
#> cohort1914 -47.2516 2631.0384 -0.018 0.98573
#> cohort1915 76.2444 2658.8832 0.029 0.97722
#> cohort1916 166.2905 2612.8067 0.064 0.94947
#> cohort1917 -516.2598 2672.5255 -0.193 0.84749
#> cohort1918 -551.7514 2663.7545 -0.207 0.83662
#> cohort1919 -335.2440 2611.8202 -0.128 0.89830
#> cohort1920 -75.5536 2413.0484 -0.031 0.97513
#> cohort1921 1069.7292 1916.9792 0.558 0.57894
#> cohort1922 1343.8996 2223.3576 0.604 0.54786
#> cohort1923 1369.4876 2229.7009 0.614 0.54144
#> cohort1924 710.9253 2139.2092 0.332 0.74082
#> cohort1925 491.4443 2147.4871 0.229 0.81978
#> cohort1926 301.3235 2152.1535 0.140 0.88913
#> cohort1927 -73.9960 2140.7908 -0.035 0.97254
#> cohort1928 -657.0308 2175.1135 -0.302 0.76366
#> cohort1929 -703.8791 2143.2927 -0.328 0.74376
#> cohort1930 -363.1115 2005.5606 -0.181 0.85695
#> cohort1931 35.5800 1982.0142 0.018 0.98574
#> cohort1932 584.7793 2085.7695 0.280 0.78018
#> cohort1933 433.0270 2136.2842 0.203 0.84007
#> cohort1934 79.0331 2048.7028 0.039 0.96936
#> cohort1935 -389.7576 2015.1358 -0.193 0.84730
#> cohort1936 -512.4632 2057.2377 -0.249 0.80415
#> cohort1937 -591.6963 2059.5317 -0.287 0.77489
#> cohort1938 -943.1276 2079.7766 -0.453 0.65187
#> cohort1939 -1059.2894 2062.1911 -0.514 0.60940
#> cohort1940 -898.4627 1881.1383 -0.478 0.63469
#> cohort1941 813.6172 1744.7670 0.466 0.64270
#> cohort1942 1497.8998 2076.5481 0.721 0.47355
#> cohort1943 1863.2817 2130.2349 0.875 0.38529
#> cohort1944 1923.1338 2122.3509 0.906 0.36855
#> cohort1945 1745.1513 2019.2109 0.864 0.39094
#> cohort1946 2048.0095 2065.0405 0.992 0.32537
#> cohort1947 2233.7313 2029.5127 1.101 0.27553
#> cohort1948 2013.0919 2107.2405 0.955 0.34331
#> cohort1949 2142.7572 2099.5614 1.021 0.31162
#> cohort1950 2568.6088 2050.3064 1.253 0.21522
#> cohort1951 3032.8832 1753.2899 1.730 0.08889 .
#> cohort1952 4709.6467 2392.6906 1.968 0.05373 .
#> cohort1953 4781.1150 2472.7116 1.934 0.05797 .
#> cohort1954 4859.9390 2461.9013 1.974 0.05306 .
#> cohort1955 2934.2540 1807.8481 1.623 0.10991
#> cohort1956 3395.2859 2035.1210 1.668 0.10055
#> cohort1957 3538.5931 2032.5711 1.741 0.08691 .
#> cohort1958 3268.5893 2070.1321 1.579 0.11970
#> cohort1959 2807.2529 2047.1809 1.371 0.17548
#> cohort1960 2854.3984 2009.2820 1.421 0.16069
#> cohort1961 3308.9354 1882.3955 1.758 0.08396 .
#> cohort1962 3290.5066 2045.8841 1.608 0.11310
#> cohort1963 3496.4666 2082.0656 1.679 0.09838 .
#> cohort1964 3697.2339 2049.6374 1.804 0.07636 .
#> cohort1965 3551.8804 1665.9913 2.132 0.03718 *
#> cohort1966 3802.9070 1895.8336 2.006 0.04946 *
#> cohort1967 3697.3129 1932.0752 1.914 0.06052 .
#> cohort1968 3309.5705 1952.2085 1.695 0.09529 .
#> cohort1969 2631.5262 1901.7046 1.384 0.17164
#> cohort1970 2513.8206 1906.0440 1.319 0.19231
#> cohort1971 2644.8762 1702.5491 1.553 0.12566
#> cohort1972 931.5493 2117.8954 0.440 0.66166
#> cohort1973 786.0437 2124.1419 0.370 0.71267
#> cohort1974 1009.6949 2043.0762 0.494 0.62300
#> cohort1975 1647.3423 1676.2587 0.983 0.32974
#> cohort1976 2065.1929 1953.7135 1.057 0.29479
#> cohort1977 2202.2702 2014.7840 1.093 0.27881
#> cohort1978 1931.1130 2022.3138 0.955 0.34352
#> cohort1979 1347.1907 1949.3224 0.691 0.49221
#> cohort1980 1402.4933 1982.2140 0.708 0.48202
#> cohort1981 1372.7286 1926.4556 0.713 0.47892
#> cohort1982 791.4073 2047.8569 0.386 0.70055
#> cohort1983 558.6778 2059.7818 0.271 0.78716
#> cohort1984 735.1771 1985.3703 0.370 0.71249
#> cohort1985 1173.3694 1685.9935 0.696 0.48919
#> cohort1986 2455.1676 2162.1592 1.136 0.26075
#> cohort1987 2538.6076 2227.8080 1.140 0.25910
#> cohort1988 2333.0332 2235.9342 1.043 0.30101
#> cohort1989 1766.2263 2166.0241 0.815 0.41811
#> cohort1990 1765.9323 2206.4778 0.800 0.42672
#> cohort1991 1657.2621 2164.6230 0.766 0.44696
#> cohort1992 963.5554 2261.0318 0.426 0.67154
#> cohort1993 646.2068 2281.6992 0.283 0.77800
#> cohort1994 694.7022 2207.9443 0.315 0.75415
#> cohort1995 1236.6453 1884.6583 0.656 0.51427
#> cohort1996 1846.6071 2613.5257 0.707 0.48262
#> cohort1997 1895.0188 2670.1223 0.710 0.48068
#> cohort1998 1613.0459 2679.0297 0.602 0.54941
#> cohort1999 1146.2341 2629.8720 0.436 0.66453
#> cohort2000 949.4546 2667.0024 0.356 0.72311
#> cohort2001 785.8521 2640.8477 0.298 0.76707
#> cohort2002 267.1235 2710.4500 0.099 0.92183
#> cohort2003 -46.0542 2732.6571 -0.017 0.98661
#> cohort2004 -144.8584 2692.9176 -0.054 0.95728
#> cohort2005 NA NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1577 on 59 degrees of freedom
#> Multiple R-squared: 0.9105, Adjusted R-squared: 0.7194
#> F-statistic: 4.765 on 126 and 59 DF, p-value: 3.789e-10
Again, we tidy the coefficients and reshape the results:
apc_tidy <- tidy(apc_lm)
pc_tidy <- apc_tidy %>%
  mutate(
    key = str_replace_all(term, "[:digit:]", ""),
    value = str_replace_all(term, "[:alpha:]", "") %>% as.numeric()
  ) %>% # separate the digits and the letters of each term
  select(key, value, estimate) %>%
  filter(key == "period" | key == "cohort") %>%
  spread(key, value)
age_coef <- apc_tidy %>%
  filter(term == "age") %>%
  pull(estimate)
age_squared_coef <- apc_tidy %>%
  filter(term == "age_squared") %>%
  pull(estimate)
period_coef <- pc_tidy %>%
  select(x = period, y = estimate) %>%
  drop_na()
cohort_coef <- pc_tidy %>%
  select(x = cohort, y = estimate) %>%
  drop_na()
Now we can calculate the age effect, then combine all the coefficients together:
apc_all_male_us <- suicides_male_us %>%
  mutate(age_effect = (age * age_coef) + (age_squared * age_squared_coef)) %>%
  select(x = age, y = age_effect) %>%
  distinct() %>%
  mutate(group = "Age Effect") %>%
  bind_rows(mutate(period_coef, group = "Period Effect")) %>%
  bind_rows(mutate(cohort_coef, group = "Cohort Effect")) %>%
  select(group, x, y) %>%
  arrange(x)
apc_all_male_us
#> # A tibble: 130 x 3
#> group x y
#> <chr> <dbl> <dbl>
#> 1 Age Effect 10 3900.
#> 2 Age Effect 20 6933.
#> 3 Age Effect 30 9098.
#> 4 Age Effect 44 10671.
#> 5 Age Effect 64 9968.
#> 6 Age Effect 75 8102.
#> 7 Cohort Effect 1911 398.
#> 8 Cohort Effect 1912 732.
#> 9 Cohort Effect 1913 486.
#> 10 Cohort Effect 1914 -47.3
#> # ... with 120 more rows
Finally, we visualize the estimated coefficients to get some insight into our APC decomposition:
ggplot(apc_all_male_us, aes(x, y)) +
  geom_line() +
  facet_wrap(~ group, scales = "free", ncol = 1)
Dive Deeper
Create an APC analysis for females in the United States and compare it with the APC results we obtained above:
suicides_female_us <- suicides %>%
  filter(
    sex == "female",
    country == "United States"
  )
suicides_female_us
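Since the same steps are repeated for every subset, one option is to wrap them in a function. The sketch below is one possible way to do that in base R, so it runs without extra packages; the name `apc_effects` is hypothetical, and the toy data at the end is made up purely to show that the function runs.

```r
# Hypothetical helper: fit the APC regression on `data` and return the
# age, period and cohort effects in long format (group, x, y).
# `data` needs numeric columns suicides, age, period and cohort.
apc_effects <- function(data) {
  data$age_squared <- data$age^2
  data$period <- as.factor(data$period)
  data$cohort <- as.factor(data$cohort)

  fit <- lm(suicides ~ age + age_squared + period + cohort, data = data)
  est <- coef(fit)
  est <- est[!is.na(est)]            # drop terms removed by singularity

  # Collect the coefficients whose name starts with `prefix`.
  pick <- function(prefix, label) {
    hit <- est[startsWith(names(est), prefix)]
    data.frame(
      group = rep(label, length(hit)),
      x = as.numeric(sub(prefix, "", names(hit), fixed = TRUE)),
      y = unname(hit)
    )
  }

  ages <- sort(unique(data$age))
  age_part <- data.frame(
    group = "Age Effect",
    x = ages,
    y = ages * est[["age"]] + ages^2 * est[["age_squared"]]
  )

  rbind(age_part, pick("period", "Period Effect"), pick("cohort", "Cohort Effect"))
}

# Made-up data to show the function runs; replace with a real subset.
toy <- expand.grid(age = c(10, 20, 30), period = c(2000, 2010, 2020))
toy$cohort <- toy$period - toy$age
toy <- toy[rep(seq_len(nrow(toy)), each = 3), ]
set.seed(42)
toy$suicides <- rpois(nrow(toy), lambda = 50)

res <- apc_effects(toy)
res
```

The function expects `period` and `cohort` as plain numbers, so it should be given the data as read from the file (before the factor conversion), for example the rows for females in the United States. The result has the same `group`, `x`, `y` layout as `apc_all_male_us` and can be passed to the same `ggplot()` call.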
Annotations
1. Yang Y, Schulhofer-Wohl S, Fu WJ, Land KC. The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is and How to Use It. American Journal of Sociology 2008;113(6):1697-736.
2. Mason KO, et al. Some methodological issues in cohort analysis of archival data. American Sociological Review 1973:242-258.
3. Keyes KM, Utz RL, Robinson W, Li G. What is a cohort effect? Comparison of three statistical methods for modeling cohort effects in obesity prevalence in the United States, 1971-2006. Soc Sci Med 2010;70(7):1100-8.