The American National Election Studies (ANES) releases survey data every four years, asking participants a lengthy battery of questions. It is widely regarded as one of the most scientifically rigorous political surveys administered. Let’s load the data.

a1 <- read.csv("C:/Users/Chris/Desktop/RWorking/anes2016.csv")  # read into a1, the name used throughout
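The functions used below come from a few add-on packages. The original post may have loaded these in an earlier section, but for completeness (assuming they’re installed):

library(car)      # residualPlots, qqPlot, influenceIndexPlot, Anova
library(MASS)     # stdres, boxcox
library(phia)     # interactionMeans
library(pequod)   # lmres, simpleSlope
library(ggplot2)  # plots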

GLM with a Continuous and a Categorical Predictor

First, I’m going to give the variables friendlier names and set them as continuous or categorical.

a1$clinton <- as.numeric(a1$V162078)
a1$limbaugh <- as.factor(a1$V161428)
a1$age <- as.numeric(a1$V161267)

The ANES includes perhaps its best-known metric, the “feeling thermometer,” which measures participant attitudes toward presidential candidates on a scale from 0 to 100.

As you can see, I’m using the feeling thermometer for Hillary Clinton in the 2016 US Presidential Election as our dependent variable, listening to Rush Limbaugh as an independent categorical variable, and the age of the respondent as an independent continuous variable.
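One caveat worth flagging: ANES codes nonresponse with out-of-range values (negative codes such as -9 for “refused”), and as.numeric() happily keeps them. The model degrees of freedom later (about 1,900 cases) suggest screening happened somewhere upstream, but if it hadn’t, a defensive recode along these lines would be needed (a sketch, assuming the 2016 codebook’s 0–100 valid range):

# treat anything outside the valid 0-100 thermometer range as missing
a1$clinton <- ifelse(a1$clinton >= 0 & a1$clinton <= 100, a1$clinton, NA)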

Quickly, I’m going to mean-center the age variable.

a1$age_c <- a1$age - mean(a1$age)
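One small caution: if age contained any NAs, mean() would return NA and the entire centered column would be NA. A safer version of the same step (just guarded):

a1$age_c <- a1$age - mean(a1$age, na.rm = TRUE)
round(mean(a1$age_c, na.rm = TRUE), 10)  # sanity check: should be ~0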

Checking residuals

boxplot(a1$age_c)

plot(a1$limbaugh)  # limbaugh is a factor, so plot() draws a barplot (boxplot() errors on factors)

boxplot(a1$clinton)

a2 <- lm(clinton ~ limbaugh*age_c, data=a1)
layout(matrix(1, 2, 2))
residualPlots(a2) 

##            Test stat Pr(>|Test stat|)
## limbaugh                             
## age_c         0.7195           0.4719
## Tukey test    0.1045           0.9167

My Limbaugh variable is dichotomous… that is, people either listened to Limbaugh or they didn’t. As you can see, in relation to my Clinton variable, the Limbaugh listeners are quite strangely distributed. (The non-listeners appear more normal, no pun intended.)
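For the record, a quick tabulation makes the group imbalance concrete (run it to see the actual counts):

table(a1$limbaugh)              # raw counts per group
prop.table(table(a1$limbaugh))  # proportions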

We will have to make some attempt to address that.

a2res <- stdres(a2)
layout(matrix(1, 2, 2))
hist(a2res, freq = FALSE)
curve(dnorm, add = TRUE)

mean.a2res <- mean(a2res)
print(mean.a2res)
## [1] 6.835472e-06

Ideally we would like our residuals to resemble a normal distribution a bit more than they do, but the mean of the residuals is very close to zero, and I don’t have any residuals beyond ±3 SDs.
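That last claim is easy to verify directly:

sum(abs(a2res) > 3)  # standardized residuals beyond +/-3 SD; should be 0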

layout(matrix(1, 2, 2))
qqPlot(a2, id.n=8) 

## 2285 3319 
## 1062 1521
par(mfcol=c(2,2))
plot(a2)

The qq-plot is actually not as bad as I would have expected, and my leverage plot isn’t too terrible, either.

layout(matrix(1, 2, 2))
influenceIndexPlot(a2, id.n=8) 

I have some outliers, but I’m not overly concerned about outlier influence as my sample size is quite large.

inflm.a2 <- influence.measures(a2)
which(apply(inflm.a2$is.inf, 1, any))
##    6    8   25   88  159  177  194  207  225  264  282  286  312  327  329  434 
##    4    6   18   53   84   94  102  109  120  131  142  145  158  166  167  218 
##  435  478  501  515  518  531  532  548  555  557  611  629  633  644  693  698 
##  219  239  251  258  260  268  269  278  282  284  309  316  318  324  344  346 
##  720  741  755  759  763  770  829  841  859  866  889  906  924  955  962  963 
##  357  367  378  380  383  384  413  419  426  431  441  452  458  474  479  480 
##  988  989  996 1005 1021 1025 1040 1048 1110 1116 1133 1144 1156 1169 1193 1229 
##  488  489  492  495  508  511  519  524  549  554  560  565  573  576  591  607 
## 1265 1271 1275 1295 1298 1371 1372 1389 1394 1413 1457 1478 1494 1527 1545 1556 
##  621  623  626  632  634  660  661  670  671  678  697  706  714  727  735  740 
## 1560 1562 1586 1614 1644 1645 1649 1669 1678 1705 1711 1731 1744 1758 1759 1828 
##  741  742  752  763  778  779  781  789  793  807  810  823  831  840  841  869 
## 1936 1963 1998 2011 2031 2078 2110 2120 2123 2162 2180 2187 2193 2199 2254 2261 
##  908  920  933  939  949  972  986  990  992 1008 1019 1024 1027 1029 1048 1052 
## 2285 2289 2304 2314 2325 2337 2364 2410 2430 2438 2456 2481 2512 2535 2546 2592 
## 1062 1064 1071 1078 1084 1092 1104 1126 1135 1137 1142 1149 1162 1171 1180 1202 
## 2596 2603 2610 2661 2662 2674 2716 2758 2761 2778 2789 2838 2847 2860 2872 2889 
## 1204 1210 1214 1238 1239 1245 1259 1275 1278 1285 1289 1304 1307 1314 1319 1326 
## 2920 2966 2994 3004 3008 3010 3013 3020 3052 3056 3078 3157 3176 3183 3195 3218 
## 1337 1352 1366 1371 1373 1375 1377 1379 1396 1398 1405 1444 1455 1457 1464 1474 
## 3220 3270 3294 3312 3319 3329 3338 3346 3360 3376 3406 3422 3424 3443 3482 3491 
## 1476 1499 1508 1517 1521 1523 1526 1529 1537 1544 1555 1561 1562 1572 1590 1594 
## 3503 3527 3589 3591 3618 3629 3663 3665 3710 3727 3760 3800 3828 3854 3873 3892 
## 1601 1612 1637 1638 1648 1652 1667 1668 1686 1694 1713 1727 1742 1753 1764 1774 
## 3914 3918 3943 3954 3968 4024 4040 4064 4078 4080 4088 4121 4167 4201 4202 4225 
## 1782 1783 1795 1802 1809 1827 1835 1843 1849 1851 1854 1867 1884 1893 1894 1903 
## 4237 4238 
## 1909 1910
#summary(inflm.a2)

R flagged 185 observations as influential. That’s not quite 10% of our sample. I’m going to try some transformations on the Clinton variable to see if I can clean up the residuals a little bit.
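Incidentally, rather than counting the printed indices by hand, you can get the tally directly:

sum(apply(inflm.a2$is.inf, 1, any))  # observations flagged by at least one influence measure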

par(mfcol=c(2,2))
boxplot(a1$clinton)
boxplot(log10(.0001+a1$clinton))
boxplot(sqrt(a1$clinton))
boxplot(1/(.0001+a1$clinton))
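As an aside, if you’d rather not pick a transformation by eye, MASS’s boxcox() profiles the Box-Cox lambda; a peak near 0.5 would support the square root. A sketch (the small offset keeps the zero-heavy thermometer strictly positive, which Box-Cox requires):

boxcox(lm(I(clinton + .0001) ~ limbaugh*age_c, data = a1))  # profile log-likelihood over lambda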

It looks like the square root transformation (top right) is pretty good. Let’s compare it against our Limbaugh variable, which seems to be having the most trouble.

par(mfcol=c(1,2))
boxplot(a1$clinton~a1$limbaugh)
boxplot(sqrt(a1$clinton)~a1$limbaugh)

The square root transformation does appear to be better. It’s not perfect, but real data rarely are! Let’s check the assumptions again.

a1$clinroot <- sqrt(a1$clinton)
a3 <- lm(clinroot ~ limbaugh*age_c, data=a1)
layout(matrix(1, 2, 2))
residualPlots(a3) 

##            Test stat Pr(>|Test stat|)
## limbaugh                             
## age_c         0.3640           0.7159
## Tukey test    0.1554           0.8765

This helped my Limbaugh listeners a lot. There appear to be some floor effects in my overall regression model.

a3res <- stdres(a3)
layout(matrix(1, 2, 2))
hist(a3res, freq = FALSE)
curve(dnorm, add = TRUE)

mean.a3res <- mean(a3res)
print(mean.a3res)
## [1] 5.161885e-06

The distribution of residuals is slightly better, but still not great. The mean of the residuals is still essentially zero.

layout(matrix(1, 2, 1))
qqPlot(a3, id.n=8) 

## 2285 2481 
## 1062 1149
par(mfcol=c(2,2))
plot(a3)

The qq-plot is pretty messy.

layout(matrix(1, 2, 2))
influenceIndexPlot(a3) 

There are still some outliers, obviously. Again, I expect our sample size is large enough to be resilient to their influence.

inflm.a3 <- influence.measures(a3)
which(apply(inflm.a3$is.inf, 1, any))
##    8   88  159  177  194  225  264  282  286  312  327  329  434  435  478  501 
##    6   53   84   94  102  120  131  142  145  158  166  167  218  219  239  251 
##  515  531  532  548  555  557  611  629  633  693  698  720  741  755  759  763 
##  258  268  269  278  282  284  309  316  318  344  346  357  367  378  380  383 
##  770  841  889  906  924  955  962  963  988  989  996 1005 1021 1025 1040 1048 
##  384  419  441  452  458  474  479  480  488  489  492  495  508  511  519  524 
## 1110 1116 1133 1144 1156 1193 1229 1265 1271 1275 1295 1298 1371 1372 1413 1457 
##  549  554  560  565  573  591  607  621  623  626  632  634  660  661  678  697 
## 1478 1494 1527 1545 1556 1560 1562 1586 1644 1645 1649 1669 1678 1705 1711 1731 
##  706  714  727  735  740  741  742  752  778  779  781  789  793  807  810  823 
## 1744 1758 1759 1828 1936 1998 2031 2110 2120 2123 2180 2187 2193 2199 2254 2285 
##  831  840  841  869  908  933  949  986  990  992 1019 1024 1027 1029 1048 1062 
## 2289 2304 2314 2325 2337 2364 2410 2430 2438 2456 2481 2512 2535 2546 2592 2603 
## 1064 1071 1078 1084 1092 1104 1126 1135 1137 1142 1149 1162 1171 1180 1202 1210 
## 2610 2661 2662 2674 2761 2778 2789 2838 2847 2860 2872 2889 2920 2966 3008 3010 
## 1214 1238 1239 1245 1278 1285 1289 1304 1307 1314 1319 1326 1337 1352 1373 1375 
## 3013 3020 3052 3056 3078 3157 3176 3195 3196 3270 3294 3319 3329 3346 3360 3406 
## 1377 1379 1396 1398 1405 1444 1455 1464 1465 1499 1508 1521 1523 1529 1537 1555 
## 3422 3482 3503 3527 3591 3618 3629 3663 3665 3710 3727 3760 3800 3828 3843 3854 
## 1561 1590 1601 1612 1638 1648 1652 1667 1668 1686 1694 1713 1727 1742 1749 1753 
## 3873 3892 3918 3943 3954 3968 4040 4064 4078 4080 4121 4167 4201 4202 4225 4237 
## 1764 1774 1783 1795 1802 1809 1835 1843 1849 1851 1867 1884 1893 1894 1903 1909 
## 4238 
## 1910
#summary(inflm.a3) 

R flagged noticeably fewer influential observations this time. That having been said, our sample size is large enough that it can absorb some outliers.

I think I’m going to stick with the square root transformation. The distribution of residuals is less skewed, although admittedly it still isn’t normal.

Anova(a3, type = 3)
## Anova Table (Type III tests)
## 
## Response: clinroot
##                Sum Sq   Df   F value  Pr(>F)    
## (Intercept)     59470    1 5508.1811 < 2e-16 ***
## limbaugh         1523    1  141.0528 < 2e-16 ***
## age_c               5    1    0.4557 0.49973    
## limbaugh:age_c     43    1    3.9423 0.04723 *  
## Residuals       20708 1918                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(a3)
## 
## Call:
## lm(formula = clinroot ~ limbaugh * age_c, data = a1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.017 -2.101  0.936  2.509  7.770 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      5.920083   0.079767  74.217   <2e-16 ***
## limbaugh1       -3.070863   0.258565 -11.877   <2e-16 ***
## age_c           -0.003270   0.004844  -0.675   0.4997    
## limbaugh1:age_c -0.028908   0.014560  -1.986   0.0472 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.286 on 1918 degrees of freedom
## Multiple R-squared:  0.09626,    Adjusted R-squared:  0.09485 
## F-statistic:  68.1 on 3 and 1918 DF,  p-value: < 2.2e-16

Limbaugh listening is significant, and so is the interaction between Limbaugh listening and age. The overall model is significant and accounts for roughly 10% of the variance.

(a3.means <- interactionMeans(a3))
##   limbaugh adjusted mean std. error
## 1        0      5.920083 0.07976708
## 2        1      2.849220 0.24595328

Here are the adjusted means of the model at the two Limbaugh conditions.
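Keep in mind these are on the square-root scale. Squaring them gives rough values on the original 0–100 thermometer (ignoring retransformation bias):

5.920083^2  # non-listeners: ~35 on the thermometer
2.849220^2  # listeners: ~8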

a3mod <- lmres(clinroot ~ limbaugh*age_c, data=a1)  # fresh name so we don't overwrite the a3res residuals above
simSlopes <- simpleSlope(a3mod, pred="age_c", mod1="limbaugh1") 
summary(simSlopes) 
## 
## ** Estimated points of clinroot  **
## 
##                        Low age_c (-1 SD) High age_c (+1 SD)
## Low limbaugh1 (-1 SD)             6.5025             6.5898
## High limbaugh1 (+1 SD)            4.8563             4.3320
## 
## 
## 
## ** Simple Slopes analysis ( df= 1918 ) **
## 
##                        simple slope standard error t-value p.value  
## Low limbaugh1 (-1 SD)        0.0026         0.0065    0.41   0.685  
## High limbaugh1 (+1 SD)      -0.0158         0.0065   -2.41   0.016 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## 
## ** Bauer & Curran 95% CI **
## 
##           lower CI upper CI
## limbaugh1  -17.514   0.2125

This simple slopes analysis highlights my interaction. For those individuals who aren’t Limbaugh listeners, age has no effect on Clinton-liking. However, for those individuals who listen to the ’baugh, scores on the Clinton thermometer decrease as age increases. Here’s a plot that visualizes the results!

graph <- ggplot(a1, aes(age_c, clinroot, color=limbaugh)) + stat_smooth(method = "lm") + 
      ggtitle("Age and Limbaugh Listening as Factors \n in Hillary Clinton Feeling Thermometer") +
      scale_x_continuous(name = "Age (Centered)") +
      scale_y_continuous(name = "Clinton Feeling Thermometer (Square Root)") +
         theme(axis.line.x = element_line(linewidth=.5, colour = "black"),
            axis.line.y = element_line(linewidth=.5, colour = "black"),
            axis.text.x=element_text(colour="black", size = 9),
            axis.text.y=element_text(colour="black", size = 9),
            panel.grid.major = element_line(colour = "#d3d3d3"),
            panel.grid.minor = element_blank(),
            panel.border = element_blank(), panel.background = element_blank(),
            plot.title = element_text(size = 14, family = "serif", face = "bold", hjust = .5),
            text=element_text(family="serif")) 

graph <- graph + scale_colour_discrete(name="Limbaugh\nListener",
      breaks=c("0", "1"),
                       labels=c("No", "Yes"))

graph

As you can see, the plot echoes the simple slopes analysis: age matters for Clinton-liking among Limbaugh listeners but not among non-listeners. Pretty cool!

ANCOVA - Adding Participant Sex

I’m going to use the same model as before, but this time I’m going to account for sex as a covariate. First, let’s set up the variables in the new dataset.
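One housekeeping note: b1 is never constructed on screen. Judging from the comment further down that participants who didn’t answer the sex item with a 1 or a 2 were removed, it was presumably built something like this (my assumption, not code from the original analysis):

# assumed construction of b1: keep only respondents with sex coded 1 or 2
b1 <- a1[a1$V161342 %in% c(1, 2), ]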

b1$clinton <- as.numeric(b1$V162078)
b1$limbaugh <- as.factor(b1$V161428)
b1$age <- as.numeric(b1$V161267)
b1$sex <- as.factor(b1$V161342)

Variables are renamed and set as categorical or continuous.

Centering age.

b1$age_c <- b1$age - mean(b1$age)

Checking residuals

I’m going to go through the same steps to check residuals.

ba <- lm(clinton ~ limbaugh*age_c*sex, data=b1)
layout(matrix(1, 2, 2))
residualPlots(ba) 

##            Test stat Pr(>|Test stat|)
## limbaugh                             
## age_c         0.6657           0.5057
## sex                                  
## Tukey test    1.4449           0.1485

My Limbaugh listeners are distributed oddly, as expected.

bares <- stdres(ba)
layout(matrix(1, 2, 2))
hist(bares, freq = FALSE)
curve(dnorm, add = TRUE)

mean.bares <- mean(bares)
print(mean.bares)
## [1] 3.24306e-05

This distribution of residuals is off, but the mean is essentially zero.

layout(matrix(1, 2, 2))
qqPlot(ba, id.n=8) 

## 2285 2481 
## 1056 1142
par(mfcol=c(2,2))
plot(ba)

The qq-plot looks a little rough, though I’ve seen worse.

layout(matrix(1, 2, 2))
influenceIndexPlot(ba, id.n=8) 

There are definitely some outliers here. If my sample size were smaller, I might consider removing some of these in an outlier sweep. As it is, I don’t think my overall findings will be too affected by these points.

inflm.ba <- influence.measures(ba)
which(apply(inflm.ba$is.inf, 1, any))
##    8   88  112  159  177  194  207  264  286  312  327  434  435  478  501  515 
##    6   53   64   83   93  101  108  130  144  157  164  216  217  237  249  256 
##  531  532  548  555  557  611  629  644  693  720  741  759  763  770  829  841 
##  265  266  275  279  281  305  312  319  339  352  362  375  378  379  408  414 
##  859  889  906  924  955  962  963  988  989  996 1005 1021 1025 1040 1110 1116 
##  421  436  447  453  469  474  475  483  484  487  490  503  506  514  544  549 
## 1133 1144 1156 1193 1229 1265 1271 1275 1295 1371 1372 1389 1478 1494 1510 1527 
##  555  560  568  586  602  616  618  621  627  654  655  664  700  708  713  721 
## 1545 1556 1560 1562 1586 1644 1649 1669 1678 1705 1711 1731 1744 1758 1759 1828 
##  729  734  735  736  746  772  775  783  787  801  804  817  825  834  835  863 
## 1936 1998 2031 2110 2120 2123 2187 2193 2199 2254 2285 2289 2304 2314 2325 2364 
##  902  927  943  980  984  986 1018 1021 1023 1042 1056 1058 1065 1071 1077 1097 
## 2410 2430 2456 2481 2512 2535 2546 2592 2603 2610 2662 2674 2716 2758 2761 2778 
## 1119 1128 1135 1142 1155 1164 1173 1195 1203 1207 1232 1238 1252 1268 1271 1278 
## 2838 2847 2860 2872 2920 2966 2994 3004 3008 3013 3020 3052 3056 3078 3157 3176 
## 1297 1300 1307 1312 1330 1345 1359 1364 1366 1370 1372 1389 1391 1398 1437 1448 
## 3195 3196 3218 3270 3294 3319 3329 3338 3346 3360 3406 3422 3443 3482 3503 3589 
## 1457 1458 1467 1491 1500 1513 1515 1518 1521 1529 1547 1553 1564 1582 1593 1627 
## 3618 3629 3665 3700 3727 3760 3800 3828 3854 3873 3892 3918 3943 3954 3968 4024 
## 1638 1642 1657 1670 1683 1702 1716 1731 1742 1753 1763 1772 1784 1791 1798 1816 
## 4040 4064 4078 4080 4088 4121 4167 4201 4202 4222 4225 4238 
## 1824 1832 1838 1840 1843 1856 1873 1882 1883 1891 1892 1899
#summary(inflm.ba) 

R flagged 189 observations as influential. Let’s apply the square root transformation to my Clinton variable and see if that helps with my assumptions.

b1$clinroot <- sqrt(b1$clinton)
bb <- lm(clinroot ~ limbaugh*age_c*sex, data=b1)
layout(matrix(1, 2, 2))
residualPlots(bb) 

##            Test stat Pr(>|Test stat|)
## limbaugh                             
## age_c         0.3343           0.7382
## sex                                  
## Tukey test    1.4606           0.1441

Our Limbaugh listeners are looking much healthier!

bbres <- stdres(bb)
layout(matrix(1, 2, 2))
hist(bbres, freq = FALSE)
curve(dnorm, add = TRUE)

mean.bbres <- mean(bbres)
print(mean.bbres)
## [1] 2.904784e-05

This histogram looks great compared to some of the models I’ve made with this dataset! (It’s still a little rough.)

par(mfcol=c(1,1))
qqPlot(bb, id.n=8) 

## 2285 2481 
## 1056 1142
par(mfcol=c(2,2))
plot(bb)

The qq-plot is of course not great. I’ve got some skew in my residuals-vs-fitted plot, but the fanning isn’t too bad, which suggests homoscedasticity.
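If you’d rather not eyeball the fanning, car also offers a formal score test for non-constant variance:

ncvTest(bb)  # a small p-value would indicate heteroscedasticity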

layout(matrix(1, 2, 2))
influenceIndexPlot(bb, id.n=8) 

Obviously, we still have outliers.

inflm.bb <- influence.measures(bb)
which(apply(inflm.bb$is.inf, 1, any))
##    8   88  112  159  177  194  207  286  312  327  434  435  478  501  515  532 
##    6   53   64   83   93  101  108  144  157  164  216  217  237  249  256  266 
##  555  557  611  629  644  693  720  759  763  770  829  841  859  889  906  924 
##  279  281  305  312  319  339  352  375  378  379  408  414  421  436  447  453 
##  955  962  963  988  989  996 1005 1021 1025 1040 1110 1116 1133 1144 1156 1271 
##  469  474  475  483  484  487  490  503  506  514  544  549  555  560  568  618 
## 1275 1295 1371 1372 1478 1494 1510 1527 1545 1556 1560 1562 1586 1644 1669 1678 
##  621  627  654  655  700  708  713  721  729  734  735  736  746  772  783  787 
## 1705 1711 1731 1744 1758 1759 1828 1936 1998 2031 2110 2120 2123 2187 2193 2199 
##  801  804  817  825  834  835  863  902  927  943  980  984  986 1018 1021 1023 
## 2254 2285 2289 2304 2314 2325 2364 2410 2430 2456 2481 2535 2546 2592 2603 2610 
## 1042 1056 1058 1065 1071 1077 1097 1119 1128 1135 1142 1164 1173 1195 1203 1207 
## 2662 2716 2758 2761 2778 2838 2847 2860 2872 2920 2966 2994 3004 3008 3013 3020 
## 1232 1252 1268 1271 1278 1297 1300 1307 1312 1330 1345 1359 1364 1366 1370 1372 
## 3052 3056 3078 3157 3176 3195 3196 3218 3270 3294 3319 3329 3338 3346 3360 3406 
## 1389 1391 1398 1437 1448 1457 1458 1467 1491 1500 1513 1515 1518 1521 1529 1547 
## 3422 3443 3482 3503 3589 3618 3629 3665 3700 3727 3760 3800 3828 3854 3873 3892 
## 1553 1564 1582 1593 1627 1638 1642 1657 1670 1683 1702 1716 1731 1742 1753 1763 
## 3918 3943 3954 3968 4024 4040 4078 4080 4088 4121 4167 4201 4202 4222 4225 4238 
## 1772 1784 1791 1798 1816 1824 1838 1840 1843 1856 1873 1882 1883 1891 1892 1899
#summary(inflm.bb) 

R flagged 37 fewer influential observations this time.

I’m sticking with the root transformation. Okay, here’s my original model again, refit on this dataset.

b2 <- lm(clinroot ~ limbaugh*age_c, data=b1)
Anova(b2, type = 3)
## Anova Table (Type III tests)
## 
## Response: clinroot
##                Sum Sq   Df   F value  Pr(>F)    
## (Intercept)     58898    1 5445.4044 < 2e-16 ***
## limbaugh         1513    1  139.8506 < 2e-16 ***
## age_c               6    1    0.5763 0.44786    
## limbaugh:age_c     41    1    3.8185 0.05084 .  
## Residuals       20626 1907                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(b2)
## 
## Call:
## lm(formula = clinroot ~ limbaugh * age_c, data = b1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0207 -2.1023  0.8645  2.5216  7.7700 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      5.910817   0.080100  73.793   <2e-16 ***
## limbaugh1       -3.061551   0.258886 -11.826   <2e-16 ***
## age_c           -0.003692   0.004864  -0.759   0.4479    
## limbaugh1:age_c -0.028486   0.014578  -1.954   0.0508 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.289 on 1907 degrees of freedom
## Multiple R-squared:  0.09615,    Adjusted R-squared:  0.09473 
## F-statistic: 67.62 on 3 and 1907 DF,  p-value: < 2.2e-16

Apparently, removing the participants who didn’t answer the sex item with a 1 or a 2 has made my interaction only marginally significant (p = .051). We should check whether my Clinton thermometer even has a relationship with sex.

b3 <- lm(clinroot ~ sex, data=b1)
Anova(b3, type = 3)
## Anova Table (Type III tests)
## 
## Response: clinroot
##              Sum Sq   Df  F value    Pr(>F)    
## (Intercept) 21881.5    1 1883.420 < 2.2e-16 ***
## sex           641.6    1   55.226 1.612e-13 ***
## Residuals   22178.7 1909                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Well, that’s a resounding yes! Sex is highly related to scores on the Clinton thermometer. Let’s include sex in the first linear model as a main effect.

b4 <- lm(clinroot ~ sex + limbaugh*age_c, data=b1)
Anova(b4, type = 3)
## Anova Table (Type III tests)
## 
## Response: clinroot
##                 Sum Sq   Df   F value    Pr(>F)    
## (Intercept)    23612.5    1 2225.0993 < 2.2e-16 ***
## sex              399.9    1   37.6875 1.008e-09 ***
## limbaugh        1301.0    1  122.6012 < 2.2e-16 ***
## age_c              4.1    1    0.3828   0.53616    
## limbaugh:age_c    51.9    1    4.8953   0.02705 *  
## Residuals      20226.3 1906                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

When controlling for sex, the interaction becomes significant again. Let’s see whether sex is confounded with age in our sample. I have no reason to think it would be, but better safe than sorry.

b5 <- lm(age_c ~ sex, data=b1)
Anova(b5, type = 3)
## Anova Table (Type III tests)
## 
## Response: age_c
##             Sum Sq   Df F value Pr(>F)
## (Intercept)    284    1  1.0242 0.3116
## sex            538    1  1.9418 0.1636
## Residuals   528651 1909
summary(b5)
## 
## Call:
## lm(formula = age_c ~ sex, data = b1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.315 -14.252  -0.252  12.748  42.748 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.5605     0.5538   1.012    0.312
## sex2         -1.0625     0.7625  -1.393    0.164
## 
## Residual standard error: 16.64 on 1909 degrees of freedom
## Multiple R-squared:  0.001016,   Adjusted R-squared:  0.0004928 
## F-statistic: 1.942 on 1 and 1909 DF,  p-value: 0.1636

It doesn’t appear so. Let’s use a logistic regression to see whether Limbaugh listening is related to sex. I’m quite sure that it will be.

b6 <- glm(limbaugh~sex, data=b1, family=binomial(link='logit'))
summary(b6)
## 
## Call:
## glm(formula = limbaugh ~ sex, family = binomial(link = "logit"), 
##     data = b1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5827  -0.5827  -0.4013  -0.4013   2.2623  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.68719    0.09168 -18.404  < 2e-16 ***
## sex2        -0.79129    0.14933  -5.299 1.17e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1360.7  on 1910  degrees of freedom
## Residual deviance: 1331.4  on 1909  degrees of freedom
## AIC: 1335.4
## 
## Number of Fisher Scoring iterations: 5

Indeed, Limbaugh listening is significantly related to sex.
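As a quick aside on that logit: exponentiating the sex coefficient expresses the effect as an odds ratio, which is easier to read than a log-odds.

exp(coef(b6)["sex2"])  # roughly 0.45: women have less than half the odds of listening

Now let’s see whether the interaction between sex and Limbaugh listening is significantly related to Clinton feelings.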

b7 <- lm(clinroot ~ sex*limbaugh, data=b1)
Anova(b7, type = 3)
## Anova Table (Type III tests)
## 
## Response: clinroot
##               Sum Sq   Df   F value    Pr(>F)    
## (Intercept)  22086.3    1 2075.8637 < 2.2e-16 ***
## sex            390.5    1   36.7067  1.65e-09 ***
## limbaugh      1037.7    1   97.5339 < 2.2e-16 ***
## sex:limbaugh    10.6    1    0.9964    0.3183    
## Residuals    20289.7 1907                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(b7)
## 
## Call:
## lm(formula = clinroot ~ sex * limbaugh, data = b1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3494 -2.4305  0.9408  2.8702  7.5695 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      5.3837     0.1182  45.562  < 2e-16 ***
## sex2             0.9656     0.1594   6.059 1.65e-09 ***
## limbaugh1       -2.9532     0.2990  -9.876  < 2e-16 ***
## sex2:limbaugh1  -0.4862     0.4871  -0.998    0.318    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.262 on 1907 degrees of freedom
## Multiple R-squared:  0.1109, Adjusted R-squared:  0.1095 
## F-statistic: 79.28 on 3 and 1907 DF,  p-value: < 2.2e-16

The interaction between sex and Limbaugh listening is non-significant. Let’s test both interactions and see what we find.

b8 <- lm(clinroot ~ sex*limbaugh + limbaugh*age_c, data=b1)
Anova(b8, type = 3)
## Anova Table (Type III tests)
## 
## Response: clinroot
##                 Sum Sq   Df   F value    Pr(>F)    
## (Intercept)    22062.7    1 2078.5823 < 2.2e-16 ***
## sex              388.3    1   36.5819 1.758e-09 ***
## limbaugh         815.5    1   76.8312 < 2.2e-16 ***
## age_c              4.0    1    0.3751   0.54033    
## sex:limbaugh       6.0    1    0.5650   0.45236    
## limbaugh:age_c    48.5    1    4.5704   0.03266 *  
## Residuals      20220.3 1905                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(b8)
## 
## Call:
## lm(formula = clinroot ~ sex * limbaugh + limbaugh * age_c, data = b1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.4272 -2.3830  0.8959  2.8233  8.0029 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      5.382156   0.118052  45.591  < 2e-16 ***
## sex2             0.963164   0.159246   6.048 1.76e-09 ***
## limbaugh1       -2.731291   0.311601  -8.765  < 2e-16 ***
## age_c           -0.002952   0.004820  -0.612   0.5403    
## sex2:limbaugh1  -0.367395   0.488791  -0.752   0.4524    
## limbaugh1:age_c -0.031016   0.014508  -2.138   0.0327 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.258 on 1905 degrees of freedom
## Multiple R-squared:  0.1139, Adjusted R-squared:  0.1116 
## F-statistic: 48.99 on 5 and 1905 DF,  p-value: < 2.2e-16

YES! Our effect is still significant. Let’s check the three-way interaction just to make sure there’s no higher-order interaction.

b9 <- lm(clinroot ~ sex*limbaugh*age_c, data=b1)
Anova(b9, type = 3)
## Anova Table (Type III tests)
## 
## Response: clinroot
##                     Sum Sq   Df   F value    Pr(>F)    
## (Intercept)        22070.4    1 2081.2371 < 2.2e-16 ***
## sex                  380.9    1   35.9143  2.46e-09 ***
## limbaugh             848.3    1   79.9976 < 2.2e-16 ***
## age_c                  0.6    1    0.0564    0.8122    
## sex:limbaugh           0.1    1    0.0135    0.9076    
## sex:age_c              8.2    1    0.7721    0.3797    
## limbaugh:age_c         9.2    1    0.8687    0.3514    
## sex:limbaugh:age_c    19.4    1    1.8323    0.1760    
## Residuals          20180.3 1903                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(b9)
## 
## Call:
## lm(formula = clinroot ~ sex * limbaugh * age_c, data = b1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5288 -2.3369  0.9276  2.7655  7.7701 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           5.384651   0.118031  45.621  < 2e-16 ***
## sex2                  0.955372   0.159419   5.993 2.46e-09 ***
## limbaugh1            -2.852153   0.318885  -8.944  < 2e-16 ***
## age_c                 0.001701   0.007158   0.238    0.812    
## sex2:limbaugh1        0.063388   0.546323   0.116    0.908    
## sex2:age_c           -0.008504   0.009678  -0.879    0.380    
## limbaugh1:age_c      -0.017422   0.018692  -0.932    0.351    
## sex2:limbaugh1:age_c -0.040474   0.029900  -1.354    0.176    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.256 on 1903 degrees of freedom
## Multiple R-squared:  0.1157, Adjusted R-squared:  0.1124 
## F-statistic: 35.56 on 7 and 1903 DF,  p-value: < 2.2e-16

So I think we’re in good shape here. The three-way interaction is non-significant, suggesting that the interaction in our previous model (b8) is sound.
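Another way to reach the same conclusion is a nested-model F test comparing b8 against the full three-way model:

anova(b8, b9)  # tests whether the extra interaction terms improve fit

With that settled, let’s examine the effects further.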

sex.labs <- c("Males", "Females")
names(sex.labs) <- c(1,2)
# ggplot() accepts the fitted model here: fortify() converts b8 to its model frame (complete cases only)
graph <- ggplot(b8, aes(age_c, clinroot, color=limbaugh)) + stat_smooth(method = "lm") + 
      ggtitle("Age and Limbaugh Listening as Factors \n in Hillary Clinton Feeling Thermometer") +
      scale_x_continuous(name = "Age (Centered)") +
      scale_y_continuous(name = "Clinton Feeling Thermometer (Square Root)") +
         theme(axis.line.x = element_line(linewidth=.5, colour = "black"),
            axis.line.y = element_line(linewidth=.5, colour = "black"),
            axis.text.x=element_text(colour="black", size = 9),
            axis.text.y=element_text(colour="black", size = 9),
            panel.grid.major = element_line(colour = "#d3d3d3"),
            panel.grid.minor = element_blank(),
            panel.border = element_blank(), panel.background = element_blank(),
            plot.title = element_text(size = 14, family = "serif", face = "bold", hjust = .5),
            text=element_text(family="serif")) 

graph <- graph + scale_colour_discrete(name="Limbaugh\nListener",
      breaks=c("0", "1"),
                       labels=c("No", "Yes"))

graph <- graph + facet_grid(. ~ sex, labeller = labeller(sex = sex.labs))

graph
## `geom_smooth()` using formula = 'y ~ x'

This graph suggests that male Limbaugh listeners decrease only slightly in their liking for Hillary Clinton as age increases, while female Limbaugh listeners decrease quite dramatically. Very interesting! (A caveat: the three-way interaction above was non-significant, so this apparent sex difference in slopes should be read cautiously.)