HTML-slides
http://coherence-based-reasoning-and-rationality.de/amsterdam/multilevelReg.html
PDF-print
http://coherence-based-reasoning-and-rationality.de/amsterdam/slides.pdf
July 2016
Luke. (2004). Multilevel Modeling. (p. 5)
Moerbeek & Teerenstra. (2016). Power analysis of trials with multilevel data. (p. 7)
Examples:
Multilevel models…
\[Y_{i} = \beta_{0} + e_{i}\]
\[Y_{i} = \beta_{0} + e_{i} \rightarrow Y_{i} = \beta_{0} + 0 \times X_{i} + e_{i}\]
\[e \sim N(0,\sigma)\]
\[Y_{i} = \beta_{0} + e_{i}\]
\[Y_{ij} = \beta_{0j} + e_{ij}\]
\[Y_{ij} = {\color{DarkMagenta}\beta_{0j}} + e_{ij} = {\color{DarkMagenta}\beta_{00} + u_{0j}} + e_{ij}\]
\[Y_{ij} = \beta_{00} + {\color{red} u_{0j}} + e_{ij}\]
\[Y_{ij} = \beta_{00} + {\color{red} u_{0j}} + {\color{blue} e_{ij}}\]
\[e \sim N(0,\sigma)\]
\[ Y_{ij} = \beta_{00} + {\color{red}u_{0j}} + e_{ij}\]
Distribution of residuals:
\[e \sim N(0,\sigma_e)\]
\[ {\color{red}u_0} \sim N(0,\sigma_0)\]
Covariation of residuals:
\[cov(e_{ij},u_{j}) = 0\]
\[\rho = \frac{{\color{blue}\sigma_u^2}}{{\color{red}\sigma_u^2 + \sigma_e^2}}\]
The percentage of variance of the criterion explained by the groups is the estimated variance of the random intercepts divided by the total variance.
\(\rho\) can vary between 0 and 1.
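To make this concrete, here is a minimal simulation sketch (not from the slides; all object names and parameter values are hypothetical): data are generated from \(Y_{ij} = \beta_{00} + u_{0j} + e_{ij}\), and \(\rho\) is recovered from the fitted variance components.
library(lme4)
set.seed(1)
nGroups <- 30                               # number of level-2 units
nPerGrp <- 20                               # observations per group
group   <- rep(1:nGroups, each = nPerGrp)
u0      <- rnorm(nGroups, 0, 5)             # random intercepts, sigma_u0 = 5
e       <- rnorm(nGroups * nPerGrp, 0, 10)  # level-1 residuals, sigma_e = 10
y       <- 50 + u0[group] + e               # beta_00 = 50
simModel <- lmer(y ~ 1 + (1 | group))
vc <- data.frame(VarCorr(simModel))
vc[1, "vcov"] / sum(vc[, "vcov"])           # estimated rho; true value 25/125 = 0.2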
Fixed effects model (i.e., dummy coding and continuous variable(s)):
If we have nested data, why don't we use multilevel models?
LEMMA: University of Bristol; http://www.bristol.ac.uk/cmm/learning/online-course/index.html
Total variation left after accounting for fixed effect
\[ \text{resid}_{ij} = {\color{red} u_{0j}} + {\color{blue} e_{ij}}\]
\[ \sigma^2_{\text{resid}} = {\color{red} \sigma^2_{u_0}} + {\color{blue} \sigma^2_{e}}\]
The mean residual is the difference between the mean of group \(j\) and the fixed intercept:
\[{\color{Black}\bar{r}_{0j}} = {\color{red} \bar{y_j}} - {\color{blue} \beta_0}\]
The estimated residual \(u_{0j}\) for group \(j\) is the mean residual weighted by a shrinkage factor:
\[u_{0j} = {\color{green} k} \times \bar{r}_{0j}\]
Shrinkage factor:
\[ {\color{green}k} = \frac{\sigma^2_u}{\sigma^2_u+({\color{DarkOrchid}\sigma^2_e} / {\color{orange}n_j})} \]
Consequence: Few cases per group combined with high error variance lead to strong shrinkage (i.e., estimates regress toward the fixed intercept).
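A quick numeric illustration of this point (a sketch; variance values and group sizes are hypothetical):
shrinkage <- function(sigma2_u, sigma2_e, n_j) {
  # k = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_j)
  sigma2_u / (sigma2_u + sigma2_e / n_j)
}
shrinkage(sigma2_u = 25, sigma2_e = 100, n_j = 2)    # 0.33: few cases, strong shrinkage
shrinkage(sigma2_u = 25, sigma2_e = 100, n_j = 100)  # 0.96: many cases, little shrinkage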
Calculating individual residuals is easy:
\[e_{ij} = y_{ij} - (\beta_0 + u_{0j})\]
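As a sanity check (a sketch, assuming the response vector Yij and the model modMLR from this example; for this model fitted() returns exactly \(\beta_0 + u_{0j}\)):
head(resid(modMLR))
head(Yij - fitted(modMLR))  # identical to the line above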
Questions
It is basically a t-test with corrected degrees of freedom:
\[t = \frac{b_i}{SE(b_i)}\]
coef(summary(modMLR))
##             Estimate Std. Error       df   t value  Pr(>|t|)
## (Intercept) 5.019234   13.71936 2.000642 0.3658503 0.7495396
Estimate of the effect is defined as:
\[ \hat{u_j} = (\bar{y_j}-\beta_0) \times \frac{\sigma^2_u}{\sigma^2_u+(\sigma^2_e /n_j)} \]
Variance of the estimate is defined as:
\[var(\hat{u_j} - u_j) = \frac{\color{blue}\sigma^2_u \color{red}\sigma^2_e}{\color{orange}n_j\color{blue}\sigma^2_u+\color{red}\sigma^2_e}\]
95% Confidence intervals are defined as:
\[\hat{u_j} \pm 1.96 \times \sqrt{var(\hat{u_j} - u_j)}\]
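These intervals can also be computed by hand (a sketch, assuming modMLR from above; se.ranef() comes from the arm package):
library(arm)  # provides se.ranef()
uHat <- ranef(modMLR)$group[, "(Intercept)"]
uSE  <- se.ranef(modMLR)$group[, "(Intercept)"]
head(cbind(lower = uHat - 1.96 * uSE, upper = uHat + 1.96 * uSE))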
dotplot(ranef(modMLR, condVar = TRUE))
## $group
\[T_{LR} = 2 \times log\left(\frac{{\color{blue}L_{g}}}{{\color{red}L_{n}}}\right) = 2 \times (log(L_g) - log(L_n))\]
The likelihood of the general model is compared to the likelihood of the nested model. The test statistic \(T_{LR}\) is (asymptotically) \(\chi^2\)-distributed with \(df = n_{p_g} - n_{p_{n}}\) degrees of freedom.
What is the additional parameter in the more general model?
General model:
\[Y_{ij} = \beta_{00} + u_{0j} + e_{ij}\]
Nested model:
\[Y_{i} = \beta_{0} + e_{i}\]
anova(modMLR, model)
## Data: NULL
## Models:
## ..1: Yij ~ 1
## object: Yij ~ 1 + (1 | group)
##        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## ..1     2 799.49 804.49 -397.74   795.49                             
## object  3 580.21 587.71 -287.10   574.21 221.28      1  < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In our example, \(T_{LR} = 2 \times (-287.1 + 397.74) = 221.28\) and \(df = 3 - 2 = 1\).
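The same numbers can be obtained directly from the log-likelihoods (a sketch using the two model objects above; the name TLR is ours):
TLR <- as.numeric(2 * (logLik(modMLR) - logLik(model)))
TLR                      # 221.28
1 - pchisq(TLR, df = 1)  # p-value of the likelihood-ratio test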
Likelihood Profile
\[L_1(\theta) = \max_{\delta} L(\theta,\delta)\]
\[2\times (log(L(\hat{\theta},\hat{\delta}))-log(L_1(\theta))) < \chi^2_{1-\alpha}(1)\]
confint(modMLR, level = .95)
##                  2.5 %    97.5 %
## .sig01       10.256922 57.290125
## .sigma        4.616839  6.219009
## (Intercept) -26.254657 36.293093
\[ Y_{ij} = \beta_{00} + u_{0j} + \beta_{1} \times X_{ij} + e_{ij}\]
\[ Y_{ij} = (\beta_{00} + u_{0j}) + (\beta_{01} + u_{1j}) \times X_{ij} + e_{ij}\]
Distribution of residuals:
\[e \sim N(0,\sigma_e)\]
\[u \sim MVN(0,\Sigma)\]
\[\Sigma = \left(\begin{array}{cc} \sigma^2_{u_{0j}} & - \\ {\color{red}\sigma_{01}} & \sigma^2_{u_{1j}} \end{array} \right) \]
Random effects can covary.
Covariation of residuals:
\[cov(e_{ij},u_{0j}) = cov(e_{ij},u_{1j}) = 0\]
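A minimal simulation sketch of this model (not from the slides; names and values are hypothetical): correlated random intercepts and slopes are drawn from \(MVN(0,\Sigma)\) with MASS::mvrnorm, and lmer should approximately recover \(\Sigma\).
library(lme4)
library(MASS)
set.seed(2)
nGroups <- 50
nPerGrp <- 20
group   <- rep(1:nGroups, each = nPerGrp)
x       <- rnorm(nGroups * nPerGrp)
Sigma <- matrix(c(4.0, 0.8,   # var(u0),     cov(u0, u1)
                  0.8, 1.0),  # cov(u0, u1), var(u1)
                nrow = 2)
u <- mvrnorm(nGroups, mu = c(0, 0), Sigma = Sigma)
y <- (10 + u[group, 1]) + (2 + u[group, 2]) * x + rnorm(nGroups * nPerGrp, 0, 3)
slopeModel <- lmer(y ~ x + (1 + x | group))
VarCorr(slopeModel)  # compare with Sigma and sigma_e = 3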
You can find the code here.
## $subject
Variance due to random effects
\[ var(u_{0j} + u_{1j}x_{ij}) = var(u_{0j}) + 2 \times x_{ij} \times cov(u_{0j},u_{1j}) + x_{ij}^2 \times var(u_{1j}) \]
\[ var(u_{0j} + u_{1j}x_{ij}) = \sigma^2_{u_0} + 2\sigma_{u01} x_{ij} + x^2_{ij} \sigma^2_{u_1} \]
Variance Partitioning Coefficient
\[\frac{{\color{green}\sigma^2_{u_0} + 2\sigma_{u01} x_{ij} + x^2_{ij} \sigma^2_{u_1}}}{{\color{green}\sigma^2_{u_0} + 2\sigma_{u01} x_{ij} + x^2_{ij} \sigma^2_{u_1}} + {\color{red}\sigma^2_e}} \]
Conclusion: The importance of context depends on the value of the predictor (i.e., heteroscedasticity is fine!)
## $subject
confint(modMLRFile)
##                  2.5 %     97.5 %
## .sig01       1.1754879  2.9133000
## .sig02       0.7341677  0.9851946
## .sig03       0.7432186  1.8898389
## .sigma       1.0153326  1.1981148
## (Intercept) 17.8491616 20.2427882
## pred         8.2388327  9.7990099
residuals <- resid(modMLRFile)
summary(residuals)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -3.37400 -0.77870  0.03502  0.00000  0.73990  2.95400
hist(residuals)
residuals <- resid(modMLRFile)
qqnorm(residuals)
qqline(residuals)
qqnorm(ranef(modMLRFile)$subject$`(Intercept)`, main = "Q-Q plot for the random intercept")
qqline(ranef(modMLRFile)$subject$`(Intercept)`)
qqnorm(ranef(modMLRFile)$subject$`pred`, main = "Q-Q plot for the random slope")
qqline(ranef(modMLRFile)$subject$`pred`)
residuals <- resid(modMLRFile)
xyplot(resid(modMLRFile) ~ fitted(modMLRFile), pch = 16)
Uncorrelated residuals with equal variances
\[\left(\begin{array}{ccc} \sigma^2_{e} & - & - \\ 0 & \sigma^2_{e} &- \\ 0 & 0 & \sigma^2_{e}\\ \end{array} \right) \]
Most complex error model
\[\left(\begin{array}{ccc} \sigma^2_{e_{11}} & - & - \\ \sigma_{e_{21}} & \sigma^2_{e_{22}} & - \\ \sigma_{e_{31}} & \sigma_{e_{32}} & \sigma^2_{e_{33}}\\ \end{array} \right) \]
Autoregressive residuals
\[\sigma_e^2 \times \left(\begin{array}{ccc} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1\\ \end{array} \right) \]
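Such an AR(1) matrix is easy to construct (a sketch; the function name and values are ours):
buildAR1 <- function(sigma2_e, rho, nOcc) {
  # sigma_e^2 * rho^|t - t'| for measurement occasions t, t'
  sigma2_e * rho^abs(outer(1:nOcc, 1:nOcc, "-"))
}
buildAR1(sigma2_e = 4, rho = 0.5, nOcc = 3)  # diagonal 4, off-diagonal 2, corners 1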
A-priori power analysis
\(\alpha\)-Level
\(\beta\)-Level (i.e., power \(1-\beta\))
effect size
n = ??? (estimated below by simulation)
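The slides give no code at this point; a minimal simulation-based sketch (all names and parameter values are hypothetical): generate data under the assumed effect size, refit the model many times, and take the rejection rate as the power estimate. Increase nGroups and/or nPerGrp until the power reaches \(1-\beta\).
library(lmerTest)  # for p-values in coef(summary(...))
simPower <- function(nGroups, nPerGrp, beta1, sigmaU = 5, sigmaE = 10,
                     nSim = 200, alpha = .05) {
  group <- rep(1:nGroups, each = nPerGrp)
  pvals <- replicate(nSim, {
    x   <- rnorm(nGroups * nPerGrp)
    u0  <- rnorm(nGroups, 0, sigmaU)
    y   <- 50 + beta1 * x + u0[group] + rnorm(nGroups * nPerGrp, 0, sigmaE)
    fit <- lmer(y ~ x + (1 | group))
    coef(summary(fit))["x", "Pr(>|t|)"]
  })
  mean(pvals < alpha)  # proportion of significant replications = power
}
simPower(nGroups = 20, nPerGrp = 10, beta1 = 2)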
Example
Scottish School Leavers Survey (SSLS)
mydata <- read.table(file = "http://www.coherence-based-reasoning-and-rationality.de/amsterdam/5.1.txt",
                     sep = ",", header = TRUE)
head(mydata)
##   caseid schoolid score cohort90 female sclass schtype schurban schdenom
## 1     18        1     0       -6      1      2       0        1        0
## 2     17        1    10       -6      1      2       0        1        0
## 3     19        1     0       -6      1      4       0        1        0
## 4     20        1    40       -6      1      3       0        1        0
## 5     21        1    42       -6      1      2       0        1        0
## 6     13        1     4       -6      1      2       0        1        0
library("lmerTest") library("lattice") library("arm")
nullModel <<- lmer(score ~ 1 + (1|schoolid), data = mydata, REML = FALSE) nullModel
## Linear mixed model fit by maximum likelihood ['merModLmerTest'] ## Formula: score ~ 1 + (1 | schoolid) ## Data: mydata ## AIC BIC logLik deviance df.resid ## 286545.1 286570.4 -143269.5 286539.1 33985 ## Random effects: ## Groups Name Std.Dev. ## schoolid (Intercept) 7.812 ## Residual 16.073 ## Number of obs: 33988, groups: schoolid, 508 ## Fixed Effects: ## (Intercept) ## 30.6
summary(nullModel)
## Linear mixed model fit by maximum likelihood t-tests use Satterthwaite
##   approximations to degrees of freedom [lmerMod]
## Formula: score ~ 1 + (1 | schoolid)
##    Data: mydata
## 
##       AIC       BIC    logLik  deviance  df.resid
##  286545.1  286570.4 -143269.5  286539.1     33985
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9763 -0.7010  0.1017  0.7391  3.0817 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  schoolid (Intercept)  61.02    7.812  
##  Residual             258.36   16.073  
## Number of obs: 33988, groups:  schoolid, 508
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  30.6006     0.3694 451.5000   82.83   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fixef(nullModel)
## (Intercept) 
##     30.6006
head(unlist(ranef(nullModel)))
## schoolid.(Intercept)1 schoolid.(Intercept)2 schoolid.(Intercept)3 
##            -11.841271              3.206333              3.396003 
## schoolid.(Intercept)4 schoolid.(Intercept)5 schoolid.(Intercept)6 
##             -7.415008              3.426227             12.433731
predscore <- fitted(nullModel)
head(predscore)
##        1        2        3        4        5        6 
## 18.75932 18.75932 18.75932 18.75932 18.75932 18.75932
datapred <- unique(data.frame(cbind(predscore = predscore,
                                    cohort90 = mydata$cohort90,
                                    schoolid = mydata$schoolid)))
head(datapred)
##    predscore cohort90 schoolid
## 1   18.75932       -6        1
## 11  18.75932       -2        1
## 12  18.75932       -4        1
## 42  33.80693       -6        2
## 46  33.80693       -4        2
## 52  33.80693        6        2
xyplot(predscore ~ cohort90, data = datapred, groups = schoolid, type = c("p","l"), col = "red")
varcovRandom <- data.frame(VarCorr(nullModel))
varcovRandom
##        grp        var1 var2      vcov     sdcor
## 1 schoolid (Intercept) <NA>  61.02413  7.811794
## 2 Residual        <NA> <NA> 258.35725 16.073495
varcovRandom[1, 4] / (varcovRandom[1, 4] + varcovRandom[2, 4])
## [1] 0.1910698
simpleRegression <- lm(score ~ 1, data = mydata)
testStat <- 2 * (logLik(nullModel) - logLik(simpleRegression))
testStat
## 'log Lik.' 3749.777 (df=3)
p <- 1 - pchisq(testStat, 1)
p
## 'log Lik.' 0 (df=3)
dotplot(ranef(nullModel, condVar = TRUE))
## $schoolid
head((se.ranef(nullModel))$schoolid)
##   (Intercept)
## 1    2.389898
## 2    1.302732
## 3    1.497341
## 4    2.071050
## 5    1.630054
## 6    1.403097
model <- lmer(score ~ cohort90 + (1 | schoolid), data = mydata, REML = FALSE)
summary(model)
## Linear mixed model fit by maximum likelihood t-tests use Satterthwaite
##   approximations to degrees of freedom [lmerMod]
## Formula: score ~ cohort90 + (1 | schoolid)
##    Data: mydata
## 
##       AIC       BIC    logLik  deviance  df.resid
##  280921.6  280955.3 -140456.8  280913.6     33984
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1487 -0.7242  0.0363  0.7339  3.7097 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  schoolid (Intercept)  45.99    6.781  
##  Residual             219.29   14.808  
## Number of obs: 33988, groups:  schoolid, 508
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept) 3.056e+01  3.225e-01 4.330e+02   94.74   <2e-16 ***
## cohort90    1.215e+00  1.553e-02 3.392e+04   78.24   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## cohort90 -0.002
anova(model,nullModel)
## Data: mydata
## Models:
## ..1: score ~ 1 + (1 | schoolid)
## object: score ~ cohort90 + (1 | schoolid)
##        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## ..1     3 286545 286570 -143270   286539                             
## object  4 280922 280955 -140457   280914 5625.5      1  < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predscore <- fitted(model)
head(predscore)
##        1        2        3        4        5        6 
## 16.54111 16.54111 16.54111 16.54111 16.54111 16.54111
datapred <- unique(data.frame(cbind(predscore = predscore,
                                    cohort90 = mydata$cohort90,
                                    schoolid = mydata$schoolid)))
head(datapred)
##    predscore cohort90 schoolid
## 1   16.54111       -6        1
## 11  21.40093       -2        1
## 12  18.97102       -4        1
## 42  26.05932       -6        2
## 46  28.48923       -4        2
## 52  40.63878        6        2
xyplot(predscore ~ cohort90, data = datapred, groups = schoolid, type = c("p","l"), col = "red")
model2 <- lmer(score ~ cohort90 + (1 + cohort90 | schoolid), data = mydata, REML = FALSE)
summary(model2)
## Linear mixed model fit by maximum likelihood t-tests use Satterthwaite
##   approximations to degrees of freedom [lmerMod]
## Formula: score ~ cohort90 + (1 + cohort90 | schoolid)
##    Data: mydata
## 
##       AIC       BIC    logLik  deviance  df.resid
##  280698.2  280748.8 -140343.1  280686.2     33982
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1008 -0.7202  0.0387  0.7264  3.5220 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  schoolid (Intercept)  42.8581  6.5466       
##           cohort90      0.1606  0.4007  -0.39
##  Residual             215.7393 14.6881       
## Number of obs: 33988, groups:  schoolid, 508
## 
## Fixed effects:
##             Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept) 30.60963    0.31345 426.80000   97.66   <2e-16 ***
## cohort90     1.23390    0.02531 316.50000   48.74   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## cohort90 -0.266
predscore <- fitted(model2)
datapred <- unique(data.frame(cbind(predscore = predscore,
                                    cohort90 = mydata$cohort90,
                                    schoolid = mydata$schoolid)))
head(datapred)
##    predscore cohort90 schoolid
## 1   16.11346       -6        1
## 11  21.84115       -2        1
## 12  18.97730       -4        1
## 42  25.59768       -6        2
## 46  28.16286       -4        2
## 52  40.98872        6        2
xyplot(predscore ~ cohort90, data = datapred, groups = schoolid, type = c("p","l"), col = "red")
anova(model,model2)
## Data: mydata
## Models:
## object: score ~ cohort90 + (1 | schoolid)
## ..1: score ~ cohort90 + (1 + cohort90 | schoolid)
##        Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)    
## object  4 280922 280955 -140457   280914                            
## ..1     6 280698 280749 -140343   280686 227.4      2  < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ranef(model2)$schoolid[, 1], ranef(model2)$schoolid[, 2],
     xlab = "Random Intercept", ylab = "Random Slope", pch = 16)
confint(model2)
##                  2.5 %     97.5 %
## .sig01       6.0766116  7.0644671
## .sig02      -0.5158609 -0.2499036
## .sig03       0.3501972  0.4556607
## .sigma      14.5766568 14.8008999
## (Intercept) 29.9938531 31.2259813
## cohort90     1.1841324  1.2838732
\[\frac{{\color{green}\sigma^2_{u_0} + 2\sigma_{u01} x_{ij} + x^2_{ij} \sigma^2_{u_1}}}{{\color{green}\sigma^2_{u_0} + 2\sigma_{u01} x_{ij} + x^2_{ij} \sigma^2_{u_1}} + {\color{red}\sigma^2_e}} \]
varCo <- data.frame(VarCorr(model2))
varCo
##        grp        var1     var2        vcov      sdcor
## 1 schoolid (Intercept)     <NA>  42.8581264  6.5466118
## 2 schoolid    cohort90     <NA>   0.1605915  0.4007387
## 3 schoolid (Intercept) cohort90  -1.0241807 -0.3903901
## 4 Residual        <NA>     <NA> 215.7393041 14.6880667
x <- -6
(varCo[1, 4] + 2 * varCo[3, 4] * x + varCo[2, 4] * x^2) /
  (varCo[1, 4] + 2 * varCo[3, 4] * x + varCo[2, 4] * x^2 + varCo[4, 4])
## [1] 0.2202257
plotVPC <- function(x) {
  (varCo[1, 4] + 2 * varCo[3, 4] * x + varCo[2, 4] * x^2) /
    (varCo[1, 4] + 2 * varCo[3, 4] * x + varCo[2, 4] * x^2 + varCo[4, 4])
}
curve(plotVPC, -6, 6, ylab = "VPC", xlab = "cohort")
Find the article here.
Find the pdf file for the exercise here.
Group # | Dummy 1 | Dummy 2 | Group Variable |
---|---|---|---|
1 | 1 | 0 | 5 |
2 | 0 | 1 | 10 |
3 | 0 | 0 | 6 |
\[ 6 + dummy_{1j} \times (-1) + dummy_{2j} \times 4 = v_{group_j}\]
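A quick check of this arithmetic in R (a sketch; the data frame is ours):
groups <- data.frame(dummy1 = c(1, 0, 0), dummy2 = c(0, 1, 0))
6 + groups$dummy1 * (-1) + groups$dummy2 * 4  # 5 10 6, the three group means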