Mixed effects logistic regression | Examples of R data analysis (2023)

Mixed effects logistic regression is used to model binary outcome variables, in which the log odds of the outcomes are modeled as a linear combination of the predictor variables when the data are clustered or when there are both fixed and random effects.
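As a sketch in notation, a two-level random-intercept version of this model (the form fit later on this page; the symbols here are generic, not tied to a particular dataset) can be written as:

$$\operatorname{logit}\left(\Pr(y_{ij} = 1)\right) = \mathbf{x}_{ij}\boldsymbol{\beta} + u_j, \qquad u_j \sim \mathcal{N}(0, \sigma^2_u)$$

where \(i\) indexes units (e.g., patients) and \(j\) indexes clusters (e.g., doctors); \(\boldsymbol{\beta}\) are the fixed effects and \(u_j\) is the random intercept for cluster \(j\).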

This page uses the following packages. Make sure you can load them before trying to run the examples on this page. If you do not have a package installed, run: install.packages("packagename"), or if you see that the version is out of date, run: update.packages().


Version info: The code on this page was tested in R version 3.1.0 (2014-04-10). On: 2014-07-10. With: boot 1.3-11; lme4 1.1-6; Rcpp 0.11.2; Matrix 1.1-3; GGally 0.4.4; reshape 0.8.4; plyr 1.8; xtable 1.7-3; car 2.0-20; foreign 0.8-61; Hmisc 3.14-4; Formula 1.1-1; survival 2.37-7; lattice 0.20-29; mgcv 1.7-29; nlme 3.1-117; png 0.1-7; gridExtra 0.9.1; reshape2 1.2.2; ggplot2; vcd 1.3-1; rjson 0.2.14; RSQLite 0.11.4; DBI 0.2-7; knitr 1.5

Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process that researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics, or potential follow-up analyses.

Examples of mixed effects logistic regression

Example 1: A researcher analyzed applications from 40 different colleges to examine the factors that predict college admission. Predictors include a student's high school GPA, extracurricular activities, and SAT scores. Some schools are more or less selective, so the baseline probability of admission differs across schools. School-level predictors include whether the school is public or private, the current student-to-teacher ratio, and the school's rank.

Example 2: As part of a larger study of treatment outcomes and quality of life in patients with lung cancer, a large HMO wants to know which patient and physician factors are most associated with a patient's lung cancer going into remission after treatment.

Example 3: A TV station wants to know how weather and advertising campaigns affect whether people watch a TV show. They sample people from four cities over a six-month period. Each month, they ask whether people watched a particular show in the last week. After three months, they launch a new advertising campaign in two of the four cities and continue to monitor whether or not people watched the show.

Data description

In this example, we look at Example 2 about lung cancer using a simulated dataset that we have put online. A variety of outcomes were collected on patients, who are nested within doctors, who are in turn nested within hospitals. There are also a few doctor-level variables, such as Experience, that we will use in our example.

hdp <- read.csv("https://stats.idre.ucla.edu/stat/data/hdp.csv")
hdp <- within(hdp, {
  Married <- factor(Married, levels = 0:1, labels = c("no", "yes"))
  DID <- factor(DID)
  HID <- factor(HID)
  CancerStage <- factor(CancerStage)
})

Now let's plot our continuous predictor variables. Visualizing data can help us understand the distributions, catch coding errors (for example, we know a variable only takes values from 0 to 7, but we see a 999 in the graph), and give us a sense of the relationships among our variables. For example, we might see that two predictors are highly correlated and decide we only want to include one in the model, or we might notice a curvilinear relationship between two variables. Data visualization is a fast, intuitive way to check all of this at once. If most of your predictors appear independent of each other, that is fine; it shapes your expectations of the model. For example, if they are independent, a predictor's estimate should not change substantially when you add another predictor to the model (although the standard error and significance tests might). We can get all of this information and develop a sense of what and how to model the data simply through visualization.

ggpairs(hdp[, c("IL6", "CRP", "LengthofStay", "Experience")])

(Figure: scatterplot matrix of the continuous predictors IL6, CRP, LengthofStay, and Experience)

There do not seem to be any strong linear relationships among our continuous predictors. Let's look at the distributions of our variables by CancerStage. Because LengthofStay is recorded discretely in days, we can examine how CancerStage relates to it using a bubble plot. The area of each bubble is proportional to the number of observations with those values. For the continuous predictors, we use violin plots with jittered data values. All of the raw data are presented separated by CancerStage. To alleviate overplotting and see the values better, we add a small amount of random noise (primarily on the x-axis) and set the alpha transparency. Although jittered points are useful for seeing the raw data, it can be difficult to get a precise sense of the distribution from them. For that we add violin plots. Violin plots are just kernel density plots reflected around the plotting axis. We draw the violin plots on top of the jittered points with transparency so that the raw data can still be seen, but the violin plots dominate. Because both IL6 and CRP tend to have skewed distributions, we use a square root scale on the y-axis. The distributions look fairly normal and symmetric, although you can still see the long right tail even using the square root scale (note that only the scale was shifted; the values themselves are not transformed, which is important because it lets us see and interpret the actual scores rather than the square roots of the scores).

ggplot(hdp, aes(x = CancerStage, y = LengthofStay)) +
  stat_sum(aes(size = ..n.., group = 1)) +
  scale_size_area(max_size = 10)

(Figure: bubble plot of LengthofStay by CancerStage)

tmp <- melt(hdp[, c("CancerStage", "IL6", "CRP")], id.vars = "CancerStage")
ggplot(tmp, aes(x = CancerStage, y = value)) +
  geom_jitter(alpha = .1) +
  geom_violin(alpha = .75) +
  facet_grid(variable ~ .) +
  scale_y_sqrt()

(Figure: jittered points with overlaid violin plots of IL6 and CRP by CancerStage)

Since it is difficult to see how binary variables change across levels of a continuous variable, we can flip the problem around and instead look at the distribution of the continuous variable at each level of the binary outcome.

tmp <- melt(hdp[, c("remission", "IL6", "CRP", "LengthofStay", "Experience")],
  id.vars = "remission")
ggplot(tmp, aes(factor(remission), y = value, fill = factor(remission))) +
  geom_boxplot() +
  facet_wrap(~variable, scales = "free_y")

(Figure: boxplots of the continuous predictors by remission status)


Analysis methods you can consider

Below is a list of analysis methods you may have considered.

  • Mixed effects logistic regression, the focus of this page.
  • Mixed effects probit regression is very similar to mixed effects logistic regression, but it uses the normal CDF instead of the logistic CDF. Both model binary outcomes and can include fixed and random effects.
  • Fixed effects logistic regression is limited in this case because it may ignore necessary random effects and/or non-independence in the data.
  • Fixed effects probit regression is limited in this case because it may ignore necessary random effects and/or non-independence in the data.
  • Logistic regression with clustered standard errors. These can adjust for non-independence but do not allow for random effects.
  • Probit regression with clustered standard errors. These can adjust for non-independence but do not allow for random effects.

Mixed effects logistic regression

Below we use the glmer command to estimate a mixed effects logistic regression model with IL6, CRP, and LengthofStay as patient-level continuous predictors, CancerStage as a patient-level categorical predictor (I, II, III, or IV), Experience as a doctor-level continuous predictor, and a random intercept by DID, the doctor ID.

Estimating and interpreting generalized linear mixed models (GLMMs, of which mixed effects logistic regression is one) can be quite challenging. If you are just starting out, we highly recommend reading this page first: Introduction to GLMMs. It covers some of the background and theory in more detail, as well as estimation options, inference, and pitfalls.

# estimate the model and store results in m
m <- glmer(remission ~ IL6 + CRP + CancerStage + LengthofStay + Experience +
    (1 | DID), data = hdp, family = binomial,
    control = glmerControl(optimizer = "bobyqa"), nAGQ = 10)

# print the mod results without correlations among fixed effects
print(m, corr = FALSE)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
##  Family: binomial ( logit )
## Formula: remission ~ IL6 + CRP + CancerStage + LengthofStay + Experience +
##     (1 | DID)
##    Data: hdp
##      AIC      BIC   logLik deviance df.resid
##     7397     7461    -3690     7379     8516
## Random effects:
##  Groups Name        Std.Dev.
##  DID    (Intercept) 2.01
## Number of obs: 8525, groups:  DID, 407
## Fixed Effects:
##    (Intercept)             IL6             CRP   CancerStageII
##        -2.0527         -0.0568         -0.0215         -0.4139
## CancerStageIII   CancerStageIV    LengthofStay      Experience
##        -1.0035         -2.3370         -0.1212          0.1201

The first part tells us that the estimates are based on an adaptive Gauss-Hermite approximation of the likelihood. In particular, we use 10 integration points. As we use more integration points, the approximation becomes more accurate, converging to the ML estimates; however, more points are more computationally demanding and can be extremely slow or even intractable with current technology. To avoid a warning of non-convergence, we specify a different optimizer with the argument control = glmerControl(optimizer = "bobyqa"). Although the model produces nearly identical results without this argument, we prefer models without such warnings.

The next section gives us basic information that can be used to compare models, followed by the random effect estimates. These represent the estimated variability in the intercept on the logit scale. Had there been other random effects, such as random slopes, they would also appear here. The section concludes with the total number of observations and the number of level-2 observations: in our case, the total number of patients (8,525) and doctors (407).

The last section is a table of the fixed effects estimates. For many applications, these are what people are primarily interested in. The estimates represent the regression coefficients; they are unstandardized and on the logit scale. The estimates are followed by their standard errors (SEs). As is common in GLMs, the SEs are obtained by inverting the observed information matrix (the negative second derivative matrix). For GLMMs, however, this is again an approximation. The approximations of the coefficient estimates likely stabilize faster than those of the SEs, so if you use fewer integration points, the estimates may be reasonable while the approximation of the SEs is less accurate. The Wald tests, \(\frac{Estimate}{SE}\), rely on asymptotic theory, here referring to the number of highest-level units going to infinity; under that theory the tests are normally distributed, and from them p-values (the probability of obtaining the observed estimate or one more extreme, given that the true estimate is 0) can be computed.
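As a minimal base-R sketch of the Wald calculation (the estimate and SE below are the IL6 values reported in the output on this page; the object names are only illustrative):

```r
# Wald test for one coefficient: z = Estimate / SE, p-value from the normal
est <- -0.0568   # IL6 estimate from the model output above
se  <- 0.012     # IL6 standard error reported later on this page
z <- est / se
p <- 2 * pnorm(-abs(z))   # two-sided p-value
round(c(z = z, p = p), 4)
```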

It can also be useful to obtain confidence intervals (CIs). We can get rough CIs using the SEs.

se <- sqrt(diag(vcov(m)))
# table of estimates with 95% CI
(tab <- cbind(Est = fixef(m), LL = fixef(m) - 1.96 * se, UL = fixef(m) + 1.96 * se))
##                     Est       LL       UL
## (Intercept)    -2.05269 -3.09435 -1.01102
## IL6            -0.05677 -0.07935 -0.03420
## CRP            -0.02148 -0.04151 -0.00145
## CancerStageII  -0.41393 -0.56244 -0.26540
## CancerStageIII -1.00346 -1.19610 -0.81082
## CancerStageIV  -2.33704 -2.64677 -2.02723
## LengthofStay   -0.12119 -0.18710 -0.05530
## Experience      0.12009  0.06628  0.17390

If we wanted odds ratios instead of coefficients on the logit scale, we could exponentiate the estimates and CIs.

exp(tab)
##                    Est      LL     UL
## (Intercept)    0.12839 0.04530 0.3638
## IL6            0.94481 0.92372 0.9664
## CRP            0.97875 0.95934 0.9985
## CancerStageII  0.66104 0.56982 0.7669
## CancerStageIII 0.36661 0.30237 0.4445
## CancerStageIV  0.09661 0.07088 0.1317
## LengthofStay   0.88587 0.82936 0.9462
## Experience     1.12760 1.06853 1.1899
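To make the odds-ratio interpretation concrete, here is a small base-R sketch using the IL6 coefficient from the output above (the object names are illustrative):

```r
b_IL6 <- -0.05677      # logit-scale coefficient for IL6 from the model above
exp(b_IL6)             # odds ratio for a 1-unit increase in IL6 (~0.945),
                       # i.e., each unit of IL6 multiplies the odds of
                       # remission by about 0.945
# effects multiply on the odds scale: for a 5-unit increase, scale the
# coefficient on the logit scale first, then exponentiate
exp(5 * b_IL6)
```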

Multilevel bootstrapping

Inference from GLMMs is complicated. Except for cases where there are many observations at each level (particularly the highest level), assuming that \(\frac{Estimate}{SE}\) is normally distributed may not be accurate. Suggested alternatives include Monte Carlo simulation, Bayesian estimation, and bootstrapping. Each of these methods can be complex to implement; we are going to focus on a small bootstrapping example.

Bootstrapping is a resampling method. It is by no means perfect, but it is conceptually straightforward and easy to implement in code. One downside is that it is computationally demanding. For large datasets or complex models, where each model takes a few minutes to run, estimating on thousands of bootstrap samples can easily take hours or days. In the example on this page, we use a very small number of samples, but in practice you would use many more. Perhaps 1,000 is a reasonable starting point.
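For intuition, here is a minimal single-level percentile bootstrap in base R on toy data (everything here is illustrative; the multilevel sampler below replaces the simple resampling step with cluster resampling):

```r
set.seed(1)
x <- rnorm(200, mean = 5)   # toy sample
# resample with replacement many times, recording the statistic each time
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
# percentile 95% CI: the 2.5th and 97.5th percentiles of the bootstrap estimates
quantile(boot_means, c(0.025, 0.975))
```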

For single-level models, we can implement a simple random sample with replacement for bootstrapping. With multilevel data, we want to resample in the same way as the data generating mechanism. We start by resampling from the highest level, and then step down one level at a time. In our case, we first sample from the doctors, and then within each sampled doctor, we sample from their patients. To do this, we first write a function to resample at each level.
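The two-stage scheme can be sketched with a toy data frame (the column and object names here are made up for illustration; the full sampler function below adds replicate and ID bookkeeping):

```r
set.seed(2)
toy <- data.frame(doctor = rep(1:5, each = 4), y = rnorm(20))
# stage 1: resample clusters (doctors) with replacement
docs <- sample(unique(toy$doctor), replace = TRUE)
# stage 2: within each sampled doctor, resample that doctor's rows with replacement
rows <- unlist(lapply(docs, function(d) {
  idx <- which(toy$doctor == d)
  sample(idx, size = length(idx), replace = TRUE)
}))
boot_toy <- toy[rows, ]   # one bootstrap replicate of the clustered data
```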

The Biostatistics Department at Vanderbilt has a good page describing the idea: Nonparametric bootstrapping with hierarchical, correlated data.

sampler <- function(dat, clustervar, replace = TRUE, reps = 1) {
    cid <- unique(dat[, clustervar[1]])
    ncid <- length(cid)
    recid <- sample(cid, size = ncid * reps, replace = TRUE)
    if (replace) {
        rid <- lapply(seq_along(recid), function(i) {
            cbind(NewID = i, RowID = sample(which(dat[, clustervar] == recid[i]),
                size = length(which(dat[, clustervar] == recid[i])), replace = TRUE))
        })
    } else {
        rid <- lapply(seq_along(recid), function(i) {
            cbind(NewID = i, RowID = which(dat[, clustervar] == recid[i]))
        })
    }
    dat <- as.data.frame(do.call(rbind, rid))
    dat$Replicate <- factor(cut(dat$NewID, breaks = c(1, ncid * 1:reps),
        include.lowest = TRUE, labels = FALSE))
    dat$NewID <- factor(dat$NewID)
    return(dat)
}

Now we will resample our data with 100 replicates. Again, in practice you would likely take thousands. We set the seed so that our results are reproducible. It is also likely that you will need to sample more replicates than you ultimately want, because many samples may not converge, so you will not get estimates from them.

set.seed(20)
tmp <- sampler(hdp, "DID", reps = 100)
bigdata <- cbind(tmp, hdp[tmp$RowID, ])

Next we refit the model on the resampled data. We first save our original model estimates, which we will use as start values for the bootstrap models. Then, we make a local cluster with 4 nodes (the number of processors on our machine; set this to the number of processors on yours). Next, we export the data and load the lme4 package on the cluster. Finally, we write a function to fit the model and return the estimates. The call to glmer() is wrapped in try because not all models may converge on the resampled data. This catches the error and returns it, rather than stopping processing.


f <- fixef(m)
r <- getME(m, "theta")

cl <- makeCluster(4)
clusterExport(cl, c("bigdata", "f", "r"))
clusterEvalQ(cl, require(lme4))
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] TRUE
myboot <- function(i) {
    object <- try(glmer(remission ~ IL6 + CRP + CancerStage + LengthofStay +
        Experience + (1 | NewID), data = bigdata, subset = Replicate == i,
        family = binomial, nAGQ = 1, start = list(fixef = f, theta = r)),
        silent = TRUE)
    if (class(object) == "try-error")
        return(object)
    c(fixef(object), getME(object, "theta"))
}

Now that we have the data, the local cluster, and the fitting function set up, we are ready to bootstrap. To do this we use the parLapplyLB function, which loops through every replicate, handing them out to each node of the cluster to estimate the models. The "LB" stands for load balancing, which means that replicates are dispatched as nodes complete their current job. This is valuable because not all replicates will converge, and if a model errors out and terminates early, one node may be ready for a new job sooner than another. There is some extra communication overhead, but it is small compared to the time it takes to fit each model. The results from all nodes are aggregated back into a single list, stored in the object res. Once that is done, we can shut down the local cluster, which terminates the additional R instances and frees memory.

start <- proc.time()
res <- parLapplyLB(cl, X = levels(bigdata$Replicate), fun = myboot)
end <- proc.time()

# shut down the cluster
stopCluster(cl)

Now that we have the bootstrap results, we can summarize them. First, we calculate the number of models that successfully converged. We do this by checking whether a particular result is numeric or not. Errors are not numeric, so they will be skipped. We can take the mean of the successes to see the proportion of replicates that converged and for which we have results.

# calculate proportion of models that successfully converged
success <- sapply(res, is.numeric)
mean(success)
## [1] 1

We then convert the list of bootstrap results into a matrix, and then calculate the 2.5th and 97.5th percentiles for each parameter. Finally, we can make a table of the results, including the original estimates and standard errors, the mean bootstrap estimate (which is asymptotically equivalent to the original results, but may be biased for a small number of replicates, as in our case), and the percentile bootstrap confidence intervals. You could also compute bias-corrected bootstrap confidence intervals if you wanted, but we only show the percentile CIs.

# combine successful results
bigres <- do.call(cbind, res[success])

# calculate 2.5th and 97.5th percentiles for 95% CI
(ci <- t(apply(bigres, 1, quantile, probs = c(0.025, 0.975))))
##                        2.5%     97.5%
## (Intercept)        -3.61982 -0.985404
## IL6                -0.08812 -0.029664
## CRP                -0.04897  0.006824
## CancerStageII      -0.60754 -0.228019
## CancerStageIII     -1.302   -0.755
## CancerStageIV      -2.914   -2.003
## LengthofStay       -0.21596 -0.046420
## Experience          0.06819  0.207223
## NewID.(Intercept)   2.03868  2.476366
# all results
finaltable <- cbind(Est = c(f, r), SE = c(se, NA), BootMean = rowMeans(bigres), ci)
# round and print
round(finaltable, 3)
##                    Est    SE BootMean   2.5%  97.5%
## (Intercept)     -2.053 0.531   -2.205 -3.620 -0.985
## IL6             -0.057 0.012   -0.059 -0.088 -0.030
## CRP             -0.021 0.010   -0.022 -0.049  0.007
## CancerStageII   -0.414 0.076   -0.417 -0.608 -0.228
## CancerStageIII  -1.003 0.098   -1.043 -1.302 -0.755
## CancerStageIV   -2.337 0.158   -2.460 -2.914 -2.003
## LengthofStay    -0.121 0.034      …  -0.216 -0.046
## Experience       0.120 0.027      …   0.068  0.207
## DID.(Intercept)  2.015    NA    2.263  2.039  2.476

Predicted probabilities and graphing

These results are great to put in a table or in the text of a research manuscript; however, the numbers can be tricky to interpret. Visual presentations are helpful for interpretation, and for posters and presentations. As models become more complex, there are many options. We will discuss some of them briefly and give an example of how you could create one.

In a logistic model, the output is usually on one of three scales:

  • Log odds (also called logits), which is the linearized scale
  • Odds ratios (exponentiated log odds), which are not on a linear scale
  • Probabilities, which are also not on a linear scale

For tables, people often present odds ratios. For visualization, the logit or probability scale is most common. There are some advantages and disadvantages to each. The logit scale is convenient because it is linearized, meaning that a one-unit increase in a predictor results in a coefficient-unit increase in the outcome, and this holds regardless of the levels of the other predictors (setting aside interactions). A downside is that the scale is not very interpretable; it is hard for readers to develop an intuitive understanding of logits. Conversely, probabilities are a nice scale for intuitively understanding results, but they are not linear. This means that a one-unit increase in a predictor does not correspond to a constant change in probability; the change in probability depends on the values chosen for the other predictors. In ordinary logistic regression, you could simply hold all predictors constant and vary only the predictor of interest. However, in mixed effects logistic models, the random effects also bear on the results. Thus, if you hold everything constant, the change in the probability of the outcome over different values of your predictor of interest only holds when all covariates are held constant and you are in the same group, or a group with the same random effect. The effects are conditional on the other predictors and on group membership, which is quite narrowing. An appealing alternative is to get the average marginal probability: that is, across all of the groups in our sample (which we hope are representative of the population of interest), graph the average change in the probability of the outcome across the range of some predictor of interest.
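A quick base-R illustration of the non-linearity of the probability scale: the same one-unit increase on the logit scale moves the probability by very different amounts depending on where you start:

```r
# plogis() is the inverse logit, mapping log odds to probabilities
plogis(0) - plogis(-1)   # one logit unit near the middle: about a 0.23 change
plogis(4) - plogis(3)    # the same logit unit out in the tail: about 0.03
```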

Let's look at an example using average marginal probabilities. These require more work than conditional probabilities, because you have to calculate separate conditional probabilities for every group and then average them. It is also not easy to get confidence intervals around these average marginal effects in a frequentist framework (although they are trivial to obtain from Bayesian estimation).

First, let's define the general procedure in generic notation. We create \(\mathbf{X}_{i}\) by taking \(\mathbf{X}\) and setting a particular predictor of interest, say in column \(j\), to a constant. If we only cared about one value of the predictor, \(i \in \{1\}\). More commonly, however, we want a range of values for the predictor in order to present how the predicted probability varies across its range. We can do this by taking the observed range of the predictor and taking \(k\) samples evenly spaced within that range. For example, suppose our predictor ranged from 5 to 10 and we wanted 6 samples; since \(\frac{10 - 5}{6 - 1} = 1\), each sample would be 1 apart from the previous one, giving \(\{5, 6, 7, 8, 9, 10\}\). We then create \(k\) different \(\mathbf{X}_{i}\)s, with \(i \in \{1, \ldots, k\}\), where in each one the \(j\)th column has been set to a constant. Then we calculate: $$\boldsymbol{\eta}_{i} = \mathbf{X}_{i}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}$$ These are all the different linear predictors. Finally, we take \(h(\boldsymbol{\eta})\), which gives us \(\boldsymbol{\mu}_{i}\), the conditional expectations on the original scale — in our case, probabilities. We can then take the expectation of each \(\boldsymbol{\mu}_{i}\) and plot it against the value our predictor of interest was held at. We could also make boxplots to show not only the average marginal predicted probability, but also the distribution of predicted probabilities.
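The procedure above can be sketched in a few lines of base R with made-up numbers (the design matrix, coefficients, and random-effect contributions below are all illustrative, not from the model on this page):

```r
set.seed(3)
n <- 8
X <- cbind(1, matrix(rnorm(n * 2), n, 2))  # toy design: intercept + 2 predictors
beta <- c(-0.5, 0.8, -0.3)                 # toy fixed effects
Zgamma <- rnorm(n)                         # toy random-effect contribution per row
values <- seq(-1, 1, length.out = 5)       # k = 5 values of the predictor in column 2
avg_marg_prob <- sapply(values, function(v) {
  Xi <- X
  Xi[, 2] <- v                 # hold the predictor of interest constant
  eta <- Xi %*% beta + Zgamma  # linear predictor: eta_i = X_i beta + Z gamma
  mean(plogis(eta))            # h(eta) gives probabilities; then average them
})
avg_marg_prob                  # average marginal probability at each value
```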

You may have noticed that a lot of variability goes into these estimates. We use \(\mathbf{X}\) holding only our predictor of interest at a constant, which allows all the other predictors to take on their values in the original data. Also, we leave \(\mathbf{Z}\boldsymbol{\gamma}\) as in our sample, which means that some groups are more or less represented than others. If we wanted, we could re-weight all the groups to have equal weight. We chose to leave all of these things as they are in this example based on the assumption that our sample really is a good representative of our population of interest. Rather than attempting to pick meaningful values to hold the covariates at (even the mean is not necessarily meaningful, particularly if a covariate has a bimodal distribution, in which case it may be that no participant had a value at or near the mean), we used the values from our sample. This also suggests that if our sample is a good representation of the population, then the average marginal predicted probabilities are a good representation of the probability for a new random sample from our population.

Now that we have some background and theory, let's see how we actually calculate these things. We get a summary of LengthofStay, our predictor of interest, and then get 100 values across its range to use in predictions. We make a copy of our data so that we can fix the values of one of the predictors, and then use the predict function to calculate the predicted values. By default, all the random effects are included; see ?predict.merMod for more details. Note that the prediction method for mixed effects models is new and currently only in the development version of lme4, so make sure you have that installed.

# temporary data
tmpdat <- hdp[, c("IL6", "CRP", "CancerStage", "LengthofStay", "Experience", "DID")]

summary(hdp$LengthofStay)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    5.00    5.00    5.49    6.00   10.00
jvalues <- with(hdp, seq(from = min(LengthofStay), to = max(LengthofStay),
    length.out = 100))

# calculate predicted probabilities and store in a list
pp <- lapply(jvalues, function(j) {
    tmpdat$LengthofStay <- j
    predict(m, newdata = tmpdat, type = "response")
})

Now that we have all the predicted probabilities, we can work on displaying them. For example, we could look at the average marginal predicted probability at a handful of different lengths of stay. We can also plot all of them.

# average marginal predicted probability across a few different lengths of stay
sapply(pp[c(1, 20, 40, 60, 80, 100)], mean)
## [1] 0.3652 0.3366 0.3075 0.2796 0.2530 0.2278
# get the means with lower and upper quartiles
plotdat <- t(sapply(pp, function(x) {
    c(M = mean(x), quantile(x, c(0.25, 0.75)))
}))

# add in LengthofStay values and convert to data frame
plotdat <- as.data.frame(cbind(plotdat, jvalues))

# better names and show the first few rows
colnames(plotdat) <- c("PredictedProbability", "Lower", "Upper", "LengthofStay")
head(plotdat)
##   PredictedProbability   Lower  Upper LengthofStay
## 1               0.3652 0.08490 0.6156        1.000
## 2               0.3637 0.08405 0.6130        1.091
## 3               0.3622 0.08320 0.6103        1.182
## 4               0.3607 …
# plot average marginal predicted probabilities
ggplot(plotdat, aes(x = LengthofStay, y = PredictedProbability)) + geom_line() +
    ylim(c(0, 1))

(Figure: average marginal predicted probability of remission across LengthofStay)

We could also add the lower and upper quartiles. This shows the range in which 50 percent of the predicted probabilities fell.

ggplot(plotdat, aes(x = LengthofStay, y = PredictedProbability)) +
    geom_linerange(aes(ymin = Lower, ymax = Upper)) + geom_line(size = 2) +
    ylim(c(0, 1))

(Figure: average marginal predicted probabilities with lower and upper quartile ranges)

This is just the beginning of what can be done. For presentations and papers, it often makes sense to add more information. We could plot the same average marginal predicted probabilities, but in addition to varying LengthofStay, do so for each level of CancerStage.

# calculate predicted probabilities and store in a list
biprobs <- lapply(levels(hdp$CancerStage), function(stage) {
  tmpdat$CancerStage[] <- stage
  lapply(jvalues, function(j) {
    tmpdat$LengthofStay <- j
    predict(m, newdata = tmpdat, type = "response")
  })
})

# get means and quartiles for all jvalues for each level of CancerStage
plotdat2 <- lapply(biprobs, function(X) {
  temp <- t(sapply(X, function(x) {
    c(M = mean(x), quantile(x, c(0.25, 0.75)))
  }))
  temp <- as.data.frame(cbind(temp, jvalues))
  colnames(temp) <- c("PredictedProbability", "Lower", "Upper", "LengthofStay")
  return(temp)
})

# collapse to one data frame
plotdat2 <- do.call(rbind, plotdat2)

# add cancer stage
plotdat2$CancerStage <- factor(rep(levels(hdp$CancerStage), each = length(jvalues)))

# show first few rows
head(plotdat2)
##   PredictedProbability  Lower  Upper LengthofStay CancerStage
## 1               0.4475 0.1547 0.7328        1.000           I
## 2               0.4458 0.1533 0.7307        1.091           I
## 3               0.4441 0.1519 0.7285        1.182           I
## 4               0.4425 0.1505 …
# graph it
ggplot(plotdat2, aes(x = LengthofStay, y = PredictedProbability)) +
    geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = CancerStage), alpha = .15) +
    geom_line(aes(colour = CancerStage), size = 2) +
    ylim(c(0, 1)) + facet_wrap(~CancerStage)

(Figure: average marginal predicted probabilities across LengthofStay, faceted by CancerStage)

For a patient with stage IV lung cancer who stayed in the hospital for 10 days, the prospect of the cancer being in remission is fairly bleak (remember that this is simulated data). The distribution also looks skewed. We can examine the distribution of predicted probabilities for just that group.

ggplot(data.frame(Probs = biprobs[[4]][[100]]), aes(Probs)) + geom_histogram() +
    scale_x_sqrt(breaks = c(0.01, 0.1, 0.25, 0.5, 0.75))
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

(Figure: histogram of predicted probabilities for stage IV patients with a 10-day stay, square root x-axis)

Even using a square root scale that stretches out the lower values, the distribution is still extremely skewed. The vast majority are estimated to have less than a .1 probability of being in remission.

Three-level mixed effects logistic regression

We examined a two-level logistic model with a random intercept in depth. This is the simplest mixed effects logistic model possible. Now let's look at how to add a third level and random slope effects as well as random intercepts.

Below we estimate a three-level logistic model with a random intercept for doctors and a random intercept for hospitals. In these examples, doctors are nested within hospitals, meaning that each doctor belongs to one and only one hospital. The alternative case is sometimes called "crossed", meaning that a doctor may belong to multiple hospitals, for example if some of the doctor's patients are from hospital A and others from hospital B. In glmer you do not need to specify whether the groups are nested or crossed; R figures it out based on the data. We use the same (1 | ID) general syntax to indicate the intercept (1) varying by some ID. For models with more than one simple scalar random effect, glmer only supports a single integration point, so we use nAGQ = 1.
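A small base-R check of the nested-versus-crossed distinction (the helper function and toy data are illustrative; the DID/HID names mirror the dataset used on this page):

```r
# a grouping factor is nested within another if each of its levels occurs in
# exactly one level of the higher factor
is_nested <- function(inner, outer) {
  all(rowSums(table(inner, outer) > 0) == 1)
}
# nested: each doctor (DID) appears in a single hospital (HID)
nested <- data.frame(DID = c(1, 1, 2, 2, 3), HID = c("A", "A", "A", "A", "B"))
is_nested(nested$DID, nested$HID)    # TRUE
# crossed: doctor 1 has patients in both hospitals
crossed <- data.frame(DID = c(1, 1, 2, 2), HID = c("A", "B", "A", "B"))
is_nested(crossed$DID, crossed$HID)  # FALSE
```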

# estimate the model and store results in m3a
m3a <- glmer(remission ~ Age + LengthofStay + FamilyHx + IL6 + CRP + CancerStage +
    Experience + (1 | DID) + (1 | HID), data = hdp, family = binomial, nAGQ = 1)
## Warning: Model failed to converge with max|grad| = 74.1215 (tol = 0.001)
# print the mod results without correlations among fixed effects
print(m3a, corr = FALSE)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial ( logit )
## Formula: remission ~ Age + LengthofStay + FamilyHx + IL6 + CRP + CancerStage +
##     Experience + (1 | DID) + (1 | HID)
##    Data: hdp
##      AIC      BIC   logLik deviance df.resid
##     7199     7284    -3588     7175     8513
## Random effects:
##  Groups Name        Std.Dev.
##  DID    (Intercept) 1.952
##  HID    (Intercept) 0.549
## Number of obs: 8525, groups:  DID, 407; HID, 35
## Fixed Effects:
##    (Intercept)             Age    LengthofStay     FamilyHxyes
##        -1.6881         -0.0149         -0.0447         -1.3066
##            IL6             CRP   CancerStageII  CancerStageIII
##        -0.0569         -0.0222         -0.3185         -0.8570
##  CancerStageIV      Experience
##        -2.1375          0.1269

The output tells us the family (binomial for binary outcomes) and the link function (logit), followed by the usual fit indices and the variance of the random effects: in this case, the variability in the intercept (on the log odds scale) between doctors and between hospitals. What is shown is the standard deviation (just the square root of the variance, not the standard error of the variance estimate). We also get the number of unique units at each level. Last come the fixed effects, as before.
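One illustrative calculation (not part of the original output) is to turn the printed random-effect standard deviations into intraclass correlations. For a logistic model, the level-1 residual variance on the latent logit scale is conventionally taken to be pi^2/3; the numbers below are read directly off the m3a output above.

```r
# Variance components read from the printed m3a output
v_doc  <- 1.952^2        # doctor (DID) intercept variance
v_hosp <- 0.549^2        # hospital (HID) intercept variance
v_res  <- pi^2 / 3       # latent logistic residual variance
total  <- v_doc + v_hosp + v_res

icc_hospital <- v_hosp / total              # share of variance between hospitals
icc_doctor   <- (v_doc + v_hosp) / total    # correlation of two patients of the same doctor
round(c(hospital = icc_hospital, doctor = icc_doctor), 3)
# hospital ~ 0.041, doctor ~ 0.556
```

So two patients of the same doctor (who is necessarily in the same hospital) have substantially correlated latent outcomes, while the hospital level alone accounts for only a small share of the variance.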

It can also be useful to look at the distribution of the conditional modes, which we do with caterpillar plots below. The blue dots are the conditional modes with error bars. We do this for both doctors and hospitals. For the doctors, for example, we can see a somewhat long right tail, in that there are more extreme positive values than negative ones. For the doctors we suppress their IDs (using the scales = list(y = list(alternating = 0)) argument) because there are so many of them, but we leave them in for the hospitals.

lattice::dotplot(ranef(m3a, which = "DID", condVar = TRUE), scales = list(y = list(alternating = 0)))

[Figure: caterpillar plot of the conditional modes for doctors (DID) from m3a]

lattice::dotplot(ranef(m3a, which = "HID", condVar = TRUE))

[Figure: caterpillar plot of the conditional modes for hospitals (HID) from m3a]

We can also easily add random slopes to the model and allow them to vary at any level. We will just add a random slope for LengthofStay that varies between doctors. As usual in R formulas, we use the + operator to "add" an effect, and we do so inside the doctors' random-effects term. All terms within one set of parentheses use an unstructured covariance matrix; you can obtain a diagonal covariance structure by splitting the grouping into separate pieces. Across clusters, the random effects are assumed independent.

# estimate the model and store results in m
m3b <- glmer(remission ~ Age + LengthofStay + FamilyHx + IL6 + CRP + CancerStage +
    Experience + (1 + LengthofStay | DID) + (1 | HID), data = hdp,
    family = binomial, nAGQ = 1)
## Warning: failed to converge with max|grad| = 34.9006 (tol = 0.001)
# print the mod results without correlations among fixed effects
print(m3b, corr = FALSE)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: remission ~ Age + LengthofStay + FamilyHx + IL6 + CRP + CancerStage +
##     Experience + (1 + LengthofStay | DID) + (1 | HID)
##    Data: hdp
##      AIC      BIC   logLik deviance df.resid
##     7148     7246    -3560     7120     8511
## Random effects:
##  Groups Name         Std.Dev. Corr
##  DID    (Intercept)  0.504
##         LengthofStay 0.372    -0.11
##  HID    (Intercept)  0.731
## Number of obs: 8525, groups: DID, 407; HID, 35
## Fixed Effects:
##    (Intercept)             Age    LengthofStay     FamilyHxyes
##        -0.5447         -0.0152         -0.1901         -1.3395
##            IL6             CRP   CancerStageII  CancerStageIII
##        -0.0586         -0.0210         -0.2941         -0.8651
##  CancerStageIV      Experience
##        -2.2964          0.1044
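As a small illustrative sketch (not part of the original page), the "unstructured covariance matrix" for the doctor-level effects can be reconstructed from the Std.Dev. and Corr columns printed above, since a covariance is just the correlation times the two standard deviations:

```r
# Doctor-level (DID) values read from the printed m3b output
sd_int <- 0.504    # std. dev. of the random intercept
sd_los <- 0.372    # std. dev. of the random LengthofStay slope
rho    <- -0.11    # printed intercept-slope correlation

# Unstructured 2 x 2 covariance matrix of the DID random effects
Sigma <- matrix(c(sd_int^2,              rho * sd_int * sd_los,
                  rho * sd_int * sd_los, sd_los^2),
                nrow = 2,
                dimnames = list(c("(Intercept)", "LengthofStay"),
                                c("(Intercept)", "LengthofStay")))
round(Sigma, 4)
```

A diagonal structure would instead force the off-diagonal covariance to zero; in lme4 formula syntax that corresponds to writing (1 | DID) + (0 + LengthofStay | DID) rather than (1 + LengthofStay | DID).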
lattice::dotplot(ranef(m3b, which = "DID", condVar = TRUE), scales = list(y = list(alternating = 0)))

[Figure: caterpillar plot of the conditional modes for doctors (DID) from m3b]

lattice::dotplot(ranef(m3b, which = "HID", condVar = TRUE), scales = list(y = list(alternating = 0)))

[Figure: caterpillar plot of the conditional modes for hospitals (HID) from m3b]


Things to consider

See the "other issues" section of the Introduction to GLMMs page for some considerations and questions to think about.

See also

  • Introduction to GLMMs
  • Logistic regression in R





