Introduction: what is zero-inflation?

Put simply, if your data contain more $0$s than the distribution you fitted can account for, you are facing zero-inflation. One common cause of an apparent excess of $0$s is overdispersion (dealt with in a separate example). If the excess persists even after properly modelling overdispersion (e.g. through a different family or observation-level random effects), then we are dealing with real zero-inflation, in the strict sense.

We imagine the excess $0$s to be the result of observing the outcome of two co-occurring processes, each contributing some of the $0$s. Let’s take an ecological example.

Imagine we count the number of frogs in 100 ponds at different distances from the river Elbe (as in the paper by Dick et al., about to be published in J. Herpetology). We find that some ponds have no frogs, others hundreds. A histogram reveals a high number of $0$s (not shown), and an excess even after using the negative binomial. The authors hypothesise that two processes determine the number of frogs in a pond: (1) the distance to the river determines whether a pond is colonised; (2) if colonised, the local conditions (pond area, hydroperiod, fish) determine survival of spawning frogs, and hence finally the number of individuals.

Mixture of distributions

Thus, our data are a mixture of two distributions: one that describes whether a frog has reached the pond, and one that describes how many eggs hatched if a frog reached the pond. In perfect analogy, we also have to model the data as a mixture of two distributions, one for each of these two processes:

\(Y \sim \begin{cases} \text{Pois}(\lambda=\text{mean abundance}) & \text{, frog arrived, with probability } 1-\pi \\ 0 & \text{, frog did not arrive, with probability } \pi \end{cases}\)

A mixture distribution is defined (according to Wikipedia and my understanding) as “a collection of random variables derived as follows: first, a random variable is selected by chance from the collection according to given probabilities of selection, and then the value of the selected random variable is realized”. This may sound unnecessarily complicated, but essentially we use one distribution to pick another one, from which we then draw the actual realised observation $Y_i$. In our pond example, we first draw from the Bernoulli distribution whether a pond has been colonised; then we draw a 0 if it hasn’t, or from a Poisson if it has.
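This two-step sampling scheme is easy to mimic in R. A minimal sketch (the colonisation probability, mean abundance, and number of ponds are made-up values, not the owl data used later):

```r
set.seed(42)
n        <- 10000  # number of (hypothetical) ponds
p.absent <- 0.3    # probability that a pond was never colonised (pi in the text)
lambda   <- 5      # mean abundance in colonised ponds

# Step 1: Bernoulli draw -- was the pond colonised? (1 = yes)
z <- rbinom(n, size = 1, prob = 1 - p.absent)
# Step 2: Poisson draw for colonised ponds, 0 otherwise
y <- ifelse(z == 1, rpois(n, lambda), 0)

mean(y == 0)                                 # observed proportion of zeros
p.absent + (1 - p.absent) * dpois(0, lambda) # theoretical proportion of zeros
```

The two printed proportions should agree closely, illustrating that the zeros come from both processes.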

We can also write down the actual probability of observing $x$ frogs in a pond, remembering that the Poisson distribution looks like this: \(P(k=x) = \frac{\lambda^x e^{-\lambda}}{x!}\)

Then our new mixture of the Bernoulli (for the colonisation process) and the Poisson (for the population dynamics) is: \(\begin{aligned} P(k=0) &= \pi + (1-\pi)e^{-\lambda}\\ P(k=x) &= (1-\pi)\frac{\lambda^x e^{-\lambda}}{x!} \end{aligned}\)

So we see that our observed $0$s have two sources: those that are $0$ because of the Bernoulli distribution (the proportion $\pi$), plus those from the Poisson distribution for the ponds that have been colonised, but failed to generate surviving frogs (the proportion $(1-\pi)$ times the proportion of $0s$ in the Poisson distribution with a given $\lambda$, which we can get from the Poisson distribution equation).

Modelling mixture distributions in JAGS

We use the owl begging data set of Roulin &amp; Bersier (2007) from the glmmADMB-package. It describes the number of begging calls (“sibling negotiations”) in a nest, for female and male parents and for food-deprived or satiated broods. The data look like this:

if (!("glmmADMB" %in% rownames(installed.packages()))) {
	install.packages("R2admb")
	install.packages("glmmADMB",
	                 repos = c("http://glmmadmb.r-forge.r-project.org/repos", getOption("repos")),
	                 type = "source")
}

library(glmmADMB)
data(Owls)
summary(Owls)
          Nest      FoodTreatment  SexParent    ArrivalTime    SiblingNegotiation   BroodSize    
 Oleyes     : 52   Deprived:320   Female:245   Min.   :21.71   Min.   : 0.00      Min.   :1.000  
 Montet     : 41   Satiated:279   Male  :354   1st Qu.:23.11   1st Qu.: 0.00      1st Qu.:4.000  
 Etrabloz   : 34                               Median :24.38   Median : 5.00      Median :4.000  
 Yvonnand   : 34                               Mean   :24.76   Mean   : 6.72      Mean   :4.392  
 Champmartin: 30                               3rd Qu.:26.25   3rd Qu.:11.00      3rd Qu.:5.000  
 Lucens     : 29                               Max.   :29.25   Max.   :32.00      Max.   :7.000  
 (Other)    :379                                                                                 
  NegPerChick     logBroodSize  
 Min.   :0.000   Min.   :0.000  
 1st Qu.:0.000   1st Qu.:1.386  
 Median :1.200   Median :1.386  
 Mean   :1.564   Mean   :1.439  
 3rd Qu.:2.500   3rd Qu.:1.609  
 Max.   :8.500   Max.   :1.946  
                                
library(lattice)
bwplot(reorder(Nest,NegPerChick)~NegPerChick|FoodTreatment:SexParent, data=Owls)

The model will become slightly complicated by the fact that “SiblingNegotiation” is measured per nest, rather than per chick. We would hence need to divide it by the number of chicks per nest, but that would yield non-integer values! The solution is to use brood size as an offset at the link scale, i.e. to add log(brood size) to the linear predictor.
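As a quick sanity check of the offset logic (with a made-up per-chick intercept): on the log-link scale, adding log(brood size) is exactly the same as multiplying the per-chick rate by brood size, so the offset models per-chick calling while keeping the integer per-nest counts as response.

```r
alpha     <- 0.7  # per-chick intercept on the log scale (made-up value)
broodsize <- 4

# with the offset: expected calls per nest = exp(log(broodsize) + alpha)
mu_nest  <- exp(log(broodsize) + alpha)
# equivalently: brood size times the per-chick rate
mu_check <- broodsize * exp(alpha)

all.equal(mu_nest, mu_check)  # the two are identical
```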

Data preparation for JAGS

Let’s see how we can prepare the data for JAGS:

library(R2jags)
# prepare data as JAGS likes it:
attach(Owls)
head(Owls)
        Nest FoodTreatment SexParent ArrivalTime SiblingNegotiation BroodSize NegPerChick
1 AutavauxTV      Deprived      Male       22.25                  4         5         0.8
2 AutavauxTV      Satiated      Male       22.38                  0         5         0.0
3 AutavauxTV      Deprived      Male       22.53                  2         5         0.4
4 AutavauxTV      Deprived      Male       22.56                  2         5         0.4
5 AutavauxTV      Deprived      Male       22.61                  2         5         0.4
6 AutavauxTV      Deprived      Male       22.65                  2         5         0.4
  logBroodSize
1     1.609438
2     1.609438
3     1.609438
4     1.609438
5     1.609438
6     1.609438

Note that FoodTreatment and SexParent are factors. In the JAGS model, they need to be numerical values. The simplest way to convert them is this:

head(as.numeric(FoodTreatment))
[1] 1 2 1 1 1 1

This leads to values of 1, 2, … . Since there are only two levels, I want them to be 0 and 1:

head(as.numeric(FoodTreatment) - 1)
[1] 0 1 0 0 0 0
# and
head(as.numeric(SexParent) - 1)
[1] 1 1 1 1 1 1

However, there is a more convenient function to do this for us, and include interactions, too!

Xterms <- model.matrix(~ FoodTreatment*SexParent, data=Owls)[,-1]
head(Xterms)
  FoodTreatmentSatiated SexParentMale FoodTreatmentSatiated:SexParentMale
1                     0             1                                   0
2                     1             1                                   1
3                     0             1                                   0
4                     0             1                                   0
5                     0             1                                   0
6                     0             1                                   0

Nice, ey? The “[,-1]” removes the intercept that would automatically be produced.
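To see what `model.matrix` does, here is a stand-alone toy example with two invented two-level factors (not the owl data):

```r
d <- data.frame(A = factor(c("low", "high", "low", "high")),
                B = factor(c("f", "f", "m", "m")))

# dummy-code both factors plus their interaction; "[, -1]" drops the intercept
X <- model.matrix(~ A * B, data = d)[, -1]
X
```

Each factor contributes one 0/1 dummy column (the first level is the reference), and the interaction column is simply the product of the two dummies.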

Which leads us to the JAGS-data:

OwlsData <- list(SibNeg = SiblingNegotiation,
                 FoodTreatment = Xterms[,1],
                 SexParent = Xterms[,2],
                 FoodSex = Xterms[,3],
                 Nest = Nest,
                 lgBroodSize = log(BroodSize),
                 N = nrow(Owls),
                 nnests = length(levels(Owls$Nest)))
detach(Owls)

Now comes the crucial bit!

The first (very primitive) zero-inflation model in JAGS

Note that it is customary to model the proportion of 1s (now called $\psi = 1-\pi$), rather than the proportion of $0$s ($\pi$)!

ZIPR <- function() {
  for(i in 1:N){ # loop through all data points
    SibNeg[i] ~ dpois(mu[i]) 
    mu[i] <- lambda[i]*z[i] + 0.00001 ## hack required for Rjags -- otherwise 'incompatible'-error 
     z[i] ~ dbern(psi)
    
    log(lambda[i]) <-  lgBroodSize[i] + alpha 
    # lgBroodSize is offset
    # alpha is overall intercept
  } 
 
  # priors:
  alpha ~ dnorm(0, 0.01)     # overall model intercept
  psi ~ dunif(0, 1)          # proportion of non-zeros
}

Now we need to define which parameters to monitor, how to initialise them, and what the chain settings are:

parameters <- c("alpha", "psi") # which parameters are we interested in getting reported?

ni <- 1E3; nb <- ni/2 # number of iterations; number of burnins
nc <- 3; nt <- 5      # number of chains; thinning

inits <- function(){list(alpha=runif(1, 0, 2), psi = runif(1, 0, 1))}

And now we run it and look at the outcome.

Run the model

As usual, running the model takes a bit of time.

ZIPRjags <- jags(OwlsData, inits=inits, parameters, model.file = ZIPR, n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, working.directory = getwd())
plot(ZIPRjags)

ZIPRjags
Inference for Bugs model at "/var/folders/cc/3jfhfx190rb2ptxnqrqxj94m0000gp/T//Rtmp3nt2lM/model508825e4dc53.txt", fit using jags,
 3 chains, each with 1000 iterations (first 500 discarded), n.thin = 5
 n.sims = 300 iterations saved
          mu.vect sd.vect     2.5%      25%      50%      75%    97.5%  Rhat n.eff
alpha       0.695   0.016    0.665    0.685    0.695    0.704    0.726 1.000   300
psi         0.740   0.019    0.703    0.726    0.740    0.752    0.775 1.001   300
deviance 3655.572  10.674 3641.861 3649.165 3654.024 3662.400 3678.350 1.002   300

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 57.3 and DIC = 3712.9
DIC is an estimate of expected predictive error (lower deviance is better).

So we get estimates for $\psi$ (around 0.74) and $\alpha$ (around 0.69), indicating that there is quite a bit of zero-inflation! However, our model is currently really stupid and does not use any information on the predictors to explain begging. Maybe once we put these in we can explain more of the $0$s by “Poisson-zeros”, rather than “Bernoulli-zeros” (aka excess zeros).
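A rough plausibility check on these numbers (using the point estimates from the output above, and the mean brood size of roughly 4.39 from the data summary): with $\lambda$ this large, the Poisson part produces almost no zeros, so the model attributes essentially all observed zeros to the Bernoulli part.

```r
psi    <- 0.74              # estimated proportion of non-zeros (from the output above)
lambda <- exp(0.695) * 4.39 # rough mean lambda: exp(alpha) times mean brood size

# expected proportion of zeros under the zero-inflated Poisson:
p0 <- (1 - psi) + psi * dpois(0, lambda)
round(p0, 3)  # almost exactly 1 - psi, i.e. nearly all zeros are "Bernoulli-zeros"
```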

Model 2: add predictors for $\lambda$

ZIPR2 <- function() {
  for(i in 1:N){ # loop through all data points
    SibNeg[i] ~ dpois(mu[i]) 
    mu[i] <- lambda[i]*z[i] + 0.00001 ## hack required for Rjags -- otherwise 'incompatible'-error 
     z[i] ~ dbern(psi)
    
    log(lambda[i]) <-  lgBroodSize[i] + alpha + beta[1]*FoodTreatment[i] + beta[2]*SexParent[i] + beta[3]*FoodSex[i]
    # lgBroodSize is offset
    # alpha is overall intercept
  } 
 
  # priors:
  alpha ~ dnorm(0, 0.01)     # overall model intercept
  for (m in 1:3){
      beta[m] ~ dnorm(0, 0.01)      # Linear effects
  }  
  psi ~ dunif(0, 1)          # proportion of non-zeros
}
parameters <- c("alpha", "beta", "psi") # which parameters are we interested in getting reported?
ZIPR2jags <- jags(OwlsData, inits=inits, parameters, model.file = ZIPR2, n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, working.directory = getwd())
plot(ZIPR2jags)

ZIPR2jags
Inference for Bugs model at "/var/folders/cc/3jfhfx190rb2ptxnqrqxj94m0000gp/T//Rtmp3nt2lM/model508817f056ad.txt", fit using jags,
 3 chains, each with 1000 iterations (first 500 discarded), n.thin = 5
 n.sims = 300 iterations saved
          mu.vect sd.vect     2.5%      25%      50%      75%    97.5%  Rhat n.eff
alpha       0.777   0.037    0.709    0.754    0.778    0.802    0.844 1.015   120
beta[1]    -0.222   0.055   -0.330   -0.262   -0.220   -0.181   -0.125 1.014   170
beta[2]    -0.027   0.045   -0.112   -0.058   -0.027    0.001    0.062 1.011   150
beta[3]     0.077   0.071   -0.056    0.023    0.075    0.131    0.194 1.004   300
psi         0.743   0.018    0.708    0.732    0.744    0.755    0.774 0.997   300
deviance 3632.916  12.067 3615.077 3624.123 3631.281 3639.269 3662.814 0.999   300

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 73.2 and DIC = 3706.1
DIC is an estimate of expected predictive error (lower deviance is better).

So while the model has improved (the DIC is lower by about 7 units), the value for $\psi$ hasn’t changed much.

We can do better still.

Notice that so far the observations from one and the same nest are treated as independent, while they are in fact “nested” (pun intended). So we need to incorporate a random term for nest as well.

Model 3: add random effect for nest

ZIPR3 <- function() {
  for(i in 1:N){ # loop through all data points
    SibNeg[i] ~ dpois(mu[i]) 
    mu[i] <- lambda[i]*z[i] + 0.00001 ## hack required for Rjags -- otherwise 'incompatible'-error 
     z[i] ~ dbern(psi)
    
    log(lambda[i]) <-  lgBroodSize[i] + alpha + beta[1]*FoodTreatment[i] + beta[2]*SexParent[i] + beta[3]*FoodSex[i] + a[Nest[i]]
    # lgBroodSize is offset
    # alpha is overall intercept
    # "a" is random effect of nest; because alpha is overall intercept, a should be centred on 0.
  } 
 
  # priors:
  alpha ~ dnorm(0, 0.01)     # overall model intercept
  for (m in 1:3){
      beta[m] ~ dnorm(0, 0.01)      # Linear effects
  }  
  psi ~ dunif(0, 1)          # proportion of non-zeros
  for (j in 1:nnests){
    a[j] ~ dnorm(0, tau)     # random effect for each nest
  }
  tau ~ dgamma(0.001, 0.001) # prior for mixed effect variance

}
parameters <- c("alpha", "beta", "psi", "tau") # which parameters are we interested in getting reported?
ZIPR3jags <- jags(OwlsData, inits=inits, parameters, model.file = ZIPR3, n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, working.directory = getwd())
plot(ZIPR3jags)

ZIPR3jags
Inference for Bugs model at "/var/folders/cc/3jfhfx190rb2ptxnqrqxj94m0000gp/T//Rtmp3nt2lM/model508836ca17c9.txt", fit using jags,
 3 chains, each with 1000 iterations (first 500 discarded), n.thin = 5
 n.sims = 300 iterations saved
          mu.vect sd.vect     2.5%      25%      50%      75%    97.5%  Rhat n.eff
alpha       0.863   0.084    0.680    0.810    0.870    0.917    1.018 1.004   300
beta[1]    -0.254   0.060   -0.378   -0.289   -0.253   -0.214   -0.146 0.998   300
beta[2]    -0.070   0.046   -0.156   -0.103   -0.073   -0.036    0.016 1.006   300
beta[3]     0.082   0.073   -0.051    0.029    0.076    0.128    0.233 1.001   300
psi         0.741   0.019    0.700    0.728    0.741    0.754    0.774 1.004   300
tau         8.005   2.810    3.650    6.041    7.717    9.591   15.517 1.003   300
deviance 3336.646  12.951 3313.077 3328.174 3334.857 3344.479 3364.048 0.998   300

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 84.4 and DIC = 3421.1
DIC is an estimate of expected predictive error (lower deviance is better).

While we are now seeing a dramatic improvement in fit (DIC down by another 260 units or so!), we also notice that convergence has suffered, and the $\hat{R}$-values are higher than they should be for $\alpha$. We re-adjust our settings and repeat the run (which will take around 10 times as long).

ni <- 1E4
parameters <- c("alpha", "beta", "psi", "tau") # which parameters are we interested in getting reported?
ZIPR3jags <- jags(OwlsData, inits=inits, parameters, model.file = ZIPR3, n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, working.directory = getwd())
plot(ZIPR3jags)

ZIPR3jags
Inference for Bugs model at "/var/folders/cc/3jfhfx190rb2ptxnqrqxj94m0000gp/T//Rtmp3nt2lM/model5088435e6622.txt", fit using jags,
 3 chains, each with 10000 iterations (first 500 discarded), n.thin = 5
 n.sims = 5700 iterations saved
          mu.vect sd.vect     2.5%      25%      50%      75%    97.5%  Rhat n.eff
alpha       0.861   0.087    0.688    0.804    0.863    0.920    1.029 1.013   160
beta[1]    -0.269   0.059   -0.385   -0.308   -0.268   -0.229   -0.153 1.001  5700
beta[2]    -0.079   0.046   -0.169   -0.110   -0.079   -0.048    0.014 1.001  5700
beta[3]     0.095   0.073   -0.048    0.046    0.095    0.144    0.235 1.001  5700
psi         0.741   0.018    0.705    0.729    0.741    0.753    0.775 1.001  5700
tau         7.732   2.711    3.448    5.836    7.361    9.273   14.022 1.002  1300
deviance 3337.793  14.126 3315.107 3327.365 3336.244 3346.369 3369.838 1.001  5000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 99.8 and DIC = 3437.6
DIC is an estimate of expected predictive error (lower deviance is better).

Okay. Again, $\psi$ is still high and seems to be a feature of the data, rather than due to our poor modelling of $\lambda$.

So, a first result interpretation is indicated: What effects do you see, and what do they mean?

Model 4: add effect of brood size on whether the chicks call at all

It could be that a clutch of chicks is more vocal when it is larger. A single chick may remain silent more often than it would when in a group of siblings (maybe I am extrapolating too much from football supporters on their way to the stadium). Statistically, we can make $\psi$ a function of other predictors, too, in this case of brood size. Let’s try.
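Before we do, recall what the logit link does: `plogis` is the inverse logit in R, mapping the linear predictor back to a probability in (0, 1). A sketch with made-up coefficients (not the fitted values):

```r
alpha.psi <- -0.8  # made-up intercept on the logit scale
beta.psi  <-  0.4  # made-up slope for brood size

broodsize <- 1:7
psi <- plogis(alpha.psi + beta.psi * broodsize)
round(psi, 2)  # probability of the "chicks call at all" state rises with brood size
```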

ZIPR4 <- function() {
  for(i in 1:N){ # loop through all data points
    SibNeg[i] ~ dpois(mu[i]) 
    mu[i] <- lambda[i]*z[i] + 0.00001 ## hack required for Rjags -- otherwise 'incompatible'-error 
    
    z[i] ~ dbern(psi[i])
    logit(psi[i]) <- alpha.psi + beta.psi*exp(lgBroodSize[i]) 
    
    log(lambda[i]) <-  lgBroodSize[i] + alpha + beta[1]*FoodTreatment[i] + beta[2]*SexParent[i] + beta[3]*FoodSex[i] + a[Nest[i]]
    # lgBroodSize is offset
    # alpha is overall intercept
    # "a" is random effect of nest; because alpha is overall intercept, a should be centred on 0.
  } 
 
  # priors:
  alpha ~ dnorm(0, 0.01)     # overall model intercept
  for (m in 1:3){
      beta[m] ~ dnorm(0, 0.01)      # Linear effects
  }  
  # remove this: psi ~ dunif(0, 1)          # proportion of non-zeros
  for (j in 1:nnests){
    a[j] ~ dnorm(0, tau)     # random effect for each nest
  }
  tau ~ dgamma(0.001, 0.001) # prior for mixed effect variance
  alpha.psi ~ dnorm(0, 0.01)
  beta.psi  ~ dnorm(0, 0.01)

}

Since I am too lazy to re-code the inits-function, I simply set the inits-argument to NULL and let JAGS pick the initial values itself.

parameters <- c("alpha", "beta", "tau", "alpha.psi", "beta.psi") # which parameters are we interested in getting reported?
ZIPR4jags <- jags(OwlsData, inits=NULL, parameters, model.file = ZIPR4, n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, working.directory = getwd())
plot(ZIPR4jags)

ZIPR4jags
Inference for Bugs model at "/var/folders/cc/3jfhfx190rb2ptxnqrqxj94m0000gp/T//Rtmp3nt2lM/model508833669bf.txt", fit using jags,
 3 chains, each with 10000 iterations (first 500 discarded), n.thin = 5
 n.sims = 5700 iterations saved
           mu.vect sd.vect     2.5%      25%      50%      75%    97.5%  Rhat n.eff
alpha        0.873   0.083    0.709    0.819    0.872    0.926    1.039 1.012   300
alpha.psi   -0.828   0.373   -1.553   -1.083   -0.827   -0.570   -0.081 1.001  5700
beta[1]     -0.272   0.058   -0.385   -0.311   -0.272   -0.232   -0.158 1.001  5600
beta[2]     -0.081   0.046   -0.171   -0.112   -0.081   -0.049    0.008 1.002  2100
beta[3]      0.099   0.072   -0.043    0.051    0.099    0.147    0.239 1.001  5700
beta.psi     0.442   0.086    0.273    0.384    0.441    0.501    0.611 1.001  5700
tau          7.678   2.707    3.495    5.730    7.309    9.222   13.889 1.002  2200
deviance  3336.929  13.632 3315.456 3327.124 3335.439 3344.824 3368.752 1.001  5700

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 92.9 and DIC = 3429.8
DIC is an estimate of expected predictive error (lower deviance is better).

And this model is better still (although “only” by about 8 DIC units).

So the one thing that we could still add is overdispersion as observation-level random effect. This is more to show that we can, and less because I think it is really necessary.

Model 5: add OLRE-overdispersion

We add OLRE in the form of an additive effect $\xi$ at the level of the Poisson regression. All $\xi$ are normally distributed with mean 0 (otherwise they’d compete with intercept $\alpha$), and the precision of that normal distribution is taken to be $\gamma$-distributed (as is common for precision).

Note that we now have two competing random effects: one at the level of the nest ($a$) and one at the level of the individual observation ($\xi$).
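To see why an OLRE produces overdispersion in the first place, here is a small stand-alone simulation (all values made up): normal noise $\xi_i$ on the log scale turns the Poisson into a Poisson-lognormal mixture whose variance clearly exceeds its mean, whereas a pure Poisson has variance = mean.

```r
set.seed(1)
n     <- 10000
alpha <- 1    # log-scale intercept (made-up)
sd.xi <- 0.8  # SD of the observation-level random effect (made-up)

xi <- rnorm(n, 0, sd.xi)          # one xi per observation, centred on 0
y  <- rpois(n, exp(alpha + xi))   # Poisson with lognormal noise on the mean

c(mean = mean(y), var = var(y))   # variance is several times the mean
```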

ZIPR5 <- function() {
  for(i in 1:N){ # loop through all data points
    SibNeg[i] ~ dpois(mu[i]) 
    mu[i] <- lambda[i]*z[i] + 0.00001 ## hack required for Rjags -- otherwise 'incompatible'-error 
    
    z[i] ~ dbern(psi[i])
    logit(psi[i]) <- alpha.psi + beta.psi*exp(lgBroodSize[i]) 
    
    log(lambda[i]) <-  lgBroodSize[i] + alpha + beta[1]*FoodTreatment[i] + beta[2]*SexParent[i] + beta[3]*FoodSex[i] + a[Nest[i]] + xi[i]
    # lgBroodSize is offset
    # alpha is overall intercept
    # "a" is random effect of nest; because alpha is overall intercept, a should be centred on 0.
  } 
 
  # priors:
  alpha ~ dnorm(0, 0.01)     # overall model intercept
  for (m in 1:3){
      beta[m] ~ dnorm(0, 0.01)      # Linear effects
  }  
  # remove this: psi ~ dunif(0, 1)          # proportion of non-zeros
  for (j in 1:nnests){
    a[j] ~ dnorm(0, tau)     # random effect for each nest
  }
  tau ~ dgamma(0.001, 0.001) # prior for mixed effect variance
  alpha.psi ~ dnorm(0, 0.01)
  beta.psi  ~ dnorm(0, 0.01)

  for (i in 1:N){
  	xi[i] ~ dnorm(0, tau.xi) # on average, xi should be 0 otherwise it competes with the intercept alpha!
  }
  tau.xi ~ dgamma(0.001, 0.001) # prior for mixed effect variance

}
parameters <- c("alpha", "beta", "tau", "alpha.psi", "beta.psi", "tau.xi") # which parameters are we interested in getting reported?
ZIPR5jags <- jags(OwlsData, inits=NULL, parameters, model.file = ZIPR5, n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, working.directory = getwd())
plot(ZIPR5jags)

ZIPR5jags
Inference for Bugs model at "/var/folders/cc/3jfhfx190rb2ptxnqrqxj94m0000gp/T//Rtmp3nt2lM/model50881f668080.txt", fit using jags,
 3 chains, each with 10000 iterations (first 500 discarded), n.thin = 5
 n.sims = 5700 iterations saved
           mu.vect sd.vect     2.5%      25%      50%      75%    97.5%  Rhat n.eff
alpha        0.719   0.109    0.511    0.644    0.717    0.791    0.948 1.032    91
alpha.psi   -0.746   0.417   -1.563   -1.027   -0.750   -0.459    0.050 1.001  3100
beta[1]     -0.498   0.143   -0.781   -0.593   -0.497   -0.404   -0.214 1.018   170
beta[2]     -0.104   0.106   -0.310   -0.176   -0.105   -0.036    0.114 1.017   140
beta[3]      0.182   0.169   -0.152    0.069    0.182    0.298    0.506 1.016   170
beta.psi     0.459   0.099    0.270    0.391    0.460    0.525    0.655 1.001  4000
tau         10.122   5.770    3.544    6.427    8.858   12.138   24.233 1.006   530
tau.xi       2.336   0.269    1.841    2.149    2.323    2.505    2.903 1.002  2400
deviance  2193.981  34.979 2128.450 2169.281 2193.797 2217.590 2262.885 1.001  3600

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 611.6 and DIC = 2805.6
DIC is an estimate of expected predictive error (lower deviance is better).

Oh, that’s somewhat of a surprise! The DIC dropped precipitously to around 2800, i.e. by over 600 units. So I guess the data were substantially overdispersed, not only zero-inflated. Zero-inflation is still prevalent, and we should plot the relationship for $\psi$ to see which values it takes.

curve(plogis(-0.75 + 0.46*x), from=min(Owls$BroodSize), to=max(Owls$BroodSize), lwd=2, las=1, xlab="brood size")

Obviously we should do this with all samples, not just the mean estimates:

# str(ZIPR5jags$BUGSoutput$sims.list)
curve(plogis(-0.75 + 0.46*x), from=min(Owls$BroodSize), to=max(Owls$BroodSize), lwd=1, col="red", las=1, xlab="brood size", ylab="proportion of non-0s in the data")
for (i in 1:length(ZIPR5jags$BUGSoutput$sims.list$alpha.psi)){
  thisA <- ZIPR5jags$BUGSoutput$sims.list$alpha.psi[i]
  thisB <- ZIPR5jags$BUGSoutput$sims.list$beta.psi[i]
  curve(plogis(thisA + thisB*x), from=min(Owls$BroodSize), to=max(Owls$BroodSize), add=T, col=rgb(0.1, 0.1, 0.1, 0.01))
}

So the proportion of non-$0$s is between roughly 0.4 and 0.9, and the trend is positive (i.e. more non-$0$s when there are more siblings). So owl chicks are very much like football supporters, it seems.

Model diagnostics: simulation from model (5b)

So far we have not spent any time on evaluating whether any of the models was really “good”. That is not a trivial task, and we need to consider a new idea before being able to do so.

We call a model “good” if it is able to invent data that look like those we fitted it to.

That is (I hope) logical. If a model fits poorly, then simulating (= inventing) data based on this model will lead to data that may look very different from the original data. A near-perfect fit, in contrast, will yield simulated data very similar to those observed.

In the following code, we simulate data from the model, not once, but several thousand times. We can then see how our observed data are positioned within the several thousand simulations (e.g. on which quantile they lie; this is called the “Bayesian p-value”).
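To make the idea concrete before running it in JAGS, here is a stand-alone toy version (simulated Poisson data standing in for the posterior predictive draws): when the model is correct, the observed values land at roughly uniformly distributed quantiles of their simulations.

```r
set.seed(7)
n.obs  <- 50
n.sims <- 1000

# pretend these are posterior predictive simulations (rows = sims, cols = observations)
sims <- matrix(rpois(n.sims * n.obs, lambda = 5), nrow = n.sims)
obs  <- rpois(n.obs, lambda = 5)  # observations from the same model -> good fit

qq <- numeric(n.obs)
for (i in 1:n.obs){
  # jitter both sides to smooth out the discreteness of the counts
  qq[i] <- ecdf(sims[, i] + rnorm(n.sims, 0, 0.1))(obs[i] + rnorm(1, 0, 0.1))
}
summary(qq)  # spans (0, 1) fairly evenly when the model fits
```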

To simulate, it is easiest to use JAGS itself, rather than its output. To do so, we “invent” our response again, within the model, under a new name (in this case S.SibNeg, with S. standing for “simulated”). Unsurprisingly, almost doubling the number of monitored quantities will also lead to substantially longer computation time!

ZIPR5s <- function() {
  for(i in 1:N){ # loop through all data points
    SibNeg[i] ~ dpois(mu[i]) 
    mu[i] <- lambda[i]*z[i] + 0.00001 ## hack required for Rjags -- otherwise 'incompatible'-error 
    
    z[i] ~ dbern(psi[i])
    logit(psi[i]) <- alpha.psi + beta.psi*exp(lgBroodSize[i]) 
    
    log(lambda[i]) <-  lgBroodSize[i] + alpha + beta[1]*FoodTreatment[i] + beta[2]*SexParent[i] + beta[3]*FoodSex[i] + a[Nest[i]] + xi[i]
    # lgBroodSize is offset
    # alpha is overall intercept
    # "a" is random effect of nest; because alpha is overall intercept, a should be centred on 0.
  } 
  
  # priors:
  alpha ~ dnorm(0, 0.01)     # overall model intercept
  for (m in 1:3){
      beta[m] ~ dnorm(0, 0.01)      # Linear effects
  }  
  # remove this: psi ~ dunif(0, 1)          # proportion of non-zeros
  for (j in 1:nnests){
    a[j] ~ dnorm(0, tau)     # random effect for each nest
  }
  tau ~ dgamma(0.001, 0.001) # prior for mixed effect variance
  alpha.psi ~ dnorm(0, 0.01)
  beta.psi  ~ dnorm(0, 0.01)

  for (i in 1:N){
  	xi[i] ~ dnorm(0, tau.xi) # on average, xi should be 0 otherwise it competes with the intercept alpha!
  }
  tau.xi ~ dgamma(0.001, 0.001) # prior for mixed effect variance

  # # # # # # # # # # # # # # # 
  # simulate data here:
  # replace all latent variables L with an S.L (mu, lambda, psi, z), as well as the response:
  for (i in 1:N){ # loop through all data points
    S.SibNeg[i] ~ dpois(S.mu[i]) 
    S.mu[i] <- S.lambda[i]*S.z[i] + 0.00001 ## hack required for Rjags -- otherwise 'incompatible'-error 
    
    S.z[i] ~ dbern(S.psi[i])
    logit(S.psi[i]) <- alpha.psi + beta.psi*exp(lgBroodSize[i]) 
    
    log(S.lambda[i]) <-  lgBroodSize[i] + alpha + beta[1]*FoodTreatment[i] + beta[2]*SexParent[i] + beta[3]*FoodSex[i] + a[Nest[i]] + xi[i]
  } 
  
}
parameters <- c("alpha", "beta", "tau", "alpha.psi", "beta.psi", "tau.xi", "S.SibNeg") # which parameters are we interested in getting reported?
ZIPR5sjags <- jags(OwlsData, inits=NULL, parameters, model.file = ZIPR5s, n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, working.directory = getwd())

We’d rather not look at the plot, which would now show 599 values for S.SibNeg in addition to all the model parameters we have looked at before. The same goes for the summary of the model, which should otherwise be identical to that of model 5.

Instead, we extract the simulated data for each original data point. First, as an example, for the first data point only:

plot(ecdf(ZIPR5sjags$BUGSoutput$sims.list$S.SibNeg[,1]+rnorm(5700, 0, 0.1)), verticals=T)
abline(v=Owls$SiblingNegotiation[1], lty=3)
abline(h=ecdf(ZIPR5sjags$BUGSoutput$sims.list$S.SibNeg[,1]+rnorm(5700, 0, 0.1))(Owls$SiblingNegotiation[1]), lty=3)

Notice that adding some noise smoothes out the ECDF-curve, as has been recommended (somewhere).

So we see that the first data point (4 calls) lies roughly at the 0.5 quantile of the simulated data. Let’s do this computation for all observations (and simulations), and plot these quantiles:

qq <- numeric(599)
for (i in 1:599){
    qq[i] <- ecdf(ZIPR5sjags$BUGSoutput$sims.list$S.SibNeg[,i]+rnorm(5700, 0, 0.1))(Owls$SiblingNegotiation[i]+rnorm(1, 0, 0.1))
}
plot(density(qq, from=0, to=1), main="Bayesian p-values of observations")

summary(qq)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00193 0.38590 0.54750 0.49990 0.62780 0.96750 

Ideally, we want our observations to be more or less evenly distributed across the range from $0$ to $1$. That is clearly not the case. What we see is that most observations fall between the 0.45 and 0.6 quantiles, in a very non-uniform fashion.
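If we wanted a formal check of this uniformity, a Kolmogorov–Smirnov test against the uniform distribution would do. A sketch on invented quantile values (not the actual qq values from above):

```r
set.seed(3)
qq.good <- runif(599)        # what a well-fitting model would produce
qq.bad  <- rbeta(599, 5, 5)  # quantiles clumped around 0.5, as we observe

ks.test(qq.good, "punif")$p.value  # large: no evidence against uniformity
ks.test(qq.bad,  "punif")$p.value  # tiny: clearly non-uniform
```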

par(mfrow=c(1,2))
plot(apply(ZIPR5sjags$BUGSoutput$sims.list$S.SibNeg, 2, mean), Owls$SiblingNegotiation, las=1, xlab="expected", ylab="observed")
abline(0,1)
plot(qq ~ apply(ZIPR5sjags$BUGSoutput$sims.list$S.SibNeg, 2, mean), las=1, xlab="expected", ylab="standardised residuals")

The expected-vs-observed plot on the left looks almost fine, with some overestimation at high values. The standardised residuals (which are actually the quantiles) should show no pattern with expectation, however!

That means: we’re not done yet. More model tuning is required, so that the distribution of simulations is more in line with the distribution of observations. (Small note: there is an integer problem here, so some tricks, such as adding noise, were indicated. See the DHARMa package and its vignette for some comments on that.)