1 Introduction

1.1 Objective

The purpose of this study is to provide the South Florida Water Management District (SFWMD) with an estimate of the effect of reduced recruitment into the population, identifying critical thresholds for productivity under conservative scenarios. A Population Viability Analysis (PVA) shall take into account the impact of reduced nesting (either through failures of nest attempts or through reduced attempts) on the viability of the state-wide snail kite population, as well as whether regional differences in nesting effort are important. In other words, we evaluate the overall impact of reduced nesting, as well as the impact of how that reduction is distributed among six historical breeding regions.

The goal of this work order is to provide snail kite nest location and population information for management, as well as to develop a PVA for long-term mitigation and restoration planning activities.

2 Conceptual framework, packages, and functions

Previous PVAs for the snail kite have used Capture-Mark-Recapture (Resight) models. Here, we will conduct a count-based PVA applying Stochastic State-Space models.

First, let’s load the R packages needed to run this code:

# R functions and datasets to support "Modern Applied Statistics with S",
#   a book by W.N. Venables and B.D. Ripley
library(MASS)
# R functions for Data Cloning (maximum likelihood estimation using Bayesian MCMC)
library(dclone)
# Create plots for MCMC output
library(mcmcplots)
# Data manipulation and elegant figures
library(tidyverse)
# Palettes inspired by works at the Metropolitan Museum of Art in New York
library(MetBrewer)

2.1 Simple (one-population) Stochastic Gompertz State Space

Let \(S_{t}\) be the (unobserved) population abundance at time \(t\), the latent or state ecological process for the Gompertz State Space model (Dennis et al., 2006). The population dynamics are \(S_t = S_{t-1} ~e^{(a+b\times\ln{S_{t-1}} + E_t)}\), where \(a\) and \(b\) are constants representing the intrinsic population growth rate and the strength of density dependence, respectively, and \(E_t\) is the environmental stochasticity or process noise, \(E_t\sim \text{Normal}(0,\sigma^2)\) (Dennis et al., 2006). On the logarithmic scale (\(X_t = \ln{S_t}\)), the GSS becomes linear and follows an autoregressive model of order 1: \(X_t = X_{t-1} + a + b\times X_{t-1} + E_t\), which can be simplified as \(X_t = a + c\times X_{t-1} + E_t\), where \(c = b+1\) is a constant that represents the strength of density dependence (Dennis et al., 2006; Ponciano et al., 2018; Reddingius, 1971, Acta Biotheoretica, 20, 1-208). If \(b=0\) (\(c=1\)), the GSS reduces to the density-independent model fully treated in state-space model form by Humbert et al. (2009).

This population model has three unknown parameters: \(a\), \(c\), and \(\sigma^2\) (Dennis et al., 2006). The transition probability distribution of this logarithmic abundance model is normal, with mean and variance changing as a function of time. The parameter \(c\) represents the strength of density dependence (Ponciano et al., 2018). If the strength of density dependence satisfies \(-1<c<1\), the long-run probability distribution of log-abundance approaches a time-independent normal stationary distribution (\(X_t \to X_\infty\)), with mean \(\frac{a}{1-c}\) and variance \(\frac{\sigma^2}{1-c^2}\). Thus, instead of approaching a single population abundance value, or a deterministic carrying capacity, the density-dependent stochastic GSS model approaches a stationary distribution: a cloud of points around which the population fluctuates (Dennis & Taper, 1994; Wolda, 1989). The mean of this distribution, \(\mu = \frac{a}{1-c}\), represents a long-term expected (log) population size (Dennis et al., 2006). As mentioned above, the density-independent stochastic exponential growth model of Dennis et al. (1991) is attained when \(b=0\) (\(c=1\)). The state-space model formulation for these two models is completed with the specification of the sampling (observation) error model.

We assume that the observed count at time \(t\), \(N_t\), is a sample drawn from a Poisson distribution whose mean \(\lambda_t\) is the exponentiated latent state (\(X_t\)): \(N_t \sim \text{Poisson}(\lambda_t) = \text{Poisson}(e^{X_t})\). Note that this differs from the observation error definition of the GSS in Dennis et al. (2006).
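
To make the model concrete, the following minimal sketch (not part of the estimation workflow of this report) simulates one realization of the one-population GSS with Poisson observation error; the parameter values are arbitrary, purely illustrative choices.

# Minimal, illustrative simulation of the one-population Gompertz State Space
# model with Poisson observation error (arbitrary parameter values)
set.seed(123)
a <- 0.8         # intrinsic growth rate
c.dd <- 0.8      # strength of density dependence (c = b + 1)
sigma <- 0.2     # process noise standard deviation
qp1 <- 30        # length of the simulated time series
X <- numeric(qp1)
X[1] <- log(20)  # initial log-abundance
for(t in 2:qp1){
  X[t] <- a + c.dd*X[t-1] + rnorm(1, mean=0, sd=sigma) # latent log-abundance
}
N <- rpois(qp1, lambda=exp(X)) # observed counts, N_t ~ Poisson(exp(X_t))
plot(0:(qp1-1), N, type="b", xlab="Time", ylab="Simulated counts")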

To generalize the GSS for unequal sampling intervals (and NAs), we take advantage of the mathematical insight that the solution of the discrete-time GSS model matches the solution at discrete time points of a diffusion process, which is a continuous time stochastic process. Specifically, the solution of the discrete time Gompertz model in the log scale matches exactly the Ornstein-Uhlenbeck (OU) Gaussian diffusion process (Dennis & Ponciano, 2014). This OU diffusion process is a well-known continuous-time version of an autoregressive process of order 1, characterized by a joint multivariate normal distribution of values across time points. The process is defined by its infinitesimal mean, variance, and covariance parameters (Dennis & Ponciano, 2014; Ponciano, 2018).

The infinitesimal mean and variance of the process are given by \(m_S (s)= \theta s[\ln \kappa - \ln s]\) and \(\sigma_S^2 (s)= \beta^2 s^2\), respectively; here \(\theta\) represents the speed of equilibration, \(\kappa\) the equilibrium abundance, and \(\beta\) scales the random perturbation by environmental stochasticity, \(dW\). A smooth transformation of \(S_t\), given by \(X_t=g(S_t)\) (e.g. \(X_t=\ln S_t\)), is also a diffusion process whose infinitesimal mean is given by \(m_X (x)=m_S (s) g'(s) + \frac{1}{2} \sigma_S^2 (s)g''(s)\) and infinitesimal variance by \(\sigma_X^2 (x) = \sigma_S^2 (s) [g' (s)]^2\), where \(s=g^{-1} (x)\). This result is the well-known Itô transformation for diffusion processes, widely used in stochastic population dynamics modeling. For \(g(s) = \ln s\), the infinitesimal mean simplifies to \(m_X (x)= \theta (\mu - x)\), where \(\mu = \ln\kappa - \frac{\beta^2}{2 \theta}\), and the infinitesimal variance becomes \(\sigma_X^2 (x)= \beta^2\). Thus, the stochastic differential equation of the logarithmic process \(X_t\) is \(dX_t= \theta(\mu - X_t)dt + \beta dW_t\). If the process starts at an initial log-abundance \(X_0=x_0\), the expected value and variance at any time \(t\) are given by \(\text{E}[X_t | X_0 = x_0] = \mu - (\mu - x_0) e^{-\theta t}\) and \(\text{V}[X_t|X_0 = x_0] = \frac{\beta^2}{2\theta} (1-e^{-2\theta t})\), respectively (Dennis & Ponciano, 2014). Over time, the process converges to a stationary distribution with mean \(\mu\) and variance \(\frac{\beta^2}{2\theta}\).
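
As a small illustration of how the OU formulation handles arbitrary (and possibly unequal) elapsed times between surveys, the conditional mean and variance above can be coded directly. The helper below (ou.moments, our own illustrative name, not used elsewhere in this report) simply evaluates those two formulas for arbitrary example values.

# Conditional mean and variance of the log-abundance OU process after an
# elapsed time dt, given the current log-abundance x0 (illustrative helper)
ou.moments <- function(x0, dt, mu, theta, betasq){
  m <- mu + (x0 - mu)*exp(-theta*dt)              # E[X_{t+dt} | X_t = x0]
  v <- (betasq/(2*theta))*(1 - exp(-2*theta*dt))  # V[X_{t+dt} | X_t = x0]
  c(mean = m, var = v)
}
# Example: moments one year and three years ahead of x0 = log(50)
ou.moments(x0=log(50), dt=1, mu=4.5, theta=0.5, betasq=0.1)
ou.moments(x0=log(50), dt=3, mu=4.5, theta=0.5, betasq=0.1)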

The three unknown parameters to estimate are:

\[ \begin{array}{ccc} \mu &=& \frac{a}{1-c} \\ \theta &=& -\ln{c} \\ \sigma^2 &=& \frac{(1-e^{-2\theta})\beta^2}{2\theta} \\ \end{array} \]

Note the direct link between \(\theta\) and \(c\). Also, note that the quantity \(K = \exp\{\mu\} = \exp\{a/(1-c)\}\) corresponds to the carrying capacity value of the deterministic Gompertz difference equation. Typically, in work with Snail Kites the carrying capacity is defined as the “breeding carrying capacity”, or the maximum number of breeders a given population can sustain. This definition replaces the traditional carrying capacity, defined as the equilibrium population abundance of a population dynamics model.
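
To make the link between the two parameterizations explicit, here is a small illustrative helper (gss.to.ou, our own name; arbitrary example values) that maps the discrete-time parameters \((a, c, \sigma^2)\) to \((\mu, \theta, \beta^2)\) and to \(K = \exp\{\mu\}\).

# Map the discrete-time Gompertz parameters (a, c, sigma^2) to the OU
# parameters (mu, theta, beta^2) and the Gompertz carrying capacity K
# (illustrative helper only)
gss.to.ou <- function(a, c.dd, sigmasq){
  theta  <- -log(c.dd)
  mu     <- a/(1 - c.dd)
  betasq <- 2*theta*sigmasq/(1 - exp(-2*theta))
  c(mu = mu, theta = theta, betasq = betasq, K = exp(mu))
}
gss.to.ou(a = 0.8, c.dd = 0.8, sigmasq = 0.04)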

In this report we take a completely different approach from these two definitions. Once the influence of environmental variability is admitted (as has been the practice since the seminal paper of Dennis et al. 1991), a single point equilibrium value loses meaning. The return point of the population size is no longer a single equilibrium value. Rather, the equilibrium of a stochastic population model is a long-term stationary distribution of population sizes, a cloud of points (Wolda 1989) around which the population fluctuates. Under the Gompertz model, this stationary distribution is a positively skewed log-normal distribution (Dennis et al. 2006). The ecological ramification of this conceptualization of equilibrium is that persistence above any threshold can be defined as a probability: it is the area under that log-normal probability density function to the right of that threshold. This persistence probability represents the long-term average proportion of time the target population size will be found above that particular threshold.

Adopting this definition of persistence probability allows us to estimate population persistence outcomes under different thresholds. This conceptualization also allows us to examine how any hypothesized change in the growth rate ultimately affects the entire “cloud of points”, i.e., the long-run distribution of population sizes. Thus, estimating a stationary distribution from time series data ultimately gives a benchmark against which to gauge hypothesized, or even recently observed, changes in the growth rate as changes in the proportion of time the population size will be found around any set of values.
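
As a concrete sketch of this persistence probability (with arbitrary, purely illustrative parameter values), recall that the stationary distribution of log-abundance under the Gompertz model is Normal with mean \(a/(1-c)\) and variance \(\sigma^2/(1-c^2)\), so the area to the right of a threshold is a single call to pnorm():

# Persistence probability: long-run proportion of time the population is
# expected to spend above a threshold abundance, under the Gompertz
# stationary distribution (illustrative values only)
persist.prob <- function(threshold, a, c.dd, sigmasq){
  mu.stat  <- a/(1 - c.dd)           # stationary mean of log-abundance
  var.stat <- sigmasq/(1 - c.dd^2)   # stationary variance of log-abundance
  1 - pnorm(log(threshold), mean = mu.stat, sd = sqrt(var.stat))
}
# Example: long-run probability of being above 50 individuals
persist.prob(threshold = 50, a = 0.8, c.dd = 0.8, sigmasq = 0.04)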

In what follows we will use this conceptualization to 1) estimate/measure the impact of reduced growth rates on persistence probabilities and 2) estimate how changes in water levels affect the growth rate of each deme (sub-population) and how these changes in growth rate in turn affect local and regional persistence probabilities.

Finally, note that an open problem in Conservation Biology is how to best estimate a critical density below which a given population goes extinct with high certainty and above which it persists with high certainty. Theoretical approaches using stochastic population dynamics models with Allee effects have long defined precisely this critical density, and demonstrated theoretically how the persistence probability of a population indeed dramatically drops below such abundance and increases above it (Dennis 2002). In the last part of this report we give an initial estimate of this critical density by fitting a stochastic model with an Allee effect.

2.2 Multiple-subpopulations (regions) Stochastic Gompertz State Space

To include the six study regions (i.e., demes, or spatially differentiated subpopulations) of the snail kite superpopulation in Florida, we can adjust the equation for the ecological process as:

\[ S_{t+1,i} = S_{t,i}\times\exp\left\{a_{i} + b_{i1}\ln\,S_{t,1} + b_{i2}\ln\,S_{t,2} + b_{i3}\ln\,S_{t,3} + b_{i4}\ln\,S_{t,4} + b_{i5}\ln\,S_{t,5} + b_{i6}\ln\,S_{t,6}+ E_{t,i} \right\},\\\quad i=1,2,3,4,5,6. \] where \(S_{t+1,i}\) is the population abundance in region \(i\) at the next time \(t+1\), which depends on the current abundance in that same region (\(S_{t,i}\)) and is affected by the intrinsic growth parameter (\(a_{i}\)), the density dependence within that region (\(b_{ii}\ln\,S_{t,i}\)), and the density dependence and current abundance of the other regions (\(b_{ij}\ln\,S_{t,j}\)). The \(E_{t,i}\) are \(\rm{iid}\) Normal with mean 0 and variance \(\sigma^{2}\). Let’s set \(\ln\,S_{t,i} = X_{t,i}\). Then, the model can be re-written using as a response variable the log-abundance change of the \(i^{th}\) region:

\[ X_{t+1,i} - X_{t,i} = a_{i} + b_{i1}X_{t,1} + b_{i2}X_{t,2} + b_{i3}X_{t,3} + b_{i4}X_{t,4} + b_{i5}X_{t,5} + b_{i6}X_{t,6} + E_{t,i} ,\\\quad i=1,2,3,4,5,6. \] In other words, the \(b_{ii}\) are interpreted as the intra-region, density-dependent effects, while the \(b_{ij}\) (\(i \neq j\)) are interpreted as the non-linear effect of region \(j\)’s population density on the growth rate of population \(i\).

The model in full matrix format is: \[ \mathbf{X}_{t+1} = \mathbf{a} + (\mathbf{I} + \mathbf{B})\mathbf{X}_t + \mathbf{E}_t \]

The components of this matrix model are:

First, the State vector: \[ \mathbf{X}_t = \begin{bmatrix} X_{t,1} \\ X_{t,2} \\ X_{t,3} \\ X_{t,4} \\ X_{t,5} \\ X_{t,6} \end{bmatrix}, \]

then, the vector of intrinsic growth parameters for each region: \[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \end{bmatrix}. \] Next, the simple identity matrix (the equivalent of number 1 for matrices):

\[ \mathbf{I} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \]

followed by the key parameter in this model, which is the interaction matrix: \[ \mathbf{B} = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} & b_{15} & b_{16} \\ b_{21} & b_{22} & b_{23} & b_{24} & b_{25} & b_{26} \\ b_{31} & b_{32} & b_{33} & b_{34} & b_{35} & b_{36} \\ b_{41} & b_{42} & b_{43} & b_{44} & b_{45} & b_{46} \\ b_{51} & b_{52} & b_{53} & b_{54} & b_{55} & b_{56} \\ b_{61} & b_{62} & b_{63} & b_{64} & b_{65} & b_{66} \end{bmatrix}. \]

The final component is the vector of environmental noise values for every time step as a multivariate normal distribution (MVN):

\[ \mathbf{E}_t \sim \text{MVN}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2\mathbf{I}\right). \]

This matrix model is arrived at by writing the model equations for each region \(i=1,2,3,4,5,6\): \[ \begin{align} X_{t+1,1} &= a_1 + (1 + b_{11})X_{t,1} + b_{12}X_{t,2} + b_{13}X_{t,3} + b_{14}X_{t,4} + b_{15}X_{t,5} + b_{16}X_{t,6} +E_{t,1}, \\ X_{t+1,2} &= a_2 + b_{21}X_{t,1} + (1 + b_{22})X_{t,2} + b_{23}X_{t,3} + b_{24}X_{t,4} + b_{25}X_{t,5} + b_{26}X_{t,6} + E_{t,2}, \\ X_{t+1,3} &= a_3 + b_{31}X_{t,1} + b_{32}X_{t,2} + (1 + b_{33})X_{t,3} + b_{34}X_{t,4} + b_{35}X_{t,5} + b_{36}X_{t,6} + E_{t,3}, \\ X_{t+1,4} &= a_4 + b_{41}X_{t,1} + b_{42}X_{t,2} + b_{43}X_{t,3} + (1 + b_{44})X_{t,4} + b_{45}X_{t,5} + b_{46}X_{t,6} +E_{t,4}, \\ X_{t+1,5} &= a_5 + b_{51}X_{t,1} + b_{52}X_{t,2} + b_{53}X_{t,3} + b_{54}X_{t,4} + (1 + b_{55})X_{t,5} + b_{56}X_{t,6} + E_{t,5}, \\ X_{t+1,6} &= a_6 + b_{61}X_{t,1} + b_{62}X_{t,2} + b_{63}X_{t,3} + b_{64}X_{t,4} + b_{65}X_{t,5} + (1 + b_{66})X_{t,6} + E_{t,6}, \end{align} \]

which fully written in matrix format looks like this: \[ \begin{bmatrix} X_{t+1,1} \\ X_{t+1,2} \\ X_{t+1,3} \\ X_{t+1,4} \\ X_{t+1,5} \\ X_{t+1,6} \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \end{bmatrix} + \left( \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} & b_{15} & b_{16} \\ b_{21} & b_{22} & b_{23} & b_{24} & b_{25} & b_{26} \\ b_{31} & b_{32} & b_{33} & b_{34} & b_{35} & b_{36} \\ b_{41} & b_{42} & b_{43} & b_{44} & b_{45} & b_{46} \\ b_{51} & b_{52} & b_{53} & b_{54} & b_{55} & b_{56} \\ b_{61} & b_{62} & b_{63} & b_{64} & b_{65} & b_{66} \end{bmatrix} \right) \begin{bmatrix} X_{t,1} \\ X_{t,2} \\ X_{t,3} \\ X_{t,4} \\ X_{t,5} \\ X_{t,6} \end{bmatrix} + \begin{bmatrix} E_{t,1} \\ E_{t,2} \\ E_{t,3} \\ E_{t,4} \\ E_{t,5} \\ E_{t,6} \end{bmatrix}. \]
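
As a quick, purely illustrative sanity check of this matrix notation (toy values for a two-region system, not the snail kite estimates), a single transition can be computed as:

# One transition of the multi-region model, X_{t+1} = a + (I + B) X_t + E_t,
# for a toy two-region system (arbitrary illustrative values)
set.seed(42)
a.toy <- c(0.5, 0.3)
B.toy <- matrix(c(-0.4, 0.05,
                  0.10, -0.30), nrow = 2, byrow = TRUE)
X.t   <- log(c(40, 60))               # current log-abundances
E.t   <- rnorm(2, mean = 0, sd = 0.2) # process noise
X.tp1 <- a.toy + (diag(2) + B.toy) %*% X.t + E.t
exp(X.tp1)                            # implied abundances at time t+1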

The effect of migration between region \(i\) and region \(j\) is specified in the \(\mathbf{B}\) matrix elements \(b_{ij}, i\neq j\). We will assume that the probability \(m\) that migration into a region \(i\) from a region \(j\) will have an effect on the growth rate of region \(i\) is a constant weighted by the distance between these two regions. Thus, we set \(b_{ij} = \frac{m}{d_{ij}}, i\neq j\), where \(d_{ij}\) is the distance between these two regions. We note that \(m\) can be interpreted as the migration probability out of a region, if that region is only 1 spatial unit away from another region. If there are obvious covariates to model \(m\), they could be included in the model, with the caveat that more parameters would need to be specified to express such dependencies of \(m\). Finally, note that the sum of the identity matrix with the \(\mathbf{B}\) matrix could simply be re-written as a single matrix:

\[ \begin{bmatrix} b_{11}+1 & b_{12} & b_{13} & b_{14} & b_{15} & b_{16} \\ b_{21} & b_{22}+1 & b_{23} & b_{24} & b_{25} & b_{26} \\ b_{31} & b_{32} & b_{33}+1 & b_{34} & b_{35} & b_{36} \\ b_{41} & b_{42} & b_{43} & b_{44}+1 & b_{45} & b_{46} \\ b_{51} & b_{52} & b_{53} & b_{54} & b_{55}+1 & b_{56} \\ b_{61} & b_{62} & b_{63} & b_{64} & b_{65} & b_{66}+1 \end{bmatrix} = \begin{bmatrix} c_{11} & b_{12} & b_{13} & b_{14} & b_{15} & b_{16} \\ b_{21} & c_{22} & b_{23} & b_{24} & b_{25} & b_{26} \\ b_{31} & b_{32} & c_{33} & b_{34} & b_{35} & b_{36} \\ b_{41} & b_{42} & b_{43} & c_{44} & b_{45} & b_{46} \\ b_{51} & b_{52} & b_{53} & b_{54} & c_{55} & b_{56} \\ b_{61} & b_{62} & b_{63} & b_{64} & b_{65} & c_{66} \end{bmatrix} = \mathcal{B} \] The \(c_{ii}\) constants define the strength of density dependence (see Ponciano et al. (2018) for a formal definition). The smaller these values are (while remaining below 1), the stronger the density-dependent effects; the closer they are to 1, the closer the system is to density independence, with \(c_{ii} = 1\) denoting perfect density independence. In what follows, instead of the matrix \(\mathbf{B}\) we will use the matrix \(\mathcal{B}\).
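
The following sketch (illustrative only; build.calB and the toy distance matrix are our own placeholders) shows how a \(\mathcal{B}\) matrix would be assembled under the simple symmetric specification \(b_{ij} = m/d_{ij}\) from a vector of \(c_{ii}\) values, a migration constant \(m\), and a matrix of pairwise distances.

# Illustrative construction of the interaction matrix 'calB' from a vector of
# density-dependence constants c_ii, a migration constant m, and pairwise
# distances (placeholder values only)
build.calB <- function(cvec, m, dmat){
  calB <- m/dmat       # off-diagonal migration effects b_ij = m/d_ij
  diag(calB) <- cvec   # diagonal: strength of density dependence c_ii
  calB
}
d.toy <- matrix(c(  0,  50, 120,
                   50,   0,  80,
                  120,  80,   0), nrow = 3, byrow = TRUE) # distances in km
build.calB(cvec = c(0.5, 0.7, 0.6), m = 0.4, dmat = d.toy)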

Finally, the Poisson observation process is also extended to the different regions as

\[ N_{i,t} \sim \text{Poisson}(\lambda_{i,t}) = \text{Poisson}(e^{X_{t,i}}) \] or in vector form as

\[ \mathbf{N}_{t} \sim \text{Poisson}(e^{\mathbf{X}_{t}}) \]

2.3 Data processing for parameter estimation

We will assume that surveys 4, 5, and 6 constitute three replicates of the annual population abundance counts per region. We will process the original data to obtain:

  1. Three matrices, each with 6 + 1 rows (one per subpopulation + “OTHER”) and as many columns as years the SNKI monitoring has lasted
  2. Three accompanying matrices with a one or a zero in each cell, indicating whether the corresponding year-by-region cell of the count matrix holds an observed count (1) or an NA (0). These matrices will be useful for the JAGS estimation procedure.
rawdat <- read.csv("Code/snailkite counts 1997_2025.csv")
head(rawdat, 12) # first 12 rows
tail(rawdat, 12) # last 12 rows

Note that every row represents a survey (1 to 6; column survey_num) per year (1997 to 2025; column year), with 174 survey counts identified in column X (\(6 \times 29 = 174\)). The remaining columns present the count data per region (or ‘deme’).

We can extract information from this dataset to be organized according to the modeling requirements:

names.rawdat <- names(rawdat)
deme.names <- c("EAST","EVER","KRV","OKEE","PP","SJM","OTHER" )
cols.with.counts <- match(deme.names, names.rawdat)

years <- unique(rawdat$year)
nyears <- length(years)
year.col <- which(names.rawdat=="year", arr.ind=TRUE)

We use a simple time scale from 0 to 28 (stored in tt), but we can use the ‘years’ vector above for the plots.

tt <- years - years[1]

We can also organize the number of surveys

# List the survey numbers:
surv.num <- unique(rawdat$survey_num)

We can save the reorganized data in a list of three matrices, one per survey (Surveys 4, 5, and 6 are S4mat, S5mat, and S6mat, respectively). We can also extract the NA presence in a list of three matrices (NAS.list <- list(NA4mat = S4mat, NA5mat=S5mat, NA6mat=S6mat)).

S4mat <- S5mat <- S6mat <- matrix(0, nrow=7, ncol=nyears)

Counts.list <- list(S4mat = S4mat, S5mat=S5mat, S6mat=S6mat)
NAS.list  <- list(NA4mat = S4mat, NA5mat=S5mat, NA6mat=S6mat)
short.inds.list <- list()

And finally, for each of the three surveys (i in 1:3) we can iteratively fill the matrices in the empty lists. First we use the survey number to locate our survey of interest (surv.num[(i+3)]). Then, we find the row numbers of that survey within the rawdat dataset. We extract those rows for each region or subpopulation and transpose this subset with t(). The columns of this transposed subset correspond to the time series vector tt (because it contains only one of the surveys), so from 0 to 28 (29 years). This matrix is saved in Counts.list. Then we flag the years with an NA for each subpopulation and save the indicators in NAS.list. Finally, for each population (6 + “OTHER”), we save the indices of the years with observed (non-NA) counts in short.inds.list.

for(i in 1:3){
  
  SN <- surv.num[(i+3)]
  
  ith.rows   <- which(rawdat$survey_num==SN, arr.ind=TRUE)
  
  #ith.datmat <- rawdat[ith.rows, c(year.col,cols.with.counts)]
  ith.datmat <- t(rawdat[ith.rows, cols.with.counts])
  colnames(ith.datmat) <- tt
  
  Counts.list[[i]] <- ith.datmat
  
  ith.isnamat <- 1-is.na(ith.datmat) # 1 = observed count, 0 = NA
  NAS.list[[i]] <- ith.isnamat
  
  ith.shortind.list <- list()
  for(j in 1:7){
    
    ith.shortind.list[[j]] <- which(ith.isnamat[j,]==1,arr.ind=TRUE)
    
  }
  names(ith.shortind.list) <- deme.names
  short.inds.list[[i]] <- ith.shortind.list 
    
}
names(short.inds.list) <- c("short.inds4","short.inds5", "short.inds6")

And save this data

save.image("Code/SNKIdata.RData")

2.4 Functions for initial guesses of parameters and random MVN generator

We will need initial or “naive” guesses of the parameters to fit the state-space model. These can be roughly computed by providing the vector of (log) abundance observations Yobs (\(\mathbf{N}\)) and the vector of observation times Tvec to the function guess.calc():

guess.calc <- function(Yobs,Tvec){
  
  #  For calculations, time starts at zero.
  T.t <-Tvec-Tvec[1];
  #  Number of time series transitions, q.
  q <- length(Yobs)-1;
  #  q+1 gives the length of the time series.
  qp1 <- q+1;
  #  Time intervals
  S.t <- T.t[2:qp1]-T.t[1:q]; 
  #  Mean of the observations
  Ybar <- mean(Yobs);        
  # Variance of the observations
  Yvar <- sum((Yobs-Ybar)*(Yobs-Ybar))/q;
  # Initial mu estimate (at stationary distribution)
  mu1 <- Ybar;
  
  # Kludge an initial value for theta based on mean of Y(t+s) given Y(t).
  th1 <- -mean(log(abs((Yobs[2:qp1]-mu1)/(Yobs[1:q]-mu1)))/S.t);            
  # Moment estimate using stationary
  bsq1 <- 2*th1*Yvar/(1+2*th1);         
  # Observation error variance, assumed as first guess as betasq=tausq.
  tsq1 <- bsq1;                         
  
  # What to do if the initial guesses are three 0's (or NAs)? Assume arbitrary values
  three0s <- sum(c(th1,bsq1,tsq1))
  if(three0s==0|is.na(three0s)){
    th1 <- 0.5;
    bsq1 <- 0.09; 
    tsq1 <- 0.23;
    }
  
  out1 <- c(th1,bsq1,tsq1);
  
  # What to do if the initial guesses are too small? Assume arbitrary values
  if(sum(out1<1e-7)>=1){
    out1 <- c(0.5,0.09,0.23)
  }
  
  out <- c(mu1,out1);
  return(abs(out))
}

To take into account possible NAs in the counts, we can wrap this initial-guess estimator guess.calc() inside another function, guess.calc2.0():

guess.calc2.0<- function(TimeAndNs){
  
  newmat <- TimeAndNs 
  isnas <- sum(is.na(TimeAndNs))
  
  if(isnas >= 1){
    
    isnaind <- which(is.na(TimeAndNs[,2]), arr.ind=TRUE)
    newmat <- TimeAndNs[-isnaind,]
    newmat[,1] <- newmat[,1] - newmat[1,1]
    
  }
  
  init.guess <- guess.calc(Yobs = log(newmat[,2]), Tvec=newmat[,1])
  
  mu1  <- init.guess[1]
  th1  <- init.guess[2]
  bsq1 <- init.guess[3]
  sigsq1<- ((1-exp(-2*th1))*bsq1)/(2*th1)
  
  out <- c(mu=mu1, theta=th1, sigmasq = sigsq1)
  return(out)
}

We also need a Multivariate Normal random generator function randmvn(), which needs:

  1. The number of random samples of a MVN vector: n
  2. The mean vector of the MVN distribution to sample from: mu.vec
  3. The variance-covariance matrix of the MVN distribution to sample from: cov.mat
randmvn <- function(n, mu.vec, cov.mat){

  # Save the length of the mean vector of the multivariate normal distribution to sample
  p         <- length(mu.vec);
  # The Cholesky decomposition
    #(factorization of a real symmetric positive-definite sqr matrix)
  Tau       <- chol(cov.mat, pivot=TRUE);
  # generate normal deviates outside loop
  Zmat      <- matrix(rnorm(n=p*n,mean=0,sd=1),nrow=p,ncol=n);

  # empty matrix
  out       <- matrix(0,nrow=p,ncol=n);
  # iterate
  for(i in 1:n){
    Z       <- Zmat[,i];
    out[,i] <- t(Tau)%*%Z + mu.vec
  }

  return(out)
}
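
A quick, illustrative check of randmvn() (arbitrary mean vector and covariance matrix, not values used later in this report):

# Draw 1000 samples from a bivariate normal and verify that the empirical
# means are close to the specified mean vector (illustrative check)
set.seed(1)
test.draws <- randmvn(n = 1000,
                      mu.vec  = c(2, 4),
                      cov.mat = matrix(c(1, 0.3, 0.3, 1), nrow = 2))
rowMeans(test.draws) # should be close to c(2, 4)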

3 Asymmetric migration effects model

3.1 Bringing back the processed data

We can bring back the processed data, which include three surveys (4, 5, and 6) per year for six regions across 29 years. However, the current year is incomplete (only NAs), so we removed it and ended with three surveys per year for 28 years.

load("Code/SNKIdata.RData") # Load raw data

S4mat <- Counts.list$S4mat[1:6,1:28] # Cut the last time step: no data anywhere
S5mat <- Counts.list$S5mat[1:6,1:28]
S6mat <- Counts.list$S6mat[1:6,1:28]

NA4mat <- NAS.list$NA4mat[1:6,1:28]
NA5mat <- NAS.list$NA5mat[1:6,1:28]
NA6mat <- NAS.list$NA6mat[1:6,1:28]

Now we pick a single time series of counts with no NAs, and “cbind() it” to a column of time, thus getting a matrix with time as a first column and counts as a second column. We use this matrix to compute a first guess of the model parameters to be used in all regions.

ts.4guess  <- S4mat[3,]
tvec4guess  <- as.numeric(colnames(S4mat))
onets4guess <- cbind(tvec4guess, ts.4guess) # No NAS for KRV # log(ts.4guess) changed to ts.4guess
naive.guess <- guess.calc2.0(TimeAndNs = onets4guess)
naive.guess
##         mu      theta    sigmasq 
## 4.83610707 0.06804601 0.11070265

The object naive.guess includes the mean log-abundance (\(\mu\)), the speed of equilibration (\(\theta\)), and the stochastic environmental variation (\(\sigma^2\)).

Now, we can bring in the distances between regions, using the centroids and the distances between the edges of the convex polygons of the historical nest locations in each region. The data were provided in meters, but it is numerically convenient to convert them to kilometers. In addition, we assign row names that match our survey matrices.

# mean distance from the centroid 
centr.dists <- read.csv("Code/Demes_centroid distance in meters.csv")[,-1]
centr.dists <- as.matrix(centr.dists)/1000
rownames(centr.dists) <- rownames(S4mat)
centr.dists
##         EAST    EVER     KRV    OKEE      PP     SJM
## EAST   0.000 107.136 140.929  53.358 346.350  90.316
## EVER 107.136   0.000 209.721  98.977 421.761 185.599
## KRV  140.929 209.721   0.000 112.526 212.353  70.586
## OKEE  53.358  98.977 112.526   0.000 324.806  90.723
## PP   346.350 421.761 212.353 324.806   0.000 258.350
## SJM   90.316 185.599  70.586  90.723 258.350   0.000
# minimal distance between edges of the convex polygon of nests
min.condists <- read.csv("Code/Demes_minimum distance in meters.csv")[,-1]
min.condists <- as.matrix(min.condists)/1000
rownames(min.condists) <- rownames(S4mat)
min.condists
##         EAST    EVER     KRV    OKEE      PP     SJM
## EAST   0.000  13.771  43.385  15.924 298.085  35.273
## EVER  13.771   0.000  95.826  20.378 369.028 112.259
## KRV   43.385  95.826   0.000  25.243 163.055  23.540
## OKEE  15.924  20.378  25.243   0.000 297.987  48.160
## PP   298.085 369.028 163.055 297.987   0.000 241.972
## SJM   35.273 112.259  23.540  48.160 241.972   0.000

The number of (sub)populations p is defined and used to construct the initial values for the model. These initial values are provided as a list of vectors on the log scale. This is a trick to keep the priors in the positive domain via the lognormal distribution.

p <- nrow(S4mat)
myinits <- list(lmuvec = log(rep(naive.guess[1],p)), # mu ≈ mean(log(ts.4guess))
                lthetavec = log(rep(naive.guess[2],p)), # log scale keeps theta positive
                lsigmasq = log(naive.guess[3])) # log scale keeps sigmasq positive

We next define dinvmat by choosing one of the distance matrices. This is the element-wise inverse of the distance matrix, because in our model the effect of one region on the growth rate of another region is made inversely proportional to their mutual geographical distance.

dmat <- centr.dists 
dinvmat <- (dmat/dmat^2)  # element-wise 1/distance (the diagonal becomes NaN)
diag(dinvmat) <- rep(0,p) # no self-effect through migration
dinvmat
##             EAST        EVER         KRV        OKEE          PP         SJM
## EAST 0.000000000 0.009333931 0.007095772 0.018741332 0.002887253 0.011072235
## EVER 0.009333931 0.000000000 0.004768240 0.010103357 0.002371011 0.005387960
## KRV  0.007095772 0.004768240 0.000000000 0.008886835 0.004709140 0.014167115
## OKEE 0.018741332 0.010103357 0.008886835 0.000000000 0.003078761 0.011022563
## PP   0.002887253 0.002371011 0.004709140 0.003078761 0.000000000 0.003870718
## SJM  0.011072235 0.005387960 0.014167115 0.011022563 0.003870718 0.000000000

3.2 Model and data cloning ML fit:

Above, we mentioned that we would incorporate the effect of population \(j\) on the growth rate of population \(i\) as \(b_{ij} = \frac{m}{d_{ij}}, i\neq j\), where \(d_{ij}\) is the distance between these two regions. We also mentioned that \(m\) could be interpreted as the migration probability out of a region, if that region is only 1 spatial unit away from another region. This model formulation can be problematic or unrealistic because it automatically makes the effect of one deme on the growth rate of another symmetric. One alternative is to specify as many as \(p(p-1)\) \(b_{ij}\) parameters to take into consideration asymmetric effects of migration. Another option that avoids the specification of \(p(p-1)\) extra parameters but still allows for asymmetric effects is to shape the \(\mathcal{B}\) matrix as a stepping-stone model where two different \(m\) values are specified in the following way. For every deme, there is one other deme that is the closest to it. Let the closest deme to the \(i^{\text{th}}\) deme be labeled with the letter \(j\), and let all other demes be labeled with the letter \(k\). Then, specify the \(b_{ij}\) element of the \(i^{\text{th}}\) row of the matrix \(\mathcal{B}\) as \(\frac{m_\text{close}}{d_{ij}}\) and the other off-diagonal elements \(b_{ik}\) in that row as \(\frac{m_\text{far}}{d_{ik}}\). Under this model, the growth rate of every deme becomes the recipient of two different types of effect: the effect of the deme closest to it and the effect of all the other demes on it. This model embodies the idea of stepping-stone or nearest-neighbor effects. The model can be fitted via Maximum Likelihood using the Data Cloning algorithm to retrieve the MLEs (Lele et al. 2007, Ponciano et al. 2009, Lele et al. 2010, Ponciano et al. 2012). This algorithm uses Bayesian machinery and the statistical theory from Lele et al. (2010) to retrieve the ML estimates; sampling from the target cloned posterior distribution is achieved through MCMC. Using MCMC algorithms (through JAGS), we can compute the Maximum Likelihood Estimates (MLEs) of the parameters (see Lele et al. 2007, Ponciano et al. 2009, Lele et al. 2010, Sólymos 2010, Ponciano et al. 2012). The process of estimation starts by specifying the model for JAGS:

MGSS_DC3.0 <- function(){
  ##### Priors (no cloning)
  lmuvec    ~ dmnorm(mupmean, muprec)
  lthetavec ~ dmnorm(lthetamean, lthetaprec)
  lsigmasq ~ dnorm(0, 1)
  #mig ~ dbeta(2,2)
  mig.near ~ dnorm(0,1)
  mig.far ~ dnorm(0,1)
  
  sigmasq <- exp(lsigmasq)  
  for(pp in 1:p){
    muvec[pp] <- exp(lmuvec[pp])
    thetavec[pp] <- exp(lthetavec[pp])
    cvec[pp] <- exp(-thetavec[pp])
  }
  
  Sigma <- (1/sigmasq)*Ip # precision matrix (JAGS dmnorm is parameterized by the precision)
  
  for(m in 1:p){
    for(n in 1:p){
      B[m,n] <- ifelse((m==n),cvec[m],ifelse(
        (m==step.stone[m,1])&&(n==step.stone[m,2]), 
        dinvmat[m,n]*mig.near,dinvmat[m,n]*mig.far)
      )
    }
  }
  
  A[1:p]  <- (Ip - B[1:p,1:p])%*%muvec
  
  for(k in 1:K){
    ##### ---- Likelihood (data cloning) ---- #####
    X[1:p,1,k] ~ dmnorm(muvec, Sigma) #muvec + beta*water[,1]
    for(h in 1:p){
      N4[h,1,k] ~ dpois(exp(X[h,1,k]))
      N5[h,1,k] ~ dpois(exp(X[h,1,k]))      
      N6[h,1,k] ~ dpois(exp(X[h,1,k]))
    }
    for( i in 2:qp1){ 
      ## Process error
      X[1:p,i,k]  ~ dmnorm(A[1:p] + B[1:p,1:p]%*%X[1:p,(i-1),k], Sigma); #A[1:p] + beta*water[,1] + B[...]
      for(j in 1:p){
        N4[j,i,k]  ~ dpois(exp(X[j,i,k])) ## Observation error
        N5[j,i,k]  ~ dpois(exp(X[j,i,k])) ## Observation error        
        N6[j,i,k]  ~ dpois(exp(X[j,i,k])) ## Observation error        
      }
    }
  }  
}

We next bundle the data for the Data Cloning fit into a list. This data list includes the number of clones \(K = 1\), which will be used to replicate the likelihood portion of the model. It also includes the three replicates per year (S4mat, S5mat, S6mat), formatted as arrays of 6 regions (p) x 28 years x a 3rd dimension for the clones. The length of the time series (qp1) and the number of regions or (sub)populations (p) are used for the iterative process within the model. We provide prior mean vectors for the mean log-abundance (mupmean) and for \(\log{\theta}\) (lthetamean), with their corresponding variance matrices muprec and lthetaprec (expressed as precisions for JAGS). The identity matrix Ip has the mathematical form

\[ \mathbf{I} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \] while dinvmat is the inverse-distance matrix defined above. In the model, we have included a stepping-stone process of movement, represented by the matrix step.stone. This matrix has 6 rows (one per deme) and two columns: the i\(^\text{th}\) row gives the position (row and column number) in the distance matrix of the deme closest to that i\(^\text{th}\) deme.

data4dclone <- list(K=1, 
                    N4=dcdim(array(S4mat,dim=c(p,ncol(S4mat),1))), 
                    N5=dcdim(array(S5mat,dim=c(p,ncol(S5mat),1))), 
                    N6=dcdim(array(S6mat,dim=c(p,ncol(S6mat),1))), 
                    qp1=ncol(S4mat), 
                    p=p, 
                    mupmean=rep(6,p), 
                    muprec=(1/3)*diag(p),
                    lthetamean = rep(0.5,p), 
                    lthetaprec =  10*diag(p),
                    Ip =  diag(p),
                    dinvmat = dinvmat,
                    step.stone = matrix(c(1,4,2,4,3,5,4,1,5,3,6,3),
                                        nrow=6,ncol=2,byrow=TRUE)
)

Finally, we set up the number of clones, the MCMC settings, and the parameters to sample, and fit the data cloning model (it takes some time, so the chunk is set to eval = FALSE at this moment).

cl.seq <- c(1,8,16);
n.iter<-100000;n.adapt<-50000;n.update<-100;thin<-10;n.chains<-3;

out.parms <- c("muvec", "cvec", "mig.near", "mig.far", "sigmasq", "A")
mgssdclone <- dc.fit(data4dclone, 
                     params=out.parms, 
                     model=MGSS_DC3.0, 
                     n.clones=cl.seq,
                     multiply="K", 
                     unchanged = c("qp1","p","mupmean","muprec", 
                      "step.stone", "lthetamean","lthetaprec", "Ip","dinvmat"),
                     n.chains = n.chains, 
                     n.adapt=n.adapt, 
                     n.update=n.update,
                     n.iter = n.iter, 
                     thin=thin, 
                     inits=myinits) 
saveRDS(mgssdclone, "Code/mgssdclone.RDS")

Let’s check this model. We can see the diagnostics for estimability of the parameters with the dcdiag() function in the dclone package, as well as the summary and confidence intervals of the estimates. The variable lambda.max is the largest eigenvalue of the posterior variance-covariance matrix, ms.error represents the mean squared error, and r.squared is a correlation-like fit statistic. All three statistics should decrease (converge to zero) as a function of the number of clones. Furthermore, the multivariate r.hat value is used to check MCMC chain convergence (it should be close to 1).

mgssdclone <- readRDS("Code/mgssdclone.RDS")
dcdiag(mgssdclone)
summary(mgssdclone)
## 
## Iterations = 50110:150100
## Thinning interval = 10 
## Number of chains = 3 
## Sample size per chain = 10000 
## Number of clones = 16
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##             Mean      SD  DC SD  Naive SE Time-series SE R hat
## A[1]      1.5686 0.15669 0.6268 0.0009047      0.0013335 1.001
## A[2]      1.0188 0.14484 0.5794 0.0008362      0.0013103 1.000
## A[3]      2.1718 0.29649 1.1860 0.0017118      0.0031254 1.000
## A[4]      1.4054 0.14942 0.5977 0.0008627      0.0012378 1.000
## A[5]      3.7636 0.36057 1.4423 0.0020817      0.0055731 1.000
## A[6]      2.1968 0.16387 0.6555 0.0009461      0.0014684 1.000
## cvec[1]   0.4525 0.04321 0.1728 0.0002495      0.0004223 1.001
## cvec[2]   0.7569 0.03275 0.1310 0.0001891      0.0003312 1.000
## cvec[3]   0.5339 0.05558 0.2223 0.0003209      0.0005731 1.000
## cvec[4]   0.6150 0.02839 0.1135 0.0001639      0.0002834 1.000
## cvec[5]   0.1594 0.07582 0.3033 0.0004378      0.0011053 1.000
## cvec[6]   0.3459 0.04088 0.1635 0.0002360      0.0004005 1.000
## mig.far   0.8223 0.74204 2.9682 0.0042842      0.0046193 1.000
## mig.near -2.4412 0.77140 3.0856 0.0044537      0.0045958 1.000
## muvec[1]  2.7442 0.08186 0.3274 0.0004726      0.0009310 1.001
## muvec[2]  4.0877 0.15202 0.6081 0.0008777      0.0017639 1.001
## muvec[3]  4.7521 0.10000 0.4000 0.0005774      0.0011321 1.000
## muvec[4]  3.6089 0.10974 0.4390 0.0006336      0.0012555 1.002
## muvec[5]  4.4525 0.13459 0.5384 0.0007771      0.0034052 1.001
## muvec[6]  3.2449 0.06856 0.2743 0.0003958      0.0007205 1.000
## sigmasq   0.9148 0.03081 0.1232 0.0001779      0.0002367 1.000
## 
## 2. Quantiles for each variable:
## 
##              2.5%     25%     50%     75%   97.5%
## A[1]      1.26041  1.4626  1.5687  1.6733  1.8770
## A[2]      0.74201  0.9197  1.0152  1.1150  1.3101
## A[3]      1.59353  1.9708  2.1709  2.3734  2.7508
## A[4]      1.11696  1.3039  1.4052  1.5050  1.7024
## A[5]      2.97563  3.5354  3.7943  4.0278  4.3734
## A[6]      1.88045  2.0861  2.1960  2.3055  2.5224
## cvec[1]   0.36773  0.4234  0.4524  0.4817  0.5366
## cvec[2]   0.69149  0.7352  0.7577  0.7796  0.8193
## cvec[3]   0.42418  0.4963  0.5348  0.5717  0.6416
## cvec[4]   0.55864  0.5958  0.6154  0.6343  0.6691
## cvec[5]   0.03793  0.1025  0.1513  0.2068  0.3278
## cvec[6]   0.26411  0.3185  0.3463  0.3737  0.4247
## mig.far  -0.65189  0.3199  0.8225  1.3240  2.2690
## mig.near -3.95249 -2.9627 -2.4400 -1.9199 -0.9162
## muvec[1]  2.58363  2.6899  2.7448  2.7988  2.9039
## muvec[2]  3.79614  3.9842  4.0861  4.1877  4.3908
## muvec[3]  4.54119  4.6889  4.7565  4.8204  4.9362
## muvec[4]  3.39379  3.5364  3.6081  3.6818  3.8261
## muvec[5]  4.18583  4.3625  4.4526  4.5437  4.7146
## muvec[6]  3.10992  3.1991  3.2447  3.2908  3.3777
## sigmasq   0.85577  0.8936  0.9144  0.9351  0.9770
confint(mgssdclone)
##                2.5 %    97.5 %
## A[1]      0.34015252 2.7970615
## A[2]     -0.11672692 2.1543309
## A[3]     -0.15260838 4.4962596
## A[4]      0.23391629 2.5767935
## A[5]      0.93686343 6.5904314
## A[6]      0.91207482 3.4815045
## cvec[1]   0.11376591 0.7912883
## cvec[2]   0.50018275 1.0137029
## cvec[3]   0.09817779 0.9696881
## cvec[4]   0.39243775 0.8375191
## cvec[5]  -0.43500874 0.7538486
## cvec[6]   0.02537950 0.6664430
## mig.far  -4.99518971 6.6397722
## mig.near -8.48887844 3.6064220
## muvec[1]  2.10243232 3.3860075
## muvec[2]  2.89592793 5.2795692
## muvec[3]  3.96810422 5.5361400
## muvec[4]  2.74855533 4.4692142
## muvec[5]  3.39732704 5.5077066
## muvec[6]  2.70736938 3.7824204
## sigmasq   0.67325339 1.1563456

We can save the MLEs and predict the population dynamics for each region with the Kalman filter estimates.

mlescentrdist <- summary(mgssdclone)[[1]]
mles4kalman <- mlescentrdist[,1]
mles4kalman
##       A[1]       A[2]       A[3]       A[4]       A[5]       A[6]    cvec[1] 
##  1.5686070  1.0188020  2.1718256  1.4053549  3.7636474  2.1967897  0.4525271 
##    cvec[2]    cvec[3]    cvec[4]    cvec[5]    cvec[6]    mig.far   mig.near 
##  0.7569428  0.5339329  0.6149784  0.1594199  0.3459112  0.8222913 -2.4412282 
##   muvec[1]   muvec[2]   muvec[3]   muvec[4]   muvec[5]   muvec[6]    sigmasq 
##  2.7442199  4.0877486  4.7521221  3.6088848  4.4525168  3.2448949  0.9147995
avec4kalman <- mles4kalman[1:6]
cvec4kalman <- mles4kalman[7:12]
mig.far4kalman  <- mles4kalman[13]
mig.near4kalman <- mles4kalman[14]
muvec4kalman <- mles4kalman[15:20]
sigsq4kalman <- mles4kalman[21]

3.3 Kalman estimates of abundance with the effect of asymmetric migration

MGSS_KALMAN2.0 <- function(){
  
  A  <- (Ip - B)%*%muvec
  Sigma <- (1/sigmasq)*Ip # precision matrix for the JAGS multivariate normal
  
  ##### ---- Likelihood ---- #####
  X[1:p,1] ~ dmnorm(muvec, Sigma)
  for(h in 1:p){
    N4[h,1] ~ dpois(exp(X[h,1]))
    N5[h,1] ~ dpois(exp(X[h,1]))    
    N6[h,1] ~ dpois(exp(X[h,1]))    
  }
  for( i in 2:qp1){ 
    ## Process error
    X[1:p,i]  ~ dmnorm(A + B%*%X[1:p,(i-1)], Sigma); 
    for(j in 1:p){
      N4[j,i]  ~ dpois(exp(X[j,i])) ## Observation error
      N5[j,i]  ~ dpois(exp(X[j,i])) ## Observation error      
      N6[j,i]  ~ dpois(exp(X[j,i])) ## Observation error
    }
  }
}

Data for Kalman estimates

step.stone <- matrix(c(1,4,2,4,3,5,4,1,5,3,6,3),
                     nrow=6,
                     ncol=2,
                     byrow=TRUE)
step.stone
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    4
## [3,]    3    5
## [4,]    4    1
## [5,]    5    3
## [6,]    6    3
B <- matrix(0,p,p)

for(m in 1:p){
  for(n in 1:p){
    B[m,n] <- ifelse((m==n),cvec4kalman[m],ifelse(
      (m==step.stone[m,1])&&(n==step.stone[m,2]), 
      dinvmat[m,n]*mig.near4kalman,dinvmat[m,n]*mig.far4kalman)
    ) #
    
  }
}

print(B)
##              [,1]        [,2]         [,3]         [,4]         [,5]
## [1,]  0.452527111 0.007675210  0.005834791 -0.045751869  0.002374163
## [2,]  0.007675210 0.756942831  0.003920882 -0.024664601  0.001949662
## [3,]  0.005834791 0.003920882  0.533932933  0.007307567 -0.011496085
## [4,] -0.045751869 0.008307902  0.007307567  0.614978429  0.002531638
## [5,]  0.002374163 0.001949662 -0.011496085  0.002531638  0.159419920
## [6,]  0.009104602 0.004430472 -0.034585162  0.009063757  0.003182858
##             [,6]
## [1,] 0.009104602
## [2,] 0.004430472
## [3,] 0.011649495
## [4,] 0.009063757
## [5,] 0.003182858
## [6,] 0.345911246
data4kalman <- list(N4=S4mat,
                    N5=S5mat,
                    N6=S6mat,
                    qp1=ncol(S4mat), 
                    p=p,
                    Ip =  diag(p),
                    muvec = muvec4kalman,
                    sigmasq = sigsq4kalman, 
                    B=B
)
str(data4kalman)
## List of 9
##  $ N4     : int [1:6, 1:28] 3 106 27 32 NA 34 5 229 11 28 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "EAST" "EVER" "KRV" "OKEE" ...
##   .. ..$ : chr [1:28] "0" "1" "2" "3" ...
##  $ N5     : int [1:6, 1:28] 14 240 36 88 NA 29 4 347 40 50 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "EAST" "EVER" "KRV" "OKEE" ...
##   .. ..$ : chr [1:28] "0" "1" "2" "3" ...
##  $ N6     : int [1:6, 1:28] 7 131 28 33 NA 32 NA 145 70 26 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "EAST" "EVER" "KRV" "OKEE" ...
##   .. ..$ : chr [1:28] "0" "1" "2" "3" ...
##  $ qp1    : int 28
##  $ p      : int 6
##  $ Ip     : num [1:6, 1:6] 1 0 0 0 0 0 0 1 0 0 ...
##  $ muvec  : Named num [1:6] 2.74 4.09 4.75 3.61 4.45 ...
##   ..- attr(*, "names")= chr [1:6] "muvec[1]" "muvec[2]" "muvec[3]" "muvec[4]" ...
##  $ sigmasq: Named num 0.915
##   ..- attr(*, "names")= chr "sigmasq"
##  $ B      : num [1:6, 1:6] 0.45253 0.00768 0.00583 -0.04575 0.00237 ...
n.iter<-100000
n.adapt<-1000
n.update<-100
thin<-10
n.chains<-3

out.parms <- c("X")
kalman.mcmcw <- jags.fit(data4kalman, 
                         params=out.parms, 
                         model=MGSS_KALMAN2.0, 
                         n.chains = n.chains, 
                         n.adapt=n.adapt, 
                         n.update=n.update, 
                         n.iter = n.iter, 
                         thin=thin)
## Registered S3 method overwritten by 'R2WinBUGS':
##   method            from  
##   as.mcmc.list.bugs dclone
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 395
##    Unobserved stochastic nodes: 137
##    Total graph size: 1011
## 
## Initializing model

And then, we gather the MCMC samples to obtain the estimated population trajectories in each region.

Xkalman <-  do.call(rbind, kalman.mcmcw)

Kalman.quants <- apply(Xkalman, 2, FUN=function(x){quantile(x, probs=c(0.025,0.5,0.975))}) 

Here, we can plot the Kalman estimates with their CIs, first using classic base R plotting.

len <- ncol(S4mat)

names.quants <- substr(colnames(Kalman.quants), start=1, stop=3)
X1s <- Kalman.quants[,which(names.quants=="X[1", arr.ind=TRUE)]
X2s <- Kalman.quants[,which(names.quants=="X[2", arr.ind=TRUE)]
X3s <- Kalman.quants[,which(names.quants=="X[3", arr.ind=TRUE)]
X4s <- Kalman.quants[,which(names.quants=="X[4", arr.ind=TRUE)]
X5s <- Kalman.quants[,which(names.quants=="X[5", arr.ind=TRUE)]
X6s <- Kalman.quants[,which(names.quants=="X[6", arr.ind=TRUE)]

par(oma=c(1,1,1,1), mar=c(4,5,2,1))
plot(1:len, X1s[2,], pch=16, type="b", col="red",
     ylim=c(min(Kalman.quants),max(Kalman.quants)), 
     main ="SNKI - six regions", 
     xlab="Time", 
     ylab="log-population sizes", bty="l")
polygon(c(1:len,rev(1:len)), c(X1s[1,],rev(X1s[3,])), 
        col=scales::alpha("red",0.1), border=NA)
points(1:len, X1s[2,],pch=16, type="b", col="red")

polygon(c(1:len,rev(1:len)), c(X2s[1,],rev(X2s[3,])), 
        col=scales::alpha("blue",0.1), border=NA)
points(1:len, X2s[2,],pch=16, type="b", col="blue")

polygon(c(1:len,rev(1:len)), c(X3s[1,],rev(X3s[3,])), 
        col=scales::alpha("darkgreen",0.1), border=NA)
points(1:len, X3s[2,],pch=16, type="b", col="darkgreen")

polygon(c(1:len,rev(1:len)), c(X4s[1,],rev(X4s[3,])), 
        col=scales::alpha("yellow",0.1), border=NA)
points(1:len, X4s[2,],pch=16, type="b", col="yellow")

polygon(c(1:len,rev(1:len)), c(X5s[1,],rev(X5s[3,])), 
        col=scales::alpha("purple",0.1), border=NA)
points(1:len, X5s[2,],pch=16, type="b", col="purple")

polygon(c(1:len,rev(1:len)), c(X6s[1,],rev(X6s[3,])), 
        col=scales::alpha("orange",0.1), border=NA)
points(1:len, X6s[2,],pch=16, type="b", col="orange")

Or, alternatively, with the tidyverse, converting the matrix to a data frame and extracting the region and time values.

kalman_df <- as.data.frame(Kalman.quants) |>
  mutate(quantile = rownames(Kalman.quants)) |>
  pivot_longer(-quantile, 
               names_to = "state_time", 
               values_to = "value") |>
  extract(state_time, 
          into = c("region", "time"), 
          regex = "X\\[(\\d+),(\\d+)\\]", convert = TRUE)

And then plotting with the region names and on the real count scale, adding the three surveys per year.

kalman_wide <- kalman_df |>
  mutate(value = exp(value),
         region = ifelse(region == 1, "EAST",
                  ifelse(region == 2, "EVER",
                  ifelse(region == 3, "KRV",
                  ifelse(region == 4, "OKEE",
                  ifelse(region == 5, "PP",
                  ifelse(region == 6, "SJM",
                  region))))))) |>
  pivot_wider(names_from = quantile, 
              values_from = value)

#counts
S4df <- as.data.frame(S4mat) |>
  mutate(region = rownames(S4mat)) |>
  pivot_longer(-region, 
               names_to = "time", 
               values_to = "Count4") 
S5df <- as.data.frame(S5mat) |>
  mutate(region = rownames(S5mat)) |>
  pivot_longer(-region, 
               names_to = "time", 
               values_to = "Count5") 
S6df <- as.data.frame(S6mat) |>
  mutate(region = rownames(S6mat)) |>
  pivot_longer(-region, 
               names_to = "time", 
               values_to = "Count6") 

Countsdf <- S4df |>
  left_join(S5df) |>
  left_join(S6df) |>
  mutate(time = as.numeric(time) + 1)
## Joining with `by = join_by(region, time)`
## Joining with `by = join_by(region, time)`
kalman_wide <- kalman_wide |> 
  left_join(Countsdf) |>
  mutate(time = time + 1996)
## Joining with `by = join_by(region, time)`
kalman_wide$region <- factor(kalman_wide$region,
                             levels = c("EVER",
                                        "EAST",
                                        "OKEE",
                                        "KRV",
                                        "SJM",
                                        "PP"))

# Plot using ggplot2
ggplot(kalman_wide, aes(x = time, y = `50%`, 
                        color = factor(region), 
                        fill = factor(region))) +
  facet_wrap(~region, scales = "free_y", ncol = 2) +
  geom_line() +
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`), 
              alpha = 0.2, color = NA) +
  geom_point(aes(y = Count4), shape = 25, fill = NA)+
  geom_point(aes(y = Count5), shape = 21, fill = NA)+
  geom_point(aes(y = Count6), shape = 24, fill = NA)+
  labs(
    title = "SNKI - Six Regions",
    x = "Year",
    y = "Population sizes",
    color = "Region",
    fill = "Region"
    ) +
  scale_color_manual(values=met.brewer("Homer2", 6)) +
  scale_fill_manual(values=met.brewer("Homer2", 6)) +
  theme_minimal() +
  theme(legend.position = "none")

In this new plot, the open points represent three surveys per year (replicates): down-facing triangles are survey 4, circles are survey 5, and up-facing triangles are survey 6.

4 Population viability analysis: short, medium and long term projections per deme

First we build a time series simulator, starting from a given initial value. The function is primed with the estimated true, latent final log-abundances at each deme and the ML estimates of the model parameters.

MVX.sim <- function(xo.vec = c(X1s[2,28],X2s[2,28],X3s[2,28],X4s[2,28],X5s[2,28],X6s[2,28]), 
                    theta.list=
                    list(p=p,
                    Ip =  diag(p),
                    muvec = muvec4kalman,
                    sigmasq = sigsq4kalman, 
                    B=B),
                    nsteps=5,
                    rnames = row.names(S4mat)){

  p  <- theta.list$p
  Ip <- theta.list$Ip
  muvec <- theta.list$muvec
  sigmasq <- theta.list$sigmasq
  B <- theta.list$B
  
  A  <- (Ip - B)%*%muvec
  Sigma <- (1/sigmasq)*Ip
  
  X <- matrix(0,nrow=p,ncol=nsteps)
  X[1:p,1] <- randmvn(n=1,mu.vec=A+B%*%xo.vec, cov.mat=Sigma)
  
  for( i in 2:nsteps){ 
    ## Process error
    X[1:p,i] <- randmvn(n=1,mu.vec=A + B%*%X[1:p,(i-1)], cov.mat=Sigma)
  }
  
  X <- cbind(xo.vec,X)
  row.names(X) <- rnames
  colnames(X) <- 0:nsteps
  return(X)
}

Next, we use this simulator to simulate 100,000 trajectories of length 10 years and store these in a list. This list contains 6 matrices, one per deme. Each matrix has 100,000 rows and 11 columns (10 years plus the initial estimated abundances).

sims.list <- list()
nyears.fwd <- 10
nsims <- 100000

for(i in 1:p){
  sims.list[[i]] <- matrix(0,nrow=nsims,ncol=nyears.fwd+1)
}
names(sims.list) <- row.names(S4mat)

for(i in 1:nsims){
  
  ith.sim <- MVX.sim(nsteps=10)
  
  sims.list[[1]][i,] <- ith.sim[1,] 
  sims.list[[2]][i,] <- ith.sim[2,] 
  sims.list[[3]][i,] <- ith.sim[3,] 
  sims.list[[4]][i,] <- ith.sim[4,] 
  sims.list[[5]][i,] <- ith.sim[5,] 
  sims.list[[6]][i,] <- ith.sim[6,] 
}

Now we compute the year-wise cumulative probability of dropping below an arbitrarily low critical number. Starting at the current population size, this quantity tracks the probability that the population will drop below a critical number anytime between the present time and every subsequent year, for as many years as we simulated the population forward in time. Here we simulated only 10 years forward in time, thus sticking with short-term forecasting. The critical abundance used in these simulations was conservatively set at 75% of the estimated mean of the stationary distribution for every deme. This quantity then becomes our standard population viability threshold for every deme in our simulations for this first part of the report.

The vector of estimated means of the stationary distribution on the log scale, back-transformed to the count scale (i.e., the stationary medians), is:

ks <- exp(muvec4kalman)
names(ks) <- row.names(S4mat)
print(ks)
##      EAST      EVER       KRV      OKEE        PP       SJM 
##  15.55248  59.60554 115.82983  36.92485  85.84272  25.65901

On the population size scale, however, the mean abundance is given by the expected value of the log-normal stationary distribution for every deme. This expected value is easily computed since, with the model ML estimates, we can in turn estimate the stationary variance for every deme using equation 17 in Ives et al. (2003):

Ip <- diag(p)
Sigma <- (1/sigsq4kalman)*Ip

Ipsq <- diag(p*p)
vec.Vinf <- solve(Ipsq - kronecker(B,B))%*%as.vector(Sigma)
Vinf <- matrix(vec.Vinf, nrow=p,ncol=p)
Vars.inf <- diag(Vinf)
names(Vars.inf) <- row.names(S4mat)

E.Ns <- exp(muvec4kalman + Vars.inf/2)
names(E.Ns) <- row.names(S4mat)
print(E.Ns)
##      EAST      EVER       KRV      OKEE        PP       SJM 
##  31.09756 215.05530 248.89954  89.61099 150.42888  47.82010
ncrit <- E.Ns*0.75 # change here to a desired number
lcrit <- log(ncrit)
nsteps <- 10
probs.drop <- matrix(0, nrow=6,ncol=nsteps)
rownames(probs.drop) <- row.names(S4mat)
for(i in 1:6){
  
  simsmat <- sims.list[[i]] # drop the initial log pop size
  
  for(j in 2:(nsteps+1)){
    
    jthsubmat <- simsmat[,1:j]
    
    probs.drop[i,(j-1)] <- sum(jthsubmat<lcrit[i])/(nrow(jthsubmat)*ncol(jthsubmat))
  }
}

print(probs.drop)
##          [,1]      [,2]      [,3]     [,4]      [,5]      [,6]      [,7]
## EAST 0.279355 0.3944633 0.4560525 0.494324 0.5200733 0.5376886 0.5507775
## EVER 0.999995 0.9989100 0.9942750 0.985558 0.9737933 0.9607000 0.9470550
## KRV  0.813315 0.7544533 0.7271125 0.711670 0.7015500 0.6947814 0.6894050
## OKEE 0.183855 0.2958200 0.3688275 0.419782 0.4576417 0.4866314 0.5091463
## PP   0.800435 0.7354600 0.7030675 0.683728 0.6702850 0.6606786 0.6535025
## SJM  0.283950 0.3907233 0.4473275 0.481934 0.5049433 0.5216014 0.5339625
##           [,8]     [,9]     [,10]
## EAST 0.5604100 0.567987 0.5743945
## EVER 0.9336222 0.920913 0.9087436
## KRV  0.6855322 0.681963 0.6789473
## OKEE 0.5271144 0.541752 0.5535655
## PP   0.6477022 0.643028 0.6393100
## SJM  0.5434378 0.551281 0.5572036
# Now we define a color palette for plotting:
cool.palette <- as.vector(met.brewer("Homer2",6))

mat <-  matrix(c(rep(c(1,1,1,1),3),rep(2,4)),nrow=4,ncol=4,byrow=T)
layout(mat)
par(oma=c(2,2,1,1), mar=c(4,4,4,0.5))
plot(1:nsteps,probs.drop[1,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="Years ahead of present", ylab="Pr. Falling Under Conservation Threshold", cex.lab=1.25)
for(i in 2:6){
  points(1:nsteps,probs.drop[i,], type="b",  lwd=2, col=cool.palette[i])
}
#par(oma=c(1,1,1,1), mar=c(4,4,4,1))
plot(0,0,type="n",axes=FALSE,xlab="",ylab="",main="")
legend(-0.8,1.65,legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n",
       horiz=TRUE)

Benchmark probabilities of falling under conservation threshold

This figure shows that, based on the trends from the last 28 years, if the environmental scenario keeps fluctuating according to the same patterns for the next 10 years, the chances that the number of kites will drop below the conservation threshold in every deme increase over time, except for Payne’s Prairie, where those chances stabilize around 0.6. These trends represent the current baseline. We next repeat the forward simulations with a 10% reduction in the average growth rate:

pct.red <- 0.9 # mu reduced by 10 %
newmus <- muvec4kalman*pct.red

sims.list10pct <- list()
nyears.fwd <- 10
nsims <- 100000

for(i in 1:p){
  sims.list10pct[[i]] <- matrix(0,nrow=nsims,ncol=nyears.fwd+1)
}
names(sims.list10pct) <- row.names(S4mat)

for(i in 1:nsims){
  
  ith.sim <- MVX.sim(xo.vec = c(X1s[2,28],X2s[2,28],X3s[2,28],X4s[2,28],X5s[2,28],X6s[2,28]), 
                    theta.list=
                    list(p=p,
                      Ip =  diag(p),
                      muvec = newmus,
                      sigmasq = sigsq4kalman, 
                      B=B),
                    nsteps=10,
                    rnames = row.names(S4mat))
  
  sims.list10pct[[1]][i,] <- ith.sim[1,] 
  sims.list10pct[[2]][i,] <- ith.sim[2,] 
  sims.list10pct[[3]][i,] <- ith.sim[3,] 
  sims.list10pct[[4]][i,] <- ith.sim[4,] 
  sims.list10pct[[5]][i,] <- ith.sim[5,] 
  sims.list10pct[[6]][i,] <- ith.sim[6,] 
}

ncrit <- E.Ns*0.75 # change here to a desired number
lcrit <- log(ncrit)
nsteps <- 10
probs.drop10pct <- matrix(0, nrow=6,ncol=nsteps)
rownames(probs.drop10pct) <- row.names(S4mat)
for(i in 1:6){
  
  simsmat <- sims.list10pct[[i]] # drop the initial log pop size
  
  for(j in 2:(nsteps+1)){
    
    jthsubmat <- simsmat[,1:j]
    
    probs.drop10pct[i,(j-1)] <- sum(jthsubmat<lcrit[i])/(nrow(jthsubmat)*ncol(jthsubmat))
  }
}

mat <-  matrix(c(rep(c(1,1,1,1),3),rep(2,4)),nrow=4,ncol=4,byrow=T)
layout(mat)
par(oma=c(2,2,1,1), mar=c(4,4,4,0.5))
plot(1:nsteps,probs.drop10pct[1,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="Years ahead of present", ylab="Pr. Falling Under Conservation Threshold", cex.lab=1.25)
for(i in 2:6){
  points(1:nsteps,probs.drop10pct[i,], type="b",  lwd=2, col=cool.palette[i])
}
#par(oma=c(1,1,1,1), mar=c(4,4,4,1))
plot(0,0,type="n",axes=FALSE,xlab="",ylab="",main="")
legend(-0.8,1.65,legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n",
       horiz=TRUE)

save.image("All-up2-10pctReduction.RData")

Benchmark probabilities of falling under conservation threshold with a 10% reduction in the average growth rate

With a 20% reduction in the average growth rate, we get the following 10-year forecast of the probability of falling under the conservation threshold of 75% of the mean of the historical stationary distribution:

pct.red <- 0.8 # mu reduced by 20 %
newmus <- muvec4kalman*pct.red

sims.list20pct <- list()
nyears.fwd <- 10
nsims <- 100000

for(i in 1:p){
  sims.list20pct[[i]] <- matrix(0,nrow=nsims,ncol=nyears.fwd+1)
}
names(sims.list20pct) <- row.names(S4mat)

for(i in 1:nsims){
  
  ith.sim <- MVX.sim(xo.vec = c(X1s[2,28],X2s[2,28],X3s[2,28],X4s[2,28],X5s[2,28],X6s[2,28]), 
                    theta.list=
                    list(p=p,
                      Ip =  diag(p),
                      muvec = newmus,
                      sigmasq = sigsq4kalman, 
                      B=B),
                    nsteps=10,
                    rnames = row.names(S4mat))
  
  sims.list20pct[[1]][i,] <- ith.sim[1,] 
  sims.list20pct[[2]][i,] <- ith.sim[2,] 
  sims.list20pct[[3]][i,] <- ith.sim[3,] 
  sims.list20pct[[4]][i,] <- ith.sim[4,] 
  sims.list20pct[[5]][i,] <- ith.sim[5,] 
  sims.list20pct[[6]][i,] <- ith.sim[6,] 
}

ncrit <- E.Ns*0.75 # change here to a desired number
lcrit <- log(ncrit)
nsteps <- 10
probs.drop20pct <- matrix(0, nrow=6,ncol=nsteps)
rownames(probs.drop20pct) <- row.names(S4mat)
for(i in 1:6){
  
  simsmat <- sims.list20pct[[i]] # simulated log abundances (column 1 holds the initial value)
  
  for(j in 2:(nsteps+1)){
    
    jthsubmat <- simsmat[,1:j]
    
    probs.drop20pct[i,(j-1)] <- sum(jthsubmat<lcrit[i])/(nrow(jthsubmat)*ncol(jthsubmat))
  }
}

mat <-  matrix(c(rep(c(1,1,1,1),3),rep(2,4)),nrow=4,ncol=4,byrow=T)
layout(mat)
par(oma=c(2,2,1,1), mar=c(4,4,4,0.5))
plot(1:nsteps,probs.drop20pct[1,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="Years ahead of present", ylab="Pr. Falling Under Conservation Threshold", cex.lab=1.25)
for(i in 2:6){
  points(1:nsteps,probs.drop20pct[i,], type="b",  lwd=2, col=cool.palette[i])
}
#par(oma=c(1,1,1,1), mar=c(4,4,4,1))
plot(0,0,type="n",axes=FALSE,xlab="",ylab="",main="")
legend(-0.8,1.65,legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n",
       horiz=TRUE)

save.image("All-up2-20pctReduction.RData")

Benchmark probabilities of falling under conservation threshold with a 20% reduction in the average growth rate

With a 30% reduction in the average growth rate, we get the following 10-year forecast of the probability of falling under the conservation threshold of 75% of the mean of the historical stationary distribution:

pct.red <- 0.70 # mu reduced by 30 %
newmus <- muvec4kalman*pct.red

sims.list30pct <- list()
nyears.fwd <- 10
nsims <- 100000

for(i in 1:p){
  sims.list30pct[[i]] <- matrix(0,nrow=nsims,ncol=nyears.fwd+1)
}
names(sims.list30pct) <- row.names(S4mat)

for(i in 1:nsims){
  
  ith.sim <- MVX.sim(xo.vec = c(X1s[2,28],X2s[2,28],X3s[2,28],X4s[2,28],X5s[2,28],X6s[2,28]), 
                    theta.list=
                    list(p=p,
                      Ip =  diag(p),
                      muvec = newmus,
                      sigmasq = sigsq4kalman, 
                      B=B),
                    nsteps=10,
                    rnames = row.names(S4mat))
  
  sims.list30pct[[1]][i,] <- ith.sim[1,] 
  sims.list30pct[[2]][i,] <- ith.sim[2,] 
  sims.list30pct[[3]][i,] <- ith.sim[3,] 
  sims.list30pct[[4]][i,] <- ith.sim[4,] 
  sims.list30pct[[5]][i,] <- ith.sim[5,] 
  sims.list30pct[[6]][i,] <- ith.sim[6,] 
}

ncrit <- E.Ns*0.75 # change here to a desired number
lcrit <- log(ncrit)
nsteps <- 10
probs.drop30pct <- matrix(0, nrow=6,ncol=nsteps)
rownames(probs.drop30pct) <- row.names(S4mat)
for(i in 1:6){
  
  simsmat <- sims.list30pct[[i]] # simulated log abundances (column 1 holds the initial value)
  
  for(j in 2:(nsteps+1)){
    
    jthsubmat <- simsmat[,1:j]
    
    probs.drop30pct[i,(j-1)] <- sum(jthsubmat<lcrit[i])/(nrow(jthsubmat)*ncol(jthsubmat))
  }
}

mat <-  matrix(c(rep(c(1,1,1,1),3),rep(2,4)),nrow=4,ncol=4,byrow=T)
layout(mat)
par(oma=c(2,2,1,1), mar=c(4,4,4,0.5))
plot(1:nsteps,probs.drop30pct[1,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="Years ahead of present", ylab="Pr. Falling Under Conservation Threshold", cex.lab=1.25)
for(i in 2:6){
  points(1:nsteps,probs.drop30pct[i,], type="b",  lwd=2, col=cool.palette[i])
}
#par(oma=c(1,1,1,1), mar=c(4,4,4,1))
plot(0,0,type="n",axes=FALSE,xlab="",ylab="",main="")
legend(-0.8,1.65,legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n",
       horiz=TRUE)

save.image("All-up2-30pctReduction.RData")

Benchmark probabilities of falling under conservation threshold with a 30% reduction in the average growth rate

Repeating the same calculations with a 40% reduction in the average growth rate gives the following 10-year forecast of the probability of falling under the conservation threshold of 75% of the mean of the historical stationary distribution:

pct.red <- 0.6 # mu reduced by 40 %
newmus <- muvec4kalman*pct.red

sims.list40pct <- list()
nyears.fwd <- 10
nsims <- 100000

for(i in 1:p){
  sims.list40pct[[i]] <- matrix(0,nrow=nsims,ncol=nyears.fwd+1)
}
names(sims.list40pct) <- row.names(S4mat)

for(i in 1:nsims){
  
  ith.sim <- MVX.sim(xo.vec = c(X1s[2,28],X2s[2,28],X3s[2,28],X4s[2,28],X5s[2,28],X6s[2,28]), 
                    theta.list=
                    list(p=p,
                      Ip =  diag(p),
                      muvec = newmus,
                      sigmasq = sigsq4kalman, 
                      B=B),
                    nsteps=10,
                    rnames = row.names(S4mat))
  
  sims.list40pct[[1]][i,] <- ith.sim[1,] 
  sims.list40pct[[2]][i,] <- ith.sim[2,] 
  sims.list40pct[[3]][i,] <- ith.sim[3,] 
  sims.list40pct[[4]][i,] <- ith.sim[4,] 
  sims.list40pct[[5]][i,] <- ith.sim[5,] 
  sims.list40pct[[6]][i,] <- ith.sim[6,] 
}

ncrit <- E.Ns*0.75 # change here to a desired number
lcrit <- log(ncrit)
nsteps <- 10
probs.drop40pct <- matrix(0, nrow=6,ncol=nsteps)
rownames(probs.drop40pct) <- row.names(S4mat)
for(i in 1:6){
  
  simsmat <- sims.list40pct[[i]] # simulated log abundances (column 1 holds the initial value)
  
  for(j in 2:(nsteps+1)){
    
    jthsubmat <- simsmat[,1:j]
    
    probs.drop40pct[i,(j-1)] <- sum(jthsubmat<lcrit[i])/(nrow(jthsubmat)*ncol(jthsubmat))
  }
}

mat <-  matrix(c(rep(c(1,1,1,1),3),rep(2,4)),nrow=4,ncol=4,byrow=T)
layout(mat)
par(oma=c(2,2,1,1), mar=c(4,4,4,0.5))
plot(1:nsteps,probs.drop40pct[1,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="Years ahead of present", ylab="Pr. Falling Under Conservation Threshold", cex.lab=1.25)
for(i in 2:6){
  points(1:nsteps,probs.drop40pct[i,], type="b",  lwd=2, col=cool.palette[i])
}
#par(oma=c(1,1,1,1), mar=c(4,4,4,1))
plot(0,0,type="n",axes=FALSE,xlab="",ylab="",main="")
legend(-0.8,1.65,legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n",
       horiz=TRUE)

save.image("All-up2-40pctReduction.RData")

Benchmark probabilities of falling under conservation threshold with a 40% reduction in the average growth rate

Finally, the same calculations with a 50% reduction in the average growth rate give the following 10-year forecast of the probability of falling under the conservation threshold of 75% of the mean of the historical stationary distribution:

pct.red <- 0.5 # mu reduced by 50 %
newmus <- muvec4kalman*pct.red

sims.list50pct <- list()
nyears.fwd <- 10
nsims <- 100000

for(i in 1:p){
  sims.list50pct[[i]] <- matrix(0,nrow=nsims,ncol=nyears.fwd+1)
}
names(sims.list50pct) <- row.names(S4mat)

for(i in 1:nsims){
  
  ith.sim <- MVX.sim(xo.vec = c(X1s[2,28],X2s[2,28],X3s[2,28],X4s[2,28],X5s[2,28],X6s[2,28]), 
                    theta.list=
                    list(p=p,
                      Ip =  diag(p),
                      muvec = newmus,
                      sigmasq = sigsq4kalman, 
                      B=B),
                    nsteps=10,
                    rnames = row.names(S4mat))
  
  sims.list50pct[[1]][i,] <- ith.sim[1,] 
  sims.list50pct[[2]][i,] <- ith.sim[2,] 
  sims.list50pct[[3]][i,] <- ith.sim[3,] 
  sims.list50pct[[4]][i,] <- ith.sim[4,] 
  sims.list50pct[[5]][i,] <- ith.sim[5,] 
  sims.list50pct[[6]][i,] <- ith.sim[6,] 
}

ncrit <- E.Ns*0.75 # change here to a desired number
lcrit <- log(ncrit)
nsteps <- 10
probs.drop50pct <- matrix(0, nrow=6,ncol=nsteps)
rownames(probs.drop50pct) <- row.names(S4mat)
for(i in 1:6){
  
  simsmat <- sims.list50pct[[i]] # simulated log abundances (column 1 holds the initial value)
  
  for(j in 2:(nsteps+1)){
    
    jthsubmat <- simsmat[,1:j]
    
    probs.drop50pct[i,(j-1)] <- sum(jthsubmat<lcrit[i])/(nrow(jthsubmat)*ncol(jthsubmat))
  }
}

mat <-  matrix(c(rep(c(1,1,1,1),3),rep(2,4)),nrow=4,ncol=4,byrow=T)
layout(mat)
par(oma=c(2,2,1,1), mar=c(4,4,4,0.5))
plot(1:nsteps,probs.drop50pct[1,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="Years ahead of present", ylab="Pr. Falling Under Conservation Threshold", cex.lab=1.25)
for(i in 2:6){
  points(1:nsteps,probs.drop50pct[i,], type="b",  lwd=2, col=cool.palette[i])
}
#par(oma=c(1,1,1,1), mar=c(4,4,4,1))
plot(0,0,type="n",axes=FALSE,xlab="",ylab="",main="")
legend(-0.8,1.65,legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n",
       horiz=TRUE)

save.image("All-up2-50pctReduction.RData")

Benchmark probabilities of falling under conservation threshold with a 50% reduction in the average growth rate

Now we rearrange the plots into a single six-panel figure:

par(mfrow=c(2,3),oma=c(5,5,4,1), mar=c(2,2,2,1))
plot(1:nsteps,probs.drop10pct[1,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="", ylab="", cex.lab=1.25,
     main=row.names(S4mat)[1])
points(1:nsteps,probs.drop20pct[1,], type="b",  lwd=2, col=cool.palette[2])
points(1:nsteps,probs.drop30pct[1,], type="b",  lwd=2, col=cool.palette[3])
points(1:nsteps,probs.drop40pct[1,], type="b",  lwd=2, col=cool.palette[4])
points(1:nsteps,probs.drop50pct[1,], type="b",  lwd=2, col=cool.palette[5])
legend("topleft", legend=c("10% drop", "20% drop", "30% drop", "40% drop", "50% drop"), lty=rep(1,5),lwd=rep(2,5),
       col=cool.palette[1:5], bty="n",cex=0.8)

plot(1:nsteps,probs.drop10pct[2,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="", ylab="", cex.lab=1.25,
     main=row.names(S4mat)[2])
points(1:nsteps,probs.drop20pct[2,], type="b",  lwd=2, col=cool.palette[2])
points(1:nsteps,probs.drop30pct[2,], type="b",  lwd=2, col=cool.palette[3])
points(1:nsteps,probs.drop40pct[2,], type="b",  lwd=2, col=cool.palette[4])
points(1:nsteps,probs.drop50pct[2,], type="b",  lwd=2, col=cool.palette[5])
legend("bottomleft", legend=c("10% drop", "20% drop", "30% drop", "40% drop", "50% drop"), lty=rep(1,5),lwd=rep(2,5),
       col=cool.palette[1:5], bty="n",cex=0.8)



plot(1:nsteps,probs.drop10pct[3,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="", ylab="", cex.lab=1.25,
     main=row.names(S4mat)[3])
points(1:nsteps,probs.drop20pct[3,], type="b",  lwd=2, col=cool.palette[2])
points(1:nsteps,probs.drop30pct[3,], type="b",  lwd=2, col=cool.palette[3])
points(1:nsteps,probs.drop40pct[3,], type="b",  lwd=2, col=cool.palette[4])
points(1:nsteps,probs.drop50pct[3,], type="b",  lwd=2, col=cool.palette[5])
legend("bottomleft", legend=c("10% drop", "20% drop", "30% drop", "40% drop", "50% drop"), lty=rep(1,5),lwd=rep(2,5),
       col=cool.palette[1:5], bty="n",cex=0.8)


plot(1:nsteps,probs.drop10pct[4,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="", ylab="", cex.lab=1.25,
     main=row.names(S4mat)[4])
points(1:nsteps,probs.drop20pct[4,], type="b",  lwd=2, col=cool.palette[2])
points(1:nsteps,probs.drop30pct[4,], type="b",  lwd=2, col=cool.palette[3])
points(1:nsteps,probs.drop40pct[4,], type="b",  lwd=2, col=cool.palette[4])
points(1:nsteps,probs.drop50pct[4,], type="b",  lwd=2, col=cool.palette[5])
legend("bottomright", legend=c("10% drop", "20% drop", "30% drop", "40% drop", "50% drop"), lty=rep(1,5),lwd=rep(2,5),
       col=cool.palette[1:5], bty="n",cex=0.8)


plot(1:nsteps,probs.drop10pct[5,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="", ylab="", cex.lab=1.25,
     main=row.names(S4mat)[5])
points(1:nsteps,probs.drop20pct[5,], type="b",  lwd=2, col=cool.palette[2])
points(1:nsteps,probs.drop30pct[5,], type="b",  lwd=2, col=cool.palette[3])
points(1:nsteps,probs.drop40pct[5,], type="b",  lwd=2, col=cool.palette[4])
points(1:nsteps,probs.drop50pct[5,], type="b",  lwd=2, col=cool.palette[5])
legend("bottomleft", legend=c("10% drop", "20% drop", "30% drop", "40% drop", "50% drop"), lty=rep(1,5),lwd=rep(2,5),
       col=cool.palette[1:5], bty="n",cex=0.8)



plot(1:nsteps,probs.drop10pct[6,], type="b", col=cool.palette[1], lwd=2, ylim=c(0,1.1), bty="l",
     xlab="", ylab="", cex.lab=1.25,
     main=row.names(S4mat)[6])
points(1:nsteps,probs.drop20pct[6,], type="b",  lwd=2, col=cool.palette[2])
points(1:nsteps,probs.drop30pct[6,], type="b",  lwd=2, col=cool.palette[3])
points(1:nsteps,probs.drop40pct[6,], type="b",  lwd=2, col=cool.palette[4])
points(1:nsteps,probs.drop50pct[6,], type="b",  lwd=2, col=cool.palette[5])
legend("bottomright",  legend=c("10% drop", "20% drop", "30% drop", "40% drop", "50% drop"), lty=rep(1,5),lwd=rep(2,5),
       col=cool.palette[1:5], bty="n",cex=0.8)


mtext(text="Pr. of Falling Under Conservation Threshold", side=2, outer=TRUE, cex=1.25,line=1)
mtext(text="Years ahead of present", side=1, outer=TRUE, cex=1.15,line=1)

Probabilities of falling under conservation threshold with a 10-50% drop in the average growth rate
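
The five scenario blocks above repeat the same simulate-then-summarize steps, with only the reduction factor pct.red changing. As a minimal sketch (not part of the original analysis), the whole procedure could be wrapped in one helper function; it assumes MVX.sim, muvec4kalman, sigsq4kalman, B, p, E.Ns, S4mat, and the smoothed states X1s–X6s exist as defined earlier:

# Hypothetical helper (a sketch, not the code used above): forward-simulate all demes
# under a given reduction in the stationary mean and return the per-deme probability
# of falling under 75% of the historical mean for 1 to nsteps years ahead.
threshold.probs <- function(pct.red, nsims = 100000, nsteps = 10){
  xo    <- c(X1s[2,28], X2s[2,28], X3s[2,28], X4s[2,28], X5s[2,28], X6s[2,28])
  lcrit <- log(E.Ns*0.75) # conservation threshold on the log scale, per deme
  sims  <- replicate(p, matrix(0, nrow = nsims, ncol = nsteps + 1), simplify = FALSE)
  for(i in 1:nsims){
    ith.sim <- MVX.sim(xo.vec = xo,
                       theta.list = list(p = p, Ip = diag(p),
                                         muvec = muvec4kalman*pct.red,
                                         sigmasq = sigsq4kalman, B = B),
                       nsteps = nsteps, rnames = row.names(S4mat))
    for(k in 1:p){ sims[[k]][i,] <- ith.sim[k,] }
  }
  out <- matrix(0, nrow = p, ncol = nsteps, dimnames = list(row.names(S4mat), NULL))
  for(k in 1:p){
    for(j in 2:(nsteps + 1)){
      jthsubmat   <- sims[[k]][, 1:j]
      out[k, j-1] <- sum(jthsubmat < lcrit[k])/(nrow(jthsubmat)*ncol(jthsubmat))
    }
  }
  out
}

Calling, for example, threshold.probs(0.9) and threshold.probs(0.8) would reproduce probs.drop10pct and probs.drop20pct up to Monte Carlo error.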

4.1 Proportion of time the population process is predicted to be below past population means

Another way to look at the effects of a drop in the growth rate is to use the concept of the stationary distribution directly. As mentioned above, the area under the stationary distribution within a given interval of population sizes gives the long-term fraction of time the stochastic population process spends in that interval. The cdf of the stationary distribution can therefore be used to compute directly the probability that the population size is found below a given threshold, after accounting for environmental variability and the resulting year-to-year stochastic changes in the growth rate. In what follows we recompute the mean and variance of the stationary distribution for the same array of percent drops in the stationary mean as above, compute the long-run percent of time spent below the threshold, and plot these probabilities for all populations across the entire array of percent drops in the growth rate.
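
For reference, the stationary covariance used in the code below follows the standard result for a first-order (vector) autoregressive process: if \(B\) is the matrix of density-dependence coefficients and \(\Sigma\) the process-noise covariance, then
\[
\operatorname{vec}(V_{\infty}) = \left(I_{p^2} - B \otimes B\right)^{-1}\operatorname{vec}(\Sigma),
\]
which reduces to \(\frac{\sigma^2}{1-c^2}\) in the one-population case. The previously estimated stationary means (muvec4kalman) are used directly as the mean of the stationary distribution, so only the diagonal elements of \(V_{\infty}\) (the per-deme stationary variances on the log scale) need to be computed here.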

pct.red <- c(0.9,0.8,0.7,0.6,0.5) # 10% to 50% reductions in the stationary mean
Ip <- diag(p)
Sigma <- (1/sigsq4kalman)*Ip

# Stationary covariance matrix: vec(Vinf) = (I - B %x% B)^{-1} vec(Sigma)
Ipsq <- diag(p*p)
vec.Vinf <- solve(Ipsq - kronecker(B,B))%*%as.vector(Sigma)
Vinf <- matrix(vec.Vinf, nrow=p,ncol=p)
Vars.inf <- diag(Vinf) # per-deme stationary variances on the log scale
names(Vars.inf) <- row.names(S4mat)

perc.redmat <- matrix(0, nrow=p,ncol=5)
rownames(perc.redmat) <- row.names(S4mat)
colnames(perc.redmat) <- pct.red
for(i in 1:5){
  perc.redmat[,i] <- plnorm(q=E.Ns*0.75, meanlog=muvec4kalman*pct.red[i], sdlog=sqrt(Vars.inf))
}

print(perc.redmat)


pct.red2 <- (1-pct.red)*100
par(oma=c(2,2,1,1), mar=c(4,4,4,1))
plot(25,0.8,type="n",xlab="% reduction in growth rate",main="Long term proportion of time spent <75% of past pop. average", ylim=c(0.65,1), xlim=c(10,55), bty="l", ylab="Proportion (stationary distribution cdf)")
for(i in 1:p){
  points(pct.red2,perc.redmat[i,], lty=1, lwd=2, type="b", col=cool.palette[i])
}

legend("bottomright",legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n")

Long term proportion of time spent under 75% of past population averages for each deme
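
Since the tidyverse is already loaded, the same figure can also be drawn with ggplot2; the following is a sketch only, assuming perc.redmat and cool.palette as computed above:

# Reshape the deme-by-reduction matrix to long format and plot (sketch, not one of the original figures)
perc.long <- as_tibble(perc.redmat, rownames = "deme") %>%
  pivot_longer(-deme, names_to = "pct.red", values_to = "prop") %>%
  mutate(reduction = (1 - as.numeric(pct.red))*100)

ggplot(perc.long, aes(x = reduction, y = prop, colour = deme)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = cool.palette) +
  labs(x = "% reduction in growth rate",
       y = "Proportion (stationary distribution cdf)",
       title = "Long term proportion of time spent <75% of past pop. average") +
  theme_classic()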

We can do the same but compute the proportion of time spent under 50% of the past population averages:

perc.redmat50 <- matrix(0, nrow=p,ncol=5)
rownames(perc.redmat50) <- row.names(S4mat)
colnames(perc.redmat50) <- pct.red
for(i in 1:5){
  perc.redmat50[,i] <- plnorm(q=E.Ns*0.5, meanlog=muvec4kalman*pct.red[i], sdlog=sqrt(Vars.inf))
}

print(perc.redmat50)


pct.red2 <- (1-pct.red)*100
par(oma=c(2,2,1,1), mar=c(4,4,4,1))
plot(25,0.6,type="n",xlab="% reduction in growth rate",main="Long term proportion of time spent <50% of past pop. average", ylim=c(0.5,1), xlim=c(10,55), bty="l", ylab="Proportion (stationary distribution cdf)")
for(i in 1:p){
  points(pct.red2,perc.redmat50[i,], lty=1, lwd=2, type="b", col=cool.palette[i])
}

legend("bottomright",legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n")

Long term proportion of time spent under 50% of past population averages for each deme

Taking the same approach, we can ask what the chances are of the extreme event that the population size falls under, say, one fourth of the past population mean, for every deme:

perc.redmat25 <- matrix(0, nrow=p,ncol=5)
rownames(perc.redmat25) <- row.names(S4mat)
colnames(perc.redmat25) <- pct.red
for(i in 1:5){
  perc.redmat25[,i] <- plnorm(q=E.Ns*0.25, meanlog=muvec4kalman*pct.red[i], sdlog=sqrt(Vars.inf))
}

print(perc.redmat25)


par(oma=c(2,2,1,1), mar=c(4,4,4,1))
plot(25,0.20,type="n",xlab="% reduction in growth rate",main="Long term proportion of time spent <25% of past pop. average", ylim=c(0.25,1), xlim=c(10,55), bty="l", ylab="Proportion (stationary distribution cdf)")
for(i in 1:p){
  points(pct.red2,perc.redmat25[i,], lty=1, lwd=2, type="b", col=cool.palette[i])
}

legend("bottomright",legend=row.names(S4mat),lty=rep(1,6),lwd=rep(2,6),cex=1.15, col=cool.palette, bty="n")

Long term proportion of time spent under 25% of past population averages for each deme

These analyses wrap up our basic PVA.