Data, study cohort, and inclusion criteria
Using prospective data pooled from the multicentre (Brisbane, Newcastle, Geelong and Western Victoria, and Tasmania) Australian Longitudinal Cohort Study (the AusLong Study (https://www.menzies.utas.edu.au/research/diseasesandhealthissues/researchprojects/theauslongstudyoffactorsthatcontributetothedevelopmentandprogressionofms))^{54} of MS, we analysed 279 prospectively assessed first demyelination event (FDE) participants enrolled between 2003 and 2006^{55}. The AusLong Study (https://www.menzies.utas.edu.au/research/diseasesandhealthissues/researchprojects/theauslongstudyoffactorsthatcontributetothedevelopmentandprogressionofms)^{54} has ethical approval from the Tasmanian Health and Medical Research Ethics Committee (ref: H0010499, 01/5/2009); the Queensland Institute of Medical Research Human Research Ethics Committee (ref: P1252, 22/05/2009); the Royal Brisbane and Women’s Hospital Human Research Ethics Committee (ref: HREC/09/QRBW/299, 19/10/2009); the Hunter New England Human Research Ethics Committee (ref: 09/04/15/5.04, HREC/09/HNE/139, SSA/09/HNE/140, 10/08/2009); and the Barwon Health Human Research Ethics Committee (ref: BH 09/24, BH 03/46, 04/08/2009). All experiments (blood collection, genotyping, and clinical examinations) were conducted in accordance with the guidelines of each committee at the participating centres. Written informed consent was obtained from all subjects and/or their legal guardian(s) in accordance with the Declaration of Helsinki^{56}. EDSS scores were acquired prospectively at intervals up to 15 years post FDE by trained and certified neurologists, and a validated telephone EDSS was obtained at yearly computerassisted telephone interviews from 2 to 3 years post FDE. Initial data extraction (n = 279 cases) was done using the revised 2017 McDonalds criteria^{57} in which cases were defined at their last review as either remaining as clinically isolated syndrome (CIS), relapsingonset MS (ROMS), secondary progressive MS (SPMS), or progressiveonset MS (POMS). The selection criteria for the final cohort were done as illustrated in Fig. 1.
Genotyping, imputation, and quality control
The Illumina Infinium Global Screening Array24 v2.0 BeadChip was used to genotype DNA samples from AusLong Study participants. Genotypes were called using Illumina GenomeStudio software. Strict quality control was conducted according to established protocols^{58}. In brief, samples were excluded for three reasons: a call rate ≤ 99%, duplicate discordance, or gender error. Further, variants were excluded based on a call rate ≤ 99% or deviation from Hardy–Weinberg equilibrium with p < 1.0 × 10^{–6}. Two principal components analyses were conducted, one excluding HapMap samples to identify population outliers, and one including HapMap samples to help interpret the outliers^{58}. To maximise genetic coverage, the dataset were imputed using the algorithm implemented in IMPUTE version 4^{59} using 1000 Genomes phase 3^{60} as the reference genotype panel (GRCh37/hg19). Genetic variants having an imputation score ≤ 0.5 and minor allele frequency (MAF) ≤ 0.05 were discarded. For the remaining variants, those that were previously identified as being related to MS risk by IMSGC (https://doi.org/10.1126/science.aav7188?url_ver=Z39.882003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)^{39} were extracted (n = 208 of 232 SNPs) and considered in the association analysis. To be clearer, this study uses the IMSGC (https://doi.org/10.1126/science.aav7188?url_ver=Z39.882003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)^{39} risk SNP list as a reference source to identify MS risk variants that may also contribute to the risk of worsening of disability.
Imputation of missing EDSS measures
Imputation of missing EDSS (n = 471 of 3065) was based on a Bayesian approach using the JointAI Rpackage^{61}. Conditional on the observed SNPs genotypes, the EDSS scores were considered missing at random. The imputation model is a cumulative logit mixedeffect proportional odds model^{1} defined on 8 disability states (1 = EDSS [0–1.5], 2 = EDSS [2–2.5], 3 = EDSS [3–3.5], 4 = EDSS [4–4.5], 5 = EDSS [5–5.5], 6 = EDSS [6–6.5], 7 = EDSS [7–7.5], and 8 = EDSS [8–9.5]). Based on the results from previous studies^{23,25,62,63,64,65}, clinical and environmental factors including sex, age at disease onset,BMI, titre of Epstein–Barr Nuclear Antigen IgG (EBNA), smoking status, hospital anxiety depression scores (HADS), and previous EDSS scores (EDSSPREV), were used as “analysis variables” to impute EDSS levels, whereas latitude (study site), vitamin D supplementation status (VitD status), and MS disease course subtype (MSTYPE) were used as “auxiliary variables” to inform the imputation of any missing value(s) found in the “analysis variables”. These variables were included in the imputation model following their importance in predicting worsening of disability^{25}. In the cumulative logit mixed model, we posit that
$$\mathrm{logit}\left(P\left({y}_{ij}>k\right)\right)={\alpha }_{k}+{Z}_{ij}^{T}\beta +{\zeta }_{ij}^{T}{b}_{i}, k\in 1, \dots , K,$$
(1)
$${\gamma }_{1}, {\delta }_{1}, \dots , {\delta }_{K1}\sim N\left({\mu }_{\gamma }, {\sigma }_{\gamma }^{2}\right),$$
$${\mu }_{y}\sim N\left(0, {\sigma }_{u}\right),$$
$${\sigma }_{\gamma }^{2}\sim \Gamma (\varepsilon , \varepsilon )),$$
$${\gamma }_{k}\sim {\gamma }_{k1}+\mathrm{exp}\left({\delta }_{k1}\right), k=2,\dots , K,$$
where \({y}_{ij}\) is the EDSS level for subject \(i\) at visit \(j\), \({\gamma }_{k}\) are 7 intercepts representing the levels of EDSS (\(\mathrm{i}.\mathrm{e}., k=2,\dots ,8\)), \({Z}_{ij}^{T}\) is a fixedeffect design matrix containing the clinical and environmental covariates including timevarying effects (BMI, HADS, VitD status, and MSTYPE), with a corresponding vector of fixed effects regression coefficients \(\beta\); and \({\zeta }_{ij}^{T}\) is a design matrix containing random effects, \({b}_{i}\) are random deviations from the overall intercepts \({\gamma }_{k}\); \({\mu }_{y}\) and \({\sigma }_{\gamma }^{2}\) are hyperpriors^{61}.
Outcome measures, analysis endpoint, and data structure
Based on the study design, a firstorder Markov’s assumption for continuoustime EDSS transitions was considered^{1,66,67,68}, and defined as: “the current EDSS state (EDSS score) depends on the previous states (EDSSPREV), and all covariate histories”. In other words, using 8 categories (listed above) of the newly imputed EDSS score, we considered a continuoustime evolution for each disability state, wherein the state at the previous observation is retained until the current visit. Note that an observation may also represent a transition to a different state before arriving at the current state, or a repeated observation of the same underlying state at the end of followup.
Using these assumptions, we transformed the data and defined our clinical endpoint to capture continuoustime transitions in disability states as: y = 1 denoting “worsening” events (transitions from a lower to a higher disability state) made by an individual from study entry, and y = 0 denoting “improved” events (transitions from a higher to a lower disability state). All stablestate transitions or stable disease (no change in EDSS) were excluded as these were considered noninformative censoring events, and could lead to likelihood drainage, and potentially alter the results. Therefore, only informative (“improved”) events were censored. The event status for the \(i\)th subject at the \(j\)th visit was defined as
$${y}_{ij}=\left\{\begin{array}{l}1,\quad if\, worsening \,events, \\ \\ 0, \quad otherwise \end{array}\right..$$
Since individuals entered the study at different times, we defined the timetoworsening of disability as the time to switch disability states. Specifically, it is the continuous time elapsed since MS diagnosis until the current observation. This was achieved using the “msm2Surv” function in mstate Rpackage^{69}. To enable comparison of baseline hazards, the start time for all cases was set to zero at study entry.
Statistical analysis
All statistical analyses were conducted before (after) imputation of EDSS, respectively. To identify risk SNPs that predicted the timetoworsening, and/or associated with future worsening events over time, a threestage process was employed.
Stage 1: variable selection, risk estimation, and prognostic modeling
Variable selection
We randomly split the genotype data into 75% training (n = 202), and 25% test cohorts (n = 67) as depicted in Fig. 2. Utilising the training cohort, we first performed a global test to examine the added prognostic values of all SNPs (\(n\) = 208) that passed the QC stage using the Goeman’s “globaltest” Rpackage^{70}. Specifically, we tested the hypothesis (\({H}_{0}: {\beta }_{1}={\beta }_{2}=,\dots ,{\beta }_{208}=0\) versus \({H}_{a}: {\beta }_{1}\ne {\beta }_{2}\ne ,\dots ,{\beta }_{208}\ne 0\)) of no additional prognostic values of MS related genetic variants on the risk of worsening, conditional on the effects of clinical and environmental modifiers of disease (mentioned above). The significance level for this test was set to \(p\) < 0.05^{71}. Following the global test results, we applied three widely used penalised multivariable Cox models namely: least absolute shrinkage and a selection operator (LASSO), elastic net (ENET), and nonnegative garrotte combined with sure independent screening (NNGSIS), with tenfold crossvalidation (CV) to select important SNPs. Because a SNP can affect one or multiple EDSS transition steps with effects in different directions, we added interactions with EDSSPREV. Utilising the training cohort, LASSO and ENET were fitted using the Goeman’s penalised Rpackage^{72}, and NNGSIS using customised survival functions^{73}. SNPs having zero effect sizes were discarded.
Risk estimation and prognostic modelling
For the remaining SNPs selected across LASSO, ENET and NNGSIS, a timedependent multivariable Cox model with backward elimination (α = 0.05) was further employed to identify candidate SNPs^{8}. This was achieved using the “mfp” R function^{74}. Unbiased effect sizes for the candidate SNPs were then estimated in a random effects Cox model using the “coxme” R function^{75}. In the Cox model, we posit that
$${\lambda }_{i}\left(tx,b\right)={\lambda }_{0}\left(t\right)\mathrm{exp}\left({X}_{i}^{T}\beta +{\zeta }_{i}^{T}b\right),$$
(2)
$$b\sim N\left(0,\Sigma \left(\theta \right)\right),$$
where \(\lambda \left(.\right)\) is the hazard function for the \(i\)th subject at time \(t\), \({\lambda }_{0}\) is an unspecified baseline hazard function; \({X}_{i}^{T}\) is a fixedeffects design matrix containing SNPs dosages (including interactions with EDSSPREV), with a corresponding vector of fixedeffects regression coefficients \(\beta\); \({\zeta }_{i}^{T}\) is a design matrix of random effects, with a corresponding vector of random effects estimates \(b\); and \(\theta\) is the variance of the random effects. Note that the random components are subject identifiers nested within MSTYPE.
Stage 2: constructing genetic risk ensembles
To classify PwMS according to their disability status, we trained widely used RF and GBM models using the candidate genetic variants selected from stage 1 and compared their performance with MErf and MEgbm. The timedynamic area under the receiver operating characteristics curve (AUC) was used to assessed model performance. This was achieved using internal and external fivefold crossvalidation on the training cohort. That is, we split the entire training cohort into dynamic lagged training and internal validation sets (Fig. 2), such that predictors in the current visit were used to predict outcomes in the next visit^{51}. The prediction model is a mixedeffects logistic model with random intercepts (subject identifiers nested within MSTYPE), and random slopes (random time effects). Next, mixedeffects logistics decision trees were constructed, and genetic decision rules were extracted using the “inTrees” (interpretable trees) algorithm in the MEml Rpackage^{51}. To ensure nonredundant rule sets, we selected all rules of length between 2 and 5, with frequency < 0.01, and error < 0.25 in predicting future worsening events. In the mixedeffects logistic model used to classify MS subjects, we posit that
$${y}_{ij}\sim Bern\left({\mu }_{ij}\right), {y}_{ij}=\left\{\begin{array}{l}1, \quad if\, worsening\, status\\ 0, \quad otherwise \end{array}\right.,$$
$$\mathrm{logit}(\frac{{\mu }_{ij}}{1{\mu }_{ij}})={(\beta }_{0}+{b}_{0i})+{X}_{i}^{T}\delta +{(b}_{1i}+{\beta }_{1}{t}_{ij}),$$
(3)
$${b}_{0i},{b}_{1i}\sim N\left(0,\Sigma \right), j=1,\dots {m}_{i}, i=1,\dots , N,$$
$$\sum_{i=1}^{N}{m}_{i}=2786; N=269,$$
where \({y}_{ij}\) is the event status for the \(i\)th subject at the \(j\)th time point; \({b}_{0i}\) are random deviations from the overall intercept (bias term) \({\beta }_{0}\); \({b}_{1i}\) are random deviations from the overall slope \({\beta }_{1}\), while \(\Sigma\) is the variance–covariance matrix for the random effects. \({\beta }_{1}\) is the regression effect of the observation time (\({t}_{ij}\)) since diagnosis; \({X}_{i}^{T}\) is a fixedeffect design matrix containing SNP dosages (including interactions with EDSSPREV), with a corresponding vector of fixedeffects regression coefficients \(\delta\).
Stage 3: validation of the ensembles and their prediction rules
To evaluate the ensembles and the generated decision rules obtained in stage 2, we assessed externally the performance of the ensembles on the test cohort. Timedynamic ROC (receiver operating characteristic curves) analysis was used to assess how well each model predicted future worsening events. The importance of the SNPs and their genetic decision rules in predicting future worsening of disability was estimated and evaluated on the training and test cohorts, respectively. We prioritised SNPs based on average Gini impurity and relative influence; and by their deleteriousness in the human genome using combined annotation dependent depletion (CADD) scores^{76}.
Functional annotation and gene enrichment analysis
Utilising the candidate prognostic variants, functional annotation and gene enrichment analyses were further conducted using the FUMA software as per the online manual^{77}. The following parameters were used to further identify independent lead SNPs: maximum pvalue for lead SNPs < 0.05, maximum pvalue for annotation < 0.05, r^{2}threshold to define LD structure of lead SNPs ≥ 0.6, MAF ≥ 0.01, and maximum distance between LD blocks d < 250 kb. Because the raw pvalues were derived from a multivariable analysis, p < 0.05 was used as the threshold cutoff.
https://www.nature.com/articles/s4159802223685w
You may also like

Bias Suit Ruling Offers Disability Accommodation Guidelines

Removing barriers to participation for people with disability

New Technological Innovations That Are Improving Care for the Disabled

NC Health Agency Appealing Ruling on Services for Disabled

‘Disability rights are workers’ rights’: Disabled UC Davis workers on strike advocate for greater accessibility on the picket and in the university