This script uses MethylAid to perform sample-level quality control on the data from three studies: (i) GOTO, (ii) CD4+ T-cell functional experiments, and (iii) TwinLife.


Setup

Load packages

library(MethylAid)
library(BiocParallel)

Load in the target files for each study

load("../Processing/GOTO_targets-unfiltered.Rdata")
load("../../Study2_CD4T/CD4T_data-targets.Rdata")
load("../../Study3_TwinLife/TwinLife_data-targets.Rdata")
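
Before summarizing, it can help to confirm that every row of a targets data frame points at a complete IDAT pair. The sketch below is not part of the original analysis. It assumes that the targets carry a `Basename` column (as minfi-style targets do) and that the files are uncompressed `_Grn.idat`/`_Red.idat` pairs. The helper `incomplete_idats()` and the toy file names are hypothetical.

```r
# Hypothetical helper (not a MethylAid function): return the Basename of every
# sample whose green or red IDAT file cannot be found on disk.
incomplete_idats <- function(targets) {
  stopifnot("Basename" %in% colnames(targets))
  grn <- paste0(targets$Basename, "_Grn.idat")
  red <- paste0(targets$Basename, "_Red.idat")
  targets$Basename[!(file.exists(grn) & file.exists(red))]
}

# Toy demonstration with empty temporary files standing in for real IDATs.
demo_dir <- tempfile("idat_demo")
dir.create(demo_dir)
ok  <- file.path(demo_dir, "111_R01C01")   # both channels present
bad <- file.path(demo_dir, "111_R02C01")   # red channel missing
file.create(paste0(ok, c("_Grn.idat", "_Red.idat")))
file.create(paste0(bad, "_Grn.idat"))

toy_targets <- data.frame(Basename = c(ok, bad), stringsAsFactors = FALSE)
incomplete <- incomplete_idats(toy_targets)
```

Applied to the real objects, the same call would be `incomplete_idats(targets_goto)`, and likewise for the other two studies.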

Set BPPARAM for parallel processing (8 workers)

BPPARAM <- MulticoreParam(8)

GOTO

One IDAT file (203527980080_R04C01) was corrupted, so we initially removed it from the targets. R.W. later sent an uncorrupted version, which we used to update the file.
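
The interim removal is not shown in this script. A minimal sketch of how such a filter could look is given below, using a toy data frame instead of `targets_goto`. The `Basename` column and the neighbouring array positions are assumptions for illustration; only the corrupted ID comes from the note above.

```r
# ID of the corrupted array, as reported above.
corrupted_id <- "203527980080_R04C01"

# Toy targets frame; the other two rows are made-up positions on the same slide.
toy_targets <- data.frame(
  Basename = file.path("idats", c("203527980080_R03C01",
                                  "203527980080_R04C01",
                                  "203527980080_R05C01")),
  stringsAsFactors = FALSE
)

# Keep every row whose file stem differs from the corrupted ID.
keep <- basename(toy_targets$Basename) != corrupted_id
targets_clean <- toy_targets[keep, , drop = FALSE]
```

Because the uncorrupted file was later received, a filter like this was only needed temporarily.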

Summarize IDAT files for MethylAid

sData_goto <- summarize(targets_goto, 
                        batchSize=50, 
                        BPPARAM=BPPARAM,
                        force=TRUE)

Save sData

save(sData_goto, file="../Processing/MethylAid/GOTO_sData-wave1.Rdata")

CD4+ T-cell functional experiments

Summarize IDAT files for MethylAid

sData_cd4t <- summarize(targets_cd4t, 
                        batchSize=50, 
                        force=TRUE)

Save sData

save(sData_cd4t, file="../../Study2_CD4T/CD4T_data-sData.Rdata")

TwinLife Pilot

Summarize IDAT files for MethylAid

sData_twinlife <- summarize(targets_twinlife, 
                            batchSize=50, 
                            force=TRUE)

Save sData

save(sData_twinlife, file="../../Study3_TwinLife/TwinLife_data-sData.Rdata")

After the data were saved, the sData objects were exported locally and inspected using MethylAid.
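
The inspection itself is not recorded in this script. One way it could be started is sketched below, using MethylAid's `visualize()` function, which opens the interactive Shiny app. The path is copied from the GOTO `save()` call and is an assumption here, since the location of the locally exported copy is not given.

```r
# Path taken from the GOTO save() call; adjust to wherever the local copy lives.
sdata_path <- "../Processing/MethylAid/GOTO_sData-wave1.Rdata"

# Only launch the app in an interactive session where the file and the
# package are both available.
can_inspect <- interactive() &&
  file.exists(sdata_path) &&
  requireNamespace("MethylAid", quietly = TRUE)

if (can_inspect) {
  load(sdata_path)                  # restores the object `sData_goto`
  MethylAid::visualize(sData_goto)  # opens the MethylAid Shiny app
}
```

The same pattern would apply to `sData_cd4t` and `sData_twinlife`.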

The outliers were saved in the study directory for use in the next script.

For GOTO, we identified 26 outliers and re-sent them in wave 2.


Session Info

sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Rocky Linux 8.10 (Green Obsidian)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.15.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] IlluminaHumanMethylationEPICmanifest_0.3.0
##  [2] minfi_1.40.0                              
##  [3] bumphunter_1.36.0                         
##  [4] locfit_1.5-9.8                            
##  [5] iterators_1.0.14                          
##  [6] foreach_1.5.2                             
##  [7] Biostrings_2.62.0                         
##  [8] XVector_0.34.0                            
##  [9] SummarizedExperiment_1.24.0               
## [10] Biobase_2.58.0                            
## [11] MatrixGenerics_1.10.0                     
## [12] matrixStats_1.0.0                         
## [13] GenomicRanges_1.46.1                      
## [14] GenomeInfoDb_1.34.9                       
## [15] IRanges_2.32.0                            
## [16] S4Vectors_0.36.2                          
## [17] BiocGenerics_0.44.0                       
## [18] BiocParallel_1.32.6                       
## [19] MethylAid_1.28.0                          
## [20] forcats_0.5.2                             
## [21] stringr_1.5.0                             
## [22] dplyr_1.1.3                               
## [23] purrr_0.3.4                               
## [24] readr_2.1.2                               
## [25] tidyr_1.2.1                               
## [26] tibble_3.2.1                              
## [27] ggplot2_3.4.3                             
## [28] tidyverse_1.3.2                           
## [29] rmarkdown_2.16                            
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.4.1              backports_1.4.1          
##   [3] BiocFileCache_2.2.1       plyr_1.8.8               
##   [5] splines_4.2.2             gridBase_0.4-7           
##   [7] digest_0.6.31             htmltools_0.5.5          
##   [9] fansi_1.0.4               magrittr_2.0.3           
##  [11] memoise_2.0.1             googlesheets4_1.0.1      
##  [13] tzdb_0.4.0                limma_3.54.2             
##  [15] annotate_1.72.0           modelr_0.1.9             
##  [17] askpass_1.1               timechange_0.2.0         
##  [19] siggenes_1.68.0           prettyunits_1.1.1        
##  [21] colorspace_2.1-0          blob_1.2.4               
##  [23] rvest_1.0.3               rappdirs_0.3.3           
##  [25] haven_2.5.1               xfun_0.39                
##  [27] hexbin_1.28.3             crayon_1.5.2             
##  [29] RCurl_1.98-1.12           jsonlite_1.8.5           
##  [31] genefilter_1.76.0         GEOquery_2.62.2          
##  [33] survival_3.5-5            glue_1.6.2               
##  [35] gtable_0.3.3              gargle_1.5.0             
##  [37] zlibbioc_1.44.0           DelayedArray_0.24.0      
##  [39] Rhdf5lib_1.20.0           HDF5Array_1.22.1         
##  [41] scales_1.2.1              DBI_1.1.3                
##  [43] rngtools_1.5.2            Rcpp_1.0.10              
##  [45] xtable_1.8-4              progress_1.2.2           
##  [47] bit_4.0.5                 mclust_6.0.0             
##  [49] preprocessCore_1.60.2     httr_1.4.6               
##  [51] RColorBrewer_1.1-3        ellipsis_0.3.2           
##  [53] pkgconfig_2.0.3           reshape_0.8.9            
##  [55] XML_3.99-0.14             sass_0.4.6               
##  [57] dbplyr_2.2.1              utf8_1.2.3               
##  [59] later_1.3.1               tidyselect_1.2.0         
##  [61] rlang_1.1.1               AnnotationDbi_1.56.2     
##  [63] munsell_0.5.0             cellranger_1.1.0         
##  [65] tools_4.2.2               cachem_1.0.8             
##  [67] cli_3.6.1                 generics_0.1.3           
##  [69] RSQLite_2.2.17            broom_1.0.1              
##  [71] evaluate_0.21             fastmap_1.1.1            
##  [73] yaml_2.3.7                knitr_1.43               
##  [75] bit64_4.0.5               fs_1.6.2                 
##  [77] beanplot_1.3.1            scrime_1.3.5             
##  [79] KEGGREST_1.34.0           nlme_3.1-162             
##  [81] doRNG_1.8.6               sparseMatrixStats_1.10.0 
##  [83] mime_0.12                 nor1mix_1.3-0            
##  [85] xml2_1.3.4                biomaRt_2.50.3           
##  [87] compiler_4.2.2            rstudioapi_0.14          
##  [89] filelock_1.0.2            curl_5.0.1               
##  [91] png_0.1-8                 reprex_2.0.2             
##  [93] bslib_0.5.0               stringi_1.7.12           
##  [95] GenomicFeatures_1.46.5    lattice_0.21-8           
##  [97] Matrix_1.5-4.1            multtest_2.50.0          
##  [99] vctrs_0.6.3               pillar_1.9.0             
## [101] lifecycle_1.0.3           rhdf5filters_1.10.1      
## [103] jquerylib_0.1.4           data.table_1.14.8        
## [105] bitops_1.0-7              httpuv_1.6.11            
## [107] rtracklayer_1.54.0        BiocIO_1.8.0             
## [109] R6_2.5.1                  promises_1.2.0.1         
## [111] codetools_0.2-19          MASS_7.3-60              
## [113] assertthat_0.2.1          rhdf5_2.42.1             
## [115] rjson_0.2.21              openssl_2.0.6            
## [117] withr_2.5.0               GenomicAlignments_1.30.0 
## [119] Rsamtools_2.10.0          GenomeInfoDbData_1.2.9   
## [121] hms_1.1.2                 quadprog_1.5-8           
## [123] grid_4.2.2                base64_2.0.1             
## [125] DelayedMatrixStats_1.16.0 illuminaio_0.40.0        
## [127] googledrive_2.0.0         shiny_1.7.2              
## [129] lubridate_1.9.2           restfulr_0.0.15

Cleanup

rm(list=ls())