
1 Introduction to Cell Type Enrichment Analysis with xCell 2.0

Cell type enrichment analysis and cellular deconvolution are essential for understanding the heterogeneity of complex tissues in bulk transcriptomics data. xCell2 is an R package that builds upon the original xCell methodology (Aran et al., 2017), offering improved algorithms and enhanced performance. The key advancement in xCell 2.0 is its genericity: users can now use any reference, including single-cell RNA-Seq data, to train an xCell2 reference object for analysis.

This package is particularly useful for researchers working with bulk transcriptomics data who want to infer the cellular composition of their samples. By leveraging reference data from various sources, xCell 2.0 offers a flexible and powerful tool for understanding the cellular heterogeneity in complex tissues.

This vignette provides an overview of the package’s features and step-by-step guidance on:
  • Preparing input data
  • Generating custom xCell2 reference objects
  • Performing cell type enrichment analysis
  • Interpreting results and best practices

Whether you are new to cell type deconvolution or an experienced bioinformatician, this guide will help you leverage xCell2 effectively in your research.

1.1 Installation

1.1.1 From Bioconductor (Coming Soon)

To install xCell2 from Bioconductor, use:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
 install.packages("BiocManager")
}
BiocManager::install("xCell2")

1.1.2 From GitHub (Development Version)

To install the development version from GitHub, use:

if (!requireNamespace("devtools", quietly = TRUE)) {
 install.packages("devtools")
}
devtools::install_github("AlmogAngel/xCell2")

1.1.3 Dependencies

xCell2 relies on several Bioconductor packages. Most dependencies will be automatically installed. If you encounter any issues, you may need to manually install some packages, particularly ontoProc (version 1.26.4 or higher).
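
If installation problems point to ontoProc, a quick version check can help narrow things down. The sketch below uses only base R; the helper name has_min_version is hypothetical and not part of xCell2, and the minimum version is the one stated above.

```r
# Minimum ontoProc version stated in this vignette
min_version <- "1.26.4"

# Hypothetical helper: TRUE only if the package is installed and recent enough
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

if (!has_min_version("ontoProc", min_version)) {
  message("ontoProc is missing or older than ", min_version,
          "; consider running BiocManager::install(\"ontoProc\")")
}
```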

After installation, load the package with:

library(xCell2)

2 Creating a Custom Reference with xCell2Train

One of the key features of xCell2 is the ability to create custom reference objects tailored to your specific research needs. This section will guide you through the process of generating a custom xCell2 reference object using the xCell2Train function.

2.1 Why Create a Custom Reference?

Creating a custom reference allows you to:
  • Incorporate cell types specific to your research area
  • Use the latest single-cell RNA-seq data as a reference
  • Adapt the tool for non-standard organisms or tissues

2.2 Preparing the Input Data

Before using xCell2Train, you need to prepare two key inputs:

  1. Reference Gene Expression Matrix:
  • Can be generated from various platforms: microarray, bulk RNA-Seq, or single-cell RNA-Seq
  • Genes should be in rows, samples/cells in columns
  • Should be normalized to both gene length and library size. Could be in either linear or logarithmic space.

  2. Labels Data Frame: This data frame must contain information about each sample/cell in your reference. It should have four columns:
  • "ont": Cell type ontology (e.g., “CL:0000545” or NA if not applicable)
  • "label": Cell type name (e.g., “T-helper 1 cell”)
  • "sample": Sample/cell identifier matching column names in the reference matrix
  • "dataset": Source dataset or subject identifier
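
To make the expected shapes concrete, here is a minimal sketch that builds a toy reference matrix and a matching labels data frame, then runs a few consistency checks. All values are made up, and check_reference_inputs is a hypothetical helper written for this illustration; it is not an xCell2 function and does not replace the validation xCell2Train may perform itself.

```r
# Toy reference: 4 genes (rows) x 6 samples (columns); values are made up
set.seed(1)
toy_ref <- matrix(
  runif(24, min = 0, max = 10),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)

# Labels data frame with the four required columns
toy_labels <- data.frame(
  ont     = c("CL:0000545", "CL:0000545", "CL:0000545", NA, NA, NA),
  label   = rep(c("T-helper 1 cell", "toy cell type"), each = 3),
  sample  = colnames(toy_ref),
  dataset = "toy",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of xCell2): basic consistency checks
check_reference_inputs <- function(ref, labels) {
  required <- c("ont", "label", "sample", "dataset")
  stopifnot(
    is.matrix(ref),
    !is.null(rownames(ref)),
    all(required %in% colnames(labels)),
    !anyDuplicated(colnames(ref)),
    setequal(labels$sample, colnames(ref))
  )
  invisible(TRUE)
}

check_reference_inputs(toy_ref, toy_labels)
```

The check on duplicated column names mirrors the make.unique() call used in the DICE example: every sample identifier must be unique so that it can be matched to exactly one row of the labels data frame.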

2.3 Example: Using DICE Dataset

Let’s walk through an example using a subset of the Database of Immune Cell Expression (DICE):

# Load the demo data
data(dice_demo_ref, package = "xCell2")

# Extract reference matrix
dice_ref <- as.matrix(dice_demo_ref@assays@data$logcounts)
colnames(dice_ref) <- make.unique(colnames(dice_ref))

# Prepare labels data frame
dice_labels <- as.data.frame(dice_demo_ref@colData)
dice_labels$ont <- NA
dice_labels$sample <- colnames(dice_ref)
dice_labels$dataset <- "DICE"
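
The example above leaves the "ont" column as NA for every sample. If you know the Cell Ontology IDs for some of your labels, one simple way to fill them in is a named lookup vector. The sketch below uses toy labels rather than the DICE data, and the only ID shown is the one quoted earlier in this vignette; any other mapping would need to be looked up by you.

```r
# Toy labels; only the first label has an ontology ID given in this vignette
toy_labels <- data.frame(
  label = c("T-helper 1 cell", "T-helper 1 cell", "toy cell type"),
  stringsAsFactors = FALSE
)

# Named vector mapping label -> Cell Ontology ID (extend with your own IDs)
ont_map <- c("T-helper 1 cell" = "CL:0000545")

# Labels absent from the map become NA
toy_labels$ont <- unname(ont_map[toy_labels$label])
toy_labels
```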

2.5 Generating the xCell2 Reference Object

With our inputs prepared, we can now create the xCell2 reference object. Simply run the following command:

set.seed(123) # (optional) For reproducibility
DICE_demo.xCell2Ref <- xCell2::xCell2Train(
 ref = dice_ref,
 labels = dice_labels,
 refType = "rnaseq"
)
## Finding dependencies using cell type ontology...
## loading from cache
## Generating signatures...
## Learning linear transformation and spillover parameters...
## Your custom xCell2 reference object is ready!
## > Please consider sharing with others here: https://dviraran.github.io/xCell2ref

Note that we set a seed for reproducibility, because generating pseudo-bulk samples from an scRNA-Seq reference relies on random sampling of cells.
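
The effect of the seed can be illustrated with a toy example in base R. This is only a sketch of the general idea (random sampling of cells followed by aggregation); it is not the procedure xCell2Train uses internally, and make_pseudobulk is a hypothetical helper.

```r
# Toy single-cell matrix: 5 genes x 100 cells (made-up Poisson counts)
set.seed(42)
toy_sc <- matrix(rpois(500, lambda = 3), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:100)))

# Hypothetical helper: sum the counts of n randomly sampled cells
make_pseudobulk <- function(counts, n_cells) {
  picked <- sample(ncol(counts), n_cells)
  rowSums(counts[, picked, drop = FALSE])
}

set.seed(123)
pb1 <- make_pseudobulk(toy_sc, 20)
set.seed(123)
pb2 <- make_pseudobulk(toy_sc, 20)
pb3 <- make_pseudobulk(toy_sc, 20)  # no reseed: a different draw of cells

identical(pb1, pb2)  # same seed, same cells, same profile
```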

Key Parameters of xCell2Train:
  • ref: Your prepared reference gene expression matrix
  • labels: The labels data frame you created
  • refType: Type of reference data (“rnaseq”, “array”, or “sc”)
  • useOntology: Whether to use ontological integration (default: TRUE)
  • numThreads: Number of threads for parallel processing

For a full list of parameters and their descriptions, refer to the xCell2Train function documentation.

2.6 Sharing Your Custom xCell2 Reference Object

Sharing your custom xCell2 reference object with the scientific community can greatly benefit researchers working in similar fields. Here’s how you can contribute:

  1. Save Your Reference Object: Save your newly generated xCell2 reference object:
save(DICE_demo.xCell2Ref, file = "DICE_demo.xCell2Ref.rda")
  2. Prepare Your Reference for Sharing: Ensure your reference includes:
  • A clear description of the source data
  • Any preprocessing steps applied
  • The version of xCell2 used to generate it
  3. Upload to the xCell2 References Repository
  • Visit the xCell2 References Repository
  • Navigate to the “references” directory
  • Click “Add file” > “Upload files” and select your .rda file
  4. Update the README: To help others understand and use your reference:
  • Return to the main page of the xCell2 References Repository
  • Open the README.md file
  • Click the pencil icon (Edit this file) in the top right corner
  • Add an entry to the references table
  5. Submit a Pull Request
  • Scroll to the bottom of the page
  • Select “Create a new branch for this commit and start a pull request”
  • Click “Propose changes”
  • On the next page, click “Create pull request”
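
Before uploading, it can be worth confirming that the saved file loads back cleanly. The sketch below shows the save()/load() round trip from step 1 with a stand-in list instead of a real xCell2 reference object; the object and file names are placeholders.

```r
# Stand-in object; in practice this would be your xCell2 reference object
toy_ref_object <- list(name = "toy reference")

rda_path <- file.path(tempdir(), "toy_ref_object.rda")
save(toy_ref_object, file = rda_path)

# load() restores the object under its original name; loading into a
# fresh environment avoids overwriting anything in the workspace
env <- new.env()
loaded_names <- load(rda_path, envir = env)

loaded_names
identical(env$toy_ref_object, toy_ref_object)
```

Because load() restores objects under the names they were saved with, a descriptive object name makes the file easier for others to use.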

The repository maintainers will review your submission and merge it if everything is in order. By sharing your custom reference, you contribute to the growth and improvement of cell type enrichment analysis across the scientific community. Your work could help researchers in related fields achieve more accurate and relevant results in their studies.

2.7 Next Steps

After creating your custom reference, you can use it for cell type enrichment analysis with xCell2Analysis, which is covered in Section 4. Remember, creating a robust reference is crucial for accurate results. Take time to ensure your input data is high-quality and properly annotated.

3 Using Pre-trained xCell2 References

xCell2 offers pre-trained reference objects that can be easily downloaded and used for your analysis. These references cover various tissue types and are based on well-curated datasets.

| Dataset | Study | Species | Normalization | nSamples/Cells | nCellTypes | Platform | Tissues |
|---|---|---|---|---|---|---|---|
| BlueprintEncode | Martens JHA and Stunnenberg HG (2013), The ENCODE Project Consortium (2012), Aran D (2019) | Homo Sapiens | TPM | 259 | 43 | RNA-seq | Mixed |
| ImmGenData | The Immunological Genome Project Consortium (2008), Aran D (2019) | Mus Musculus | RMA | 843 | 19 | Microarray | Immune/Blood |
| Immune Compendium | Zaitsev A (2022) | Homo Sapiens | TPM | 3626 | 40 | RNA-seq | Immune/Blood |
| LM22 | Chen B (2019) | Homo Sapiens | RMA | 113 | 22 | Microarray | Mixed |
| MouseRNAseqData | Benayoun B (2019) | Mus Musculus | TPM | 358 | 18 | RNA-seq | Mixed |
| Pan Cancer | Nofech-Mozes I (2023) | Homo Sapiens | Counts | 25084 | 29 | scRNA-seq | Tumor |
| Tabula Muris Blood | The Tabula Muris Consortium (2018) | Mus Musculus | Counts | 11145 | 6 | scRNA-seq | Bone Marrow, Spleen, Thymus |
| Tabula Sapiens Blood | The Tabula Sapiens Consortium (2022) | Homo Sapiens | Counts | 11921 | 18 | scRNA-seq | Blood, Lymph_Node, Spleen, Thymus, Bone Marrow |
| TME Compendium | Zaitsev A (2022) | Homo Sapiens | TPM | 8146 | 25 | RNA-seq | Tumor |

You can also quickly access popular pre-trained references that ship with the xCell2 package:

data(BlueprintEncode.xCell2Ref, package = "xCell2")

Or download a pre-trained reference directly within R using the download.file() function:

# Set the URL of the pre-trained reference
ref_url <- "https://dviraran.github.io/xCell2refs/references/BlueprintEncode.xCell2Ref.rds"
# Set the local filename to save the reference
local_filename <- "BlueprintEncode.xCell2Ref.rds"
# Download the file
download.file(ref_url, local_filename, mode = "wb")
# Load the downloaded reference
BlueprintEncode.xCell2Ref <- readRDS(local_filename)
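
If you run the analysis repeatedly, you may prefer not to download the file every time. The following sketch wraps the steps above in a small caching helper. fetch_reference is a hypothetical function written for this illustration, not part of xCell2, and the URL in the demonstration is a placeholder that is never contacted because the destination file already exists.

```r
# Hypothetical helper (not part of xCell2): download once, then reuse
fetch_reference <- function(url, dest) {
  if (!file.exists(dest)) {
    status <- utils::download.file(url, dest, mode = "wb")
    if (status != 0) stop("Download failed for: ", url)
  }
  readRDS(dest)
}

# Demonstration without network access: the destination file already exists,
# so the URL (a placeholder here) is never contacted
cached <- file.path(tempdir(), "toy_reference.rds")
saveRDS(list(name = "toy reference"), cached)
toy <- fetch_reference("https://example.org/placeholder.rds", cached)
toy$name
```

With the variables defined above, the real call would be fetch_reference(ref_url, local_filename).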

Remember to choose a reference that’s appropriate for your specific tissue type and experimental context. The choice of reference can impact your results, so it’s important to select one that closely matches your biological system.
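
One way to shortlist candidates is to hold the table of pre-trained references in a data frame and filter it by species, platform, or tissue. The values below are copied from that table; the data frame and its column names are just a convenience for this sketch and are not provided by xCell2.

```r
refs <- data.frame(
  dataset = c("BlueprintEncode", "ImmGenData", "Immune Compendium", "LM22",
              "MouseRNAseqData", "Pan Cancer", "Tabula Muris Blood",
              "Tabula Sapiens Blood", "TME Compendium"),
  species = c("Homo Sapiens", "Mus Musculus", "Homo Sapiens", "Homo Sapiens",
              "Mus Musculus", "Homo Sapiens", "Mus Musculus",
              "Homo Sapiens", "Homo Sapiens"),
  platform = c("RNA-seq", "Microarray", "RNA-seq", "Microarray", "RNA-seq",
               "scRNA-seq", "scRNA-seq", "scRNA-seq", "RNA-seq"),
  tissues = c("Mixed", "Immune/Blood", "Immune/Blood", "Mixed", "Mixed",
              "Tumor", "Bone Marrow, Spleen, Thymus",
              "Blood, Lymph_Node, Spleen, Thymus, Bone Marrow", "Tumor"),
  stringsAsFactors = FALSE
)

# Example filter: human references built from tumor tissue
tumor_refs <- subset(refs, species == "Homo Sapiens" & tissues == "Tumor")$dataset
tumor_refs
```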

4 Performing Cell Type Enrichment Analysis with xCell2Analysis

After creating or obtaining an xCell2 reference object, the next step is to use it for cell type enrichment analysis on your bulk RNA-seq data. This section will guide you through using the xCell2Analysis function and interpreting its results.

4.1 Preparing Your Data

Before running the analysis, ensure you have:
  1. An xCell2 reference object
  2. A bulk gene expression matrix to analyze

For this example, we’ll use a pre-loaded demo reference and a sample bulk expression dataset:

# Load the demo reference object
data(DICE_demo.xCell2Ref, package = "xCell2")
# Load a sample bulk expression dataset
data(mix_demo, package = "xCell2")

4.2 Running xCell2Analysis

Now, let’s perform the cell type enrichment analysis:

xcell2_results <- xCell2::xCell2Analysis(
 mix = mix_demo,
 xcell2object = DICE_demo.xCell2Ref
)

Key Parameters:
  • mix: Your bulk mixture gene expression data (genes in rows, samples in columns)
  • xcell2object: An S4 object of class xCell2Object (your reference)
  • minSharedGenes: Minimum fraction of shared genes required (default: 0.9)
  • spillover: Whether to use spillover correction (default: TRUE)
  • numThreads: Number of threads for parallel processing (default: 1)

For a full list of parameters and their descriptions, refer to the xCell2Analysis function documentation.

4.3 Interpreting the Results

The xCell2Analysis function returns a matrix of cell type enrichment scores:
  • Rows represent cell types
  • Columns represent samples from your input mixture
  • Higher scores indicate a stronger presence of that cell type in the sample

Important considerations:
  • Scores are relative, not absolute proportions
  • Compare scores across samples to identify differences in cell type composition
  • Consider the biological context of your samples when interpreting results
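
As a small illustration of comparing scores across samples, the sketch below works on a made-up score matrix with the same layout as the xCell2Analysis output (cell types in rows, samples in columns). The cell type names and values are invented for the example.

```r
# Toy score matrix: cell types in rows, samples in columns (values made up)
toy_scores <- matrix(
  c(0.30, 0.05, 0.10,
    0.10, 0.40, 0.20,
    0.02, 0.01, 0.35),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("B cells", "Monocytes", "NK cells"),
                  c("sample1", "sample2", "sample3"))
)

# For each cell type, the sample with the highest score
top_sample <- colnames(toy_scores)[apply(toy_scores, 1, which.max)]
names(top_sample) <- rownames(toy_scores)
top_sample

# Row-wise z-scores put each cell type on a comparable scale across samples
round(t(scale(t(toy_scores))), 2)
```

Both summaries compare samples within a single cell type, which is consistent with treating the scores as relative rather than as absolute proportions.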

4.4 Further Analysis

Once you have your xCell2 results, you can:
  • Correlate cell type enrichment scores with clinical or experimental variables
  • Perform differential enrichment analysis between sample groups
  • Use the scores as features for machine learning models
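
The first two ideas can be sketched with base R statistics. Everything below is simulated: the scores, the continuous covariate, and the grouping are random toy data, and the rank-based tests are one reasonable choice among several rather than a recommendation from the xCell2 authors.

```r
set.seed(7)
# Made-up scores for one cell type across 12 samples
score <- runif(12)
# Made-up covariates: a continuous variable and a two-level group
age   <- round(runif(12, 30, 80))
group <- factor(rep(c("control", "treated"), each = 6))

# Rank-based association with a continuous variable
cor_res <- cor.test(score, age, method = "spearman", exact = FALSE)

# Rank-based comparison between two groups
wilcox_res <- wilcox.test(score ~ group)

c(spearman_rho = unname(cor_res$estimate),
  spearman_p   = cor_res$p.value,
  wilcoxon_p   = wilcox_res$p.value)
```

When testing many cell types at once, remember to adjust the p-values for multiple testing, for example with p.adjust().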

Remember, xCell2 provides estimates of relative cell type abundance. For absolute quantification, additional experimental validation may be necessary.

4.5 Troubleshooting

If you encounter issues:
  • Ensure your input data is properly formatted
  • Check that your mix and reference use the same gene annotation system
  • Try adjusting the minSharedGenes parameter if many genes are missing
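
A quick way to diagnose gene overlap problems is to compute the shared fraction yourself. The sketch below uses short made-up gene lists and assumes the fraction is measured relative to the reference genes; how xCell2 computes it internally is not stated in this vignette, so treat the number as a rough diagnostic only.

```r
# Made-up gene identifier lists for a reference and a bulk mixture
ref_genes <- c("CD3E", "CD19", "CD14", "NKG7", "MS4A1")
mix_genes <- c("CD3E", "CD19", "CD14", "GAPDH")

shared <- intersect(ref_genes, mix_genes)
frac_shared <- length(shared) / length(ref_genes)
frac_shared                      # 3 of 5 reference genes

# Reference genes missing from the mixture
missing_genes <- setdiff(ref_genes, mix_genes)
missing_genes
```

A fraction close to zero usually points to mismatched identifier types (for example, gene symbols in one input and Ensembl IDs in the other) rather than to genuinely missing genes.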

For more detailed troubleshooting, refer to the package documentation or seek help on the xCell2 GitHub issues page.

5 Citing xCell2

If you use xCell2 in your research, please cite: Angel A, Naom L, Nabel-Levy S, Aran D. xCell 2.0: Robust Algorithm for Cell Type Proportion Estimation Predicts Response to Immune Checkpoint Blockade. bioRxiv 2024.

6 References

7 R Session Info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Asia/Jerusalem
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] xCell2_0.99.0    BiocStyle_2.33.1
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3                   GSEABase_1.67.0            
##   [3] rlang_1.1.4                 magrittr_2.0.3             
##   [5] matrixStats_1.4.0           compiler_4.4.1             
##   [7] RSQLite_2.3.7               reshape2_1.4.4             
##   [9] png_0.1-8                   vctrs_0.6.5                
##  [11] RcppZiggurat_0.1.6          quadprog_1.5-8             
##  [13] stringr_1.5.1               pkgconfig_2.0.3            
##  [15] crayon_1.5.3                fastmap_1.2.0              
##  [17] dbplyr_2.5.0                XVector_0.45.0             
##  [19] utf8_1.2.4                  promises_1.3.0             
##  [21] rmarkdown_2.28              tzdb_0.4.0                 
##  [23] pracma_2.4.4                graph_1.83.0               
##  [25] UCSC.utils_1.1.0            purrr_1.0.2                
##  [27] bit_4.0.5                   xfun_0.47                  
##  [29] Rfast_2.1.0                 zlibbioc_1.51.1            
##  [31] cachem_1.1.0                GenomeInfoDb_1.41.1        
##  [33] jsonlite_1.8.8              blob_1.2.4                 
##  [35] later_1.3.2                 DelayedArray_0.31.11       
##  [37] BiocParallel_1.39.0         parallel_4.4.1             
##  [39] singscore_1.25.0            R6_2.5.1                   
##  [41] stringi_1.8.4               bslib_0.8.0                
##  [43] limma_3.61.9                reticulate_1.39.0          
##  [45] GenomicRanges_1.57.1        jquerylib_0.1.4            
##  [47] Rcpp_1.0.13                 bookdown_0.40              
##  [49] SummarizedExperiment_1.35.1 knitr_1.48                 
##  [51] readr_2.1.5                 IRanges_2.39.2             
##  [53] httpuv_1.6.15               Matrix_1.7-0               
##  [55] igraph_2.0.3                tidyselect_1.2.1           
##  [57] rstudioapi_0.16.0           abind_1.4-5                
##  [59] yaml_2.3.10                 codetools_0.2-20           
##  [61] minpack.lm_1.2-4            curl_5.2.2                 
##  [63] plyr_1.8.9                  lattice_0.22-6             
##  [65] tibble_3.2.1                withr_3.0.1                
##  [67] Biobase_2.65.1              shiny_1.9.1                
##  [69] KEGGREST_1.45.1             evaluate_0.24.0            
##  [71] ontologyIndex_2.12          RcppParallel_5.1.9         
##  [73] BiocFileCache_2.13.0        Biostrings_2.73.1          
##  [75] pillar_1.9.0                BiocManager_1.30.25        
##  [77] filelock_1.0.3              MatrixGenerics_1.17.0      
##  [79] DT_0.33                     stats4_4.4.1               
##  [81] generics_0.1.3              vroom_1.6.5                
##  [83] ggplot2_3.5.1               BiocVersion_3.20.0         
##  [85] S4Vectors_0.43.2            hms_1.1.3                  
##  [87] munsell_0.5.1               scales_1.3.0               
##  [89] xtable_1.8-4                glue_1.7.0                 
##  [91] tools_4.4.1                 ontologyPlot_1.7           
##  [93] AnnotationHub_3.13.3        ontoProc_1.27.4            
##  [95] locfit_1.5-9.10             annotate_1.83.0            
##  [97] XML_3.99-0.17               grid_4.4.1                 
##  [99] tidyr_1.3.1                 edgeR_4.3.14               
## [101] colorspace_2.1-1            AnnotationDbi_1.67.0       
## [103] GenomeInfoDbData_1.2.12     cli_3.6.3                  
## [105] rappdirs_0.3.3              fansi_1.0.6                
## [107] S4Arrays_1.5.7              dplyr_1.1.4                
## [109] Rgraphviz_2.49.0            gtable_0.3.5               
## [111] sass_0.4.9                  digest_0.6.37              
## [113] BiocGenerics_0.51.1         SparseArray_1.5.31         
## [115] paintmap_1.0                htmlwidgets_1.6.4          
## [117] memoise_2.0.1               htmltools_0.5.8.1          
## [119] lifecycle_1.0.4             httr_1.4.7                 
## [121] statmod_1.5.0               mime_0.12                  
## [123] bit64_4.0.5