1 Introduction to Cell Type Enrichment Analysis with xCell 2.0

Cell type enrichment analysis and cellular deconvolution are essential for understanding the heterogeneity of complex tissues in bulk transcriptomics data. xCell2 is an R package that builds upon the original xCell methodology (Aran et al., 2017), offering improved algorithms and enhanced performance. The key advancement in xCell 2.0 is its genericity: users can now use any reference, including single-cell RNA-Seq data, to train an xCell2 reference object for analysis.

This package is particularly useful for researchers working with bulk transcriptomics data who want to infer the cellular composition of their samples. By leveraging reference data from various sources, xCell 2.0 offers a flexible and powerful tool for understanding the cellular heterogeneity in complex tissues.

This vignette provides an overview of the package’s features and step-by-step guidance on:

- Preparing input data
- Generating custom xCell2 reference objects
- Performing cell type enrichment analysis
- Interpreting results and best practices

Whether you are new to cell type deconvolution or an experienced bioinformatician, this guide will help you leverage xCell2 effectively in your research.

1.1 Installation

1.1.1 From Bioconductor (Coming Soon)

To install xCell2 from Bioconductor, use:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
 install.packages("BiocManager")
}
BiocManager::install("xCell2")

1.1.2 From GitHub (Development Version)

To install the development version from GitHub, use:

if (!requireNamespace("devtools", quietly = TRUE)) {
 install.packages("devtools")
}
devtools::install_github("AlmogAngel/xCell2")

1.1.3 Dependencies

xCell2 relies on several Bioconductor packages. Most dependencies will be automatically installed. If you encounter any issues, you may need to manually install some packages, particularly ontoProc (version 1.26.4 or higher).
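If installation problems point to ontoProc, a quick version check can confirm whether the installed copy meets the minimum stated above. The sketch below uses only base R; the helper name `has_min_version` is hypothetical and not part of xCell2.

```r
# Hypothetical helper (not part of xCell2): TRUE if `pkg` is installed
# at version `min_version` or newer, FALSE otherwise.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Print a hint only when the check fails
if (!has_min_version("ontoProc", "1.26.4")) {
  message("ontoProc is missing or older than 1.26.4; ",
          "consider BiocManager::install(\"ontoProc\")")
}
```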

After installation, load the package with:

library(xCell2)

2 Creating a Custom Reference with `xCell2Train`

One of the key features of xCell2 is the ability to create custom reference objects tailored to your specific research needs. This section will guide you through the process of generating a custom xCell2 reference object using the xCell2Train function.

2.1 Why Create a Custom Reference?

Creating a custom reference allows you to:

- Incorporate cell types specific to your research area
- Use the latest single-cell RNA-seq data as a reference
- Adapt the tool for non-standard organisms or tissues

2.2 Preparing the Input Data

Before using xCell2Train, you need to prepare two key inputs:

1. Reference Gene Expression Matrix:
   - Can be generated from various platforms: microarray, bulk RNA-Seq, or single-cell RNA-Seq
   - Genes should be in rows, samples/cells in columns
   - Should be normalized to both gene length and library size; values can be in either linear or logarithmic space

2. Labels Data Frame: This data frame must contain information about each sample/cell in your reference. It should have four columns:
   - "ont": Cell type ontology (e.g., “CL:0000545” or NA if not applicable)
   - "label": Cell type name (e.g., “T-helper 1 cell”)
   - "sample": Sample/cell identifier matching column names in the reference matrix
   - "dataset": Source dataset or subject identifier

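As a rough illustration of what "normalized to both gene length and library size" can mean for bulk RNA-Seq, the sketch below converts a toy count matrix to TPM in base R. The counts and gene lengths are made up for illustration, and TPM is only one of several normalizations that satisfy this requirement.

```r
# Toy raw counts: genes in rows, samples in columns (made-up values)
counts <- matrix(
  c(100, 200,
    300, 100,
     50,  50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("S1", "S2"))
)
# Made-up gene lengths in kilobases, in the same order as the rows
gene_length_kb <- c(GeneA = 1, GeneB = 2, GeneC = 0.5)

# Step 1: correct for gene length (reads per kilobase);
# the length vector is recycled down each column, one value per gene
rpk <- counts / gene_length_kb

# Step 2: correct for library size so that every sample sums to one million
tpm <- t(t(rpk) / colSums(rpk)) * 1e6

# Optional: move to logarithmic space
log_tpm <- log2(tpm + 1)

colSums(tpm)
```

Either `tpm` or `log_tpm` would fit the description above, since the matrix can be in linear or logarithmic space.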
2.3 Example: Using DICE Dataset

Let’s walk through an example using a subset of the Database of Immune Cell Expression (DICE):

# Load the demo data
data(dice_demo_ref, package = "xCell2")

# Extract reference matrix
dice_ref <- as.matrix(SummarizedExperiment::assay(dice_demo_ref, "logcounts"))
colnames(dice_ref) <- make.unique(colnames(dice_ref))

# Prepare labels data frame
dice_labels <- as.data.frame(SummarizedExperiment::colData(dice_demo_ref))
dice_labels$ont <- NA
dice_labels$sample <- colnames(dice_ref)
dice_labels$dataset <- "DICE"
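
Before training, it can help to confirm that the two inputs agree with each other. The following is a minimal sketch of such a check in base R; `check_reference_inputs` is a hypothetical helper, not an xCell2 function, and the toy matrix and labels stand in for a real reference.

```r
# Hypothetical helper (not part of xCell2): basic consistency checks
# between a reference matrix and its labels data frame.
check_reference_inputs <- function(ref, labels) {
  required_cols <- c("ont", "label", "sample", "dataset")
  missing_cols <- setdiff(required_cols, colnames(labels))
  if (length(missing_cols) > 0) {
    stop("labels is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (is.null(colnames(ref)) || anyDuplicated(colnames(ref)) > 0) {
    stop("ref must have unique column names")
  }
  if (!setequal(labels$sample, colnames(ref))) {
    stop("labels$sample must match the column names of ref")
  }
  invisible(TRUE)
}

# Toy data standing in for a real reference
toy_ref <- matrix(
  runif(12), nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Cell", 1:4))
)
toy_labels <- data.frame(
  ont = NA,
  label = c("B cells", "B cells", "NK cells", "NK cells"),
  sample = colnames(toy_ref),
  dataset = "toy"
)
check_reference_inputs(toy_ref, toy_labels)
```

The same helper could be called as `check_reference_inputs(dice_ref, dice_labels)` on the demo objects prepared above.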

2.4 Assigning Cell Type Ontology (Optional but Recommended)

You can skip the following step if:

- You don’t want to use the ontology to account for cell type dependencies (not recommended)
- You are sure that there are no cell type dependencies in your reference

To improve the quality of your custom reference, assign cell type ontologies using a controlled vocabulary:

Find the cell type ontology here: EMBL-EBI Ontology Lookup Service
For example, search “T cells, CD4+, memory” and look for the best match (“CD4-positive, alpha-beta memory T cell” - “CL:0000897”)

dice_labels[dice_labels$label == "B cells", ]$ont <- "CL:0000236"
dice_labels[dice_labels$label == "Monocytes", ]$ont <- "CL:0000576"
dice_labels[dice_labels$label == "NK cells", ]$ont <- "CL:0000623"
dice_labels[dice_labels$label == "T cells, CD8+", ]$ont <- "CL:0000625"
dice_labels[dice_labels$label == "T cells, CD4+", ]$ont <- "CL:0000624"
dice_labels[dice_labels$label == "T cells, CD4+, memory", ]$ont <- "CL:0000897"
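
When a reference has many cell types, the same assignments can be kept in a single lookup table and applied in one step. The sketch below shows this pattern in base R on a toy labels data frame; the `ont_map` and `toy_labels` names are made up for illustration, and the label-to-ID pairs are the ones used above.

```r
# Named vector mapping each label to its Cell Ontology ID (same pairs as above)
ont_map <- c(
  "B cells" = "CL:0000236",
  "Monocytes" = "CL:0000576",
  "NK cells" = "CL:0000623",
  "T cells, CD8+" = "CL:0000625",
  "T cells, CD4+" = "CL:0000624",
  "T cells, CD4+, memory" = "CL:0000897"
)

# Toy labels standing in for dice_labels; "Unknown" has no entry in the map
toy_labels <- data.frame(
  label = c("B cells", "NK cells", "T cells, CD4+, memory", "Unknown")
)

# Look up each label by name; labels absent from the map get NA
toy_labels$ont <- unname(ont_map[as.character(toy_labels$label)])

# Labels that still need an ontology ID
unique(toy_labels$label[is.na(toy_labels$ont)])
```

Listing the unmapped labels at the end makes it easy to spot cell types that were missed or misspelled.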

Use the xCell2GetLineage function to check cell type dependencies:

xCell2::xCell2GetLineage(labels = dice_labels, outFile = "demo_dice_dep.tsv")

## loading from cache

## Warning in xCell2::xCell2GetLineage(labels = dice_labels, outFile =
## "demo_dice_dep.tsv"): It is recommended that you manually check the cell type
## lineage file: demo_dice_dep.tsv

Open demo_dice_dep.tsv and verify the lineage assignments. Note that “T cells, CD4+, memory” is assigned as a descendant of “T cells, CD4+”.

2.5 Generating the xCell2 Reference Object

With our inputs prepared, we can now create the xCell2 reference object. Simply run the following command:

set.seed(123) # (optional) For reproducibility
DICE_demo.xCell2Ref <- xCell2::xCell2Train(
 ref = dice_ref,
 labels = dice_labels,
 refType = "rnaseq"
)

## Finding dependencies using cell type ontology...

## loading from cache

## Generating signatures...

## Learning linear transformation and spillover parameters...

## Your custom xCell2 reference object is ready!

## > Please consider sharing with others here: https://dviraran.github.io/xCell2ref

Note that we set a seed for reproducibility because generating pseudo-bulk samples from an scRNA-Seq reference relies on random sampling of cells.

Key Parameters of xCell2Train:

- ref: Your prepared reference gene expression matrix
- labels: The labels data frame you created
- refType: Type of reference data (“rnaseq”, “array”, or “sc”)
- useOntology: Whether to use ontological integration (default: TRUE)
- numThreads: Number of threads for parallel processing
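
For illustration, the call below spells out the optional parameters listed above. It is a sketch that assumes xCell2 is installed and that `dice_ref` and `dice_labels` from the earlier steps are in the workspace; the thread count of 2 is an arbitrary choice, and the names `train_args` and `custom_ref` are made up.

```r
# Optional arguments collected in a list so they are easy to review and reuse
train_args <- list(
  refType = "rnaseq",   # one of "rnaseq", "array", or "sc"
  useOntology = TRUE,   # the documented default
  numThreads = 2        # arbitrary; adjust to the available cores
)

# Run only when the package and the inputs from the earlier steps are available
if (requireNamespace("xCell2", quietly = TRUE) &&
    exists("dice_ref") && exists("dice_labels")) {
  set.seed(123)  # (optional) for reproducibility
  custom_ref <- do.call(
    xCell2::xCell2Train,
    c(list(ref = dice_ref, labels = dice_labels), train_args)
  )
}
```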

For a full list of parameters and their descriptions, refer to the xCell2Train function documentation.

2.6 Sharing Your Custom xCell2 Reference Object

Sharing your custom xCell2 reference object with the scientific community can greatly benefit researchers working in similar fields. Here’s how you can contribute:

Save Your Reference Object

Save your newly generated xCell2 reference object:

save(DICE_demo.xCell2Ref, file = "DICE_demo.xCell2Ref.rda")
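
Because `save()` stores an object together with its name, `load()` restores it under that same name rather than returning it. A minimal round trip, using a toy list in place of a real xCell2 reference object and a temporary file, might look like this:

```r
# Toy object standing in for a trained reference (not a real xCell2 object)
toy_ref_object <- list(name = "toy reference", n_cell_types = 3)

rda_file <- file.path(tempdir(), "toy_ref_object.rda")
save(toy_ref_object, file = rda_file)

# Load into a separate environment to avoid overwriting workspace objects
loaded_env <- new.env()
loaded_names <- load(rda_file, envir = loaded_env)

loaded_names  # names of the objects stored in the file
restored <- get(loaded_names[1], envir = loaded_env)
identical(restored, toy_ref_object)
```

Loading into a dedicated environment is a design choice that keeps a downloaded .rda file from silently replacing an object of the same name in the current session.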

Prepare Your Reference for Sharing

Ensure your reference includes:

A clear description of the source data
Any preprocessing steps applied
The version of xCell2 used to generate it

Upload to the xCell2 References Repository

Visit the xCell2 References Repository
Navigate to the “references” directory
Click “Add file” > “Upload files” and select your .rda file

Update the README

To help others understand and use your reference:

Return to the main page of the xCell2 References Repository
Open the README.md file
Click the pencil icon (Edit this file) in the top right corner
Add an entry to the references table

Submit a Pull Request

Scroll to the bottom of the page
Select “Create a new branch for this commit and start a pull request”
Click “Propose changes”
On the next page, click “Create pull request”

The repository maintainers will review your submission and merge it if everything is in order. By sharing your custom reference, you contribute to the growth and improvement of cell type enrichment analysis across the scientific community. Your work could help researchers in related fields achieve more accurate and relevant results in their studies.

2.7 Next Steps

After creating your custom reference, you can use it for cell type enrichment analysis with xCell2Analysis. We’ll cover this in the next section. Remember, creating a robust reference is crucial for accurate results. Take time to ensure your input data is high-quality and properly annotated.

3 Using Pre-trained xCell2 References

xCell2 offers pre-trained reference objects that can be easily downloaded and used for your analysis. These references cover various tissue types and are based on well-curated datasets.

Dataset	Study	Species	Normalization	nSamples/Cells	nCellTypes	Platform	Tissues
BlueprintEncode	Martens JHA and Stunnenberg HG (2013), The ENCODE Project Consortium (2012), Aran D (2019)	Homo sapiens	TPM	259	43	RNA-seq	Mixed
ImmGenData	The Immunological Genome Project Consortium (2008), Aran D (2019)	Mus musculus	RMA	843	19	Microarray	Immune/Blood
Immune Compendium	Zaitsev A (2022)	Homo sapiens	TPM	3626	40	RNA-seq	Immune/Blood
LM22	Chen B (2019)	Homo sapiens	RMA	113	22	Microarray	Mixed
MouseRNAseqData	Benayoun B (2019)	Mus musculus	TPM	358	18	RNA-seq	Mixed
Pan Cancer	Nofech-Mozes I (2023)	Homo sapiens	Counts	25084	29	scRNA-seq	Tumor
Tabula Muris Blood	The Tabula Muris Consortium (2018)	Mus musculus	Counts	11145	6	scRNA-seq	Bone Marrow, Spleen, Thymus
Tabula Sapiens Blood	The Tabula Sapiens Consortium (2022)	Homo sapiens	Counts	11921	18	scRNA-seq	Blood, Lymph Node, Spleen, Thymus, Bone Marrow
TME Compendium	Zaitsev A (2022)	Homo sapiens	TPM	8146	25	RNA-seq	Tumor

You can also quickly access popular pre-trained references that are available within the xCell2 package:

data(BlueprintEncode.xCell2Ref)

Or download a pre-trained reference directly within R using the download.file() function:

# Set the URL of the pre-trained reference
ref_url <- "https://dviraran.github.io/xCell2refs/references/BlueprintEncode.xCell2Ref.rds"
# Set the local filename to save the reference
local_filename <- "BlueprintEncode.xCell2Ref.rds"
# Download the file
download.file(ref_url, local_filename, mode = "wb")
# Load the downloaded reference
BlueprintEncode.xCell2Ref <- readRDS(local_filename)
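
To avoid downloading the same file on every run, the steps above can be wrapped in a small function that reuses a local copy when one exists. This is a sketch in base R; `fetch_reference` is a hypothetical helper, not part of xCell2.

```r
# Hypothetical helper (not part of xCell2): download once, then reuse the copy
fetch_reference <- function(url, dest) {
  if (!file.exists(dest)) {
    # download.file() returns 0 on success; treat errors as a failed download
    status <- tryCatch(
      download.file(url, dest, mode = "wb", quiet = TRUE),
      error = function(e) 1L
    )
    if (!identical(as.integer(status), 0L) || !file.exists(dest)) {
      unlink(dest)  # remove any partial file
      stop("Download failed: ", url)
    }
  }
  readRDS(dest)
}

# Usage with the variables defined above:
# BlueprintEncode.xCell2Ref <- fetch_reference(ref_url, local_filename)
```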

Remember to choose a reference that’s appropriate for your specific tissue type and experimental context. The choice of reference can impact your results, so it’s important to select one that closely matches your biological system.

4 Performing Cell Type Enrichment Analysis with `xCell2Analysis`

After creating or obtaining an xCell2 reference object, the next step is to use it for cell type enrichment analysis on your bulk RNA-seq data. This section will guide you through using the xCell2Analysis function and interpreting its results.

4.1 Preparing Your Data

Before running the analysis, ensure you have:

1. An xCell2 reference object
2. A bulk gene expression matrix to analyze

For this example, we’ll use a pre-loaded demo reference and a sample bulk expression dataset:

# Load the demo reference object
data(DICE_demo.xCell2Ref, package = "xCell2")
# Load a sample bulk expression dataset
data(mix_demo, package = "xCell2")

4.2 Running `xCell2Analysis`

Now, let’s perform the cell type enrichment analysis:

xcell2_results <- xCell2::xCell2Analysis(
 mix = mix_demo,
 xcell2object = DICE_demo.xCell2Ref
)

Key Parameters:

- mix: Your bulk mixture gene expression data (genes in rows, samples in columns)
- xcell2object: An S4 object of class xCell2Object (your reference)
- minSharedGenes: Minimum fraction of shared genes required (default: 0.9)
- spillover: Whether to use spillover correction (default: TRUE)
- numThreads: Number of threads for parallel processing (default: 1)

For a full list of parameters and their descriptions, refer to the xCell2Analysis function documentation.

4.3 Interpreting the Results

The xCell2Analysis function returns a matrix of cell type enrichment scores:

- Rows represent cell types
- Columns represent samples from your input mixture
- Higher scores indicate a stronger presence of that cell type in the sample

Important considerations:

- Scores are relative, not absolute proportions
- Compare scores across samples to identify differences in cell type composition
- Consider the biological context of your samples when interpreting results
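
Since the scores are meant to be compared across samples, a simple way to read the matrix is one cell type at a time. The sketch below uses a made-up score matrix with the layout described above (cell types in rows, samples in columns); the values are not real xCell2 output.

```r
# Toy score matrix: cell types in rows, samples in columns (made-up values)
scores <- matrix(
  c(0.30, 0.05, 0.10,
    0.10, 0.40, 0.25,
    0.02, 0.20, 0.11),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("B cells", "NK cells", "Monocytes"),
    c("Sample1", "Sample2", "Sample3")
  )
)

# Rank samples by their score for a single cell type
nk_ranked <- sort(scores["NK cells", ], decreasing = TRUE)
nk_ranked

# Center and scale each cell type across samples to highlight
# which samples are relatively high or low for that cell type
scaled_scores <- t(scale(t(scores)))
round(scaled_scores, 2)
```

Scaling within each row keeps the comparison between samples and avoids reading the raw values as proportions.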

4.4 Further Analysis

Once you have your xCell2 results, you can:

- Correlate cell type enrichment scores with clinical or experimental variables
- Perform differential enrichment analysis between sample groups
- Use the scores as features for machine learning models
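
As a minimal sketch of the first two ideas, the example below runs a two-group comparison and a rank correlation in base R. All values are simulated for illustration; the group labels, the `age` covariate, and the choice of non-parametric tests are assumptions, not recommendations from the xCell2 authors.

```r
set.seed(1)  # toy data only

# Simulated scores for one cell type across 10 samples
group <- factor(rep(c("control", "treated"), each = 5))
nk_score <- c(rnorm(5, mean = 0.10, sd = 0.02),
              rnorm(5, mean = 0.20, sd = 0.02))
age <- c(34, 51, 47, 62, 29, 55, 40, 38, 66, 45)  # made-up covariate

# Differential enrichment between groups (Wilcoxon rank-sum test)
wilcox_res <- wilcox.test(nk_score ~ group)
wilcox_res$p.value

# Association with a continuous variable (Spearman rank correlation)
cor_res <- cor.test(nk_score, age, method = "spearman")
cor_res$estimate
```

When the same test is repeated for every cell type, the resulting p-values would normally be adjusted for multiple testing, for example with `p.adjust()`.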

Remember, xCell2 provides estimates of relative cell type abundance. For absolute quantification, additional experimental validation may be necessary.

4.5 Troubleshooting

If you encounter issues:

- Ensure your input data is properly formatted
- Check that your mix and reference use the same gene annotation system
- Try adjusting the minSharedGenes parameter if many genes are missing
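
A quick overlap check can show whether missing genes are the problem. The sketch below uses made-up gene lists and assumes that minSharedGenes is measured as the fraction of the reference's genes found in the mixture, which the parameter description does not spell out; how to extract the gene list from a real xCell2 reference object is not covered here.

```r
# Made-up gene sets: genes used by a reference and genes present in a mixture
ref_genes <- c("CD3E", "CD4", "CD8A", "CD19", "MS4A1",
               "NKG7", "CD14", "LYZ", "GNLY", "CD79A")
mix_genes <- c("CD3E", "CD4", "CD8A", "CD19", "MS4A1",
               "NKG7", "CD14", "ACTB")

# Fraction of reference genes that are also found in the mixture
shared_fraction <- length(intersect(ref_genes, mix_genes)) / length(ref_genes)
shared_fraction  # 0.7 for these toy lists, below the 0.9 default

# Reference genes missing from the mixture; a long list here often
# points to mismatched identifiers (e.g., symbols versus Ensembl IDs)
missing_genes <- setdiff(ref_genes, mix_genes)
missing_genes
```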

For more detailed troubleshooting, refer to the package documentation or seek help on the xCell2 GitHub issues page.

5 Citing xCell2

If you use xCell2 in your research, please cite: Angel A, Naom L, Nabel-Levy S, Aran D. xCell 2.0: Robust Algorithm for Cell Type Proportion Estimation Predicts Response to Immune Checkpoint Blockade. bioRxiv 2024.

6 References

Aran, D., Hu, Z., & Butte, A. J. (2017). xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome biology, 18, 1-14.
Aran, D. (2021). Extracting insights from heterogeneous tissues. Nature Computational Science, 1(4), 247-248.

7 R Session Info

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Asia/Jerusalem
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] xCell2_0.99.0    BiocStyle_2.33.1
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3                   GSEABase_1.67.0            
##   [3] rlang_1.1.4                 magrittr_2.0.3             
##   [5] matrixStats_1.4.0           compiler_4.4.1             
##   [7] RSQLite_2.3.7               reshape2_1.4.4             
##   [9] png_0.1-8                   vctrs_0.6.5                
##  [11] RcppZiggurat_0.1.6          quadprog_1.5-8             
##  [13] stringr_1.5.1               pkgconfig_2.0.3            
##  [15] crayon_1.5.3                fastmap_1.2.0              
##  [17] dbplyr_2.5.0                XVector_0.45.0             
##  [19] utf8_1.2.4                  promises_1.3.0             
##  [21] rmarkdown_2.28              tzdb_0.4.0                 
##  [23] pracma_2.4.4                graph_1.83.0               
##  [25] UCSC.utils_1.1.0            purrr_1.0.2                
##  [27] bit_4.0.5                   xfun_0.47                  
##  [29] Rfast_2.1.0                 zlibbioc_1.51.1            
##  [31] cachem_1.1.0                GenomeInfoDb_1.41.1        
##  [33] jsonlite_1.8.8              blob_1.2.4                 
##  [35] later_1.3.2                 DelayedArray_0.31.11       
##  [37] BiocParallel_1.39.0         parallel_4.4.1             
##  [39] singscore_1.25.0            R6_2.5.1                   
##  [41] stringi_1.8.4               bslib_0.8.0                
##  [43] limma_3.61.9                reticulate_1.39.0          
##  [45] GenomicRanges_1.57.1        jquerylib_0.1.4            
##  [47] Rcpp_1.0.13                 bookdown_0.40              
##  [49] SummarizedExperiment_1.35.1 knitr_1.48                 
##  [51] readr_2.1.5                 IRanges_2.39.2             
##  [53] httpuv_1.6.15               Matrix_1.7-0               
##  [55] igraph_2.0.3                tidyselect_1.2.1           
##  [57] rstudioapi_0.16.0           abind_1.4-5                
##  [59] yaml_2.3.10                 codetools_0.2-20           
##  [61] minpack.lm_1.2-4            curl_5.2.2                 
##  [63] plyr_1.8.9                  lattice_0.22-6             
##  [65] tibble_3.2.1                withr_3.0.1                
##  [67] Biobase_2.65.1              shiny_1.9.1                
##  [69] KEGGREST_1.45.1             evaluate_0.24.0            
##  [71] ontologyIndex_2.12          RcppParallel_5.1.9         
##  [73] BiocFileCache_2.13.0        Biostrings_2.73.1          
##  [75] pillar_1.9.0                BiocManager_1.30.25        
##  [77] filelock_1.0.3              MatrixGenerics_1.17.0      
##  [79] DT_0.33                     stats4_4.4.1               
##  [81] generics_0.1.3              vroom_1.6.5                
##  [83] ggplot2_3.5.1               BiocVersion_3.20.0         
##  [85] S4Vectors_0.43.2            hms_1.1.3                  
##  [87] munsell_0.5.1               scales_1.3.0               
##  [89] xtable_1.8-4                glue_1.7.0                 
##  [91] tools_4.4.1                 ontologyPlot_1.7           
##  [93] AnnotationHub_3.13.3        ontoProc_1.27.4            
##  [95] locfit_1.5-9.10             annotate_1.83.0            
##  [97] XML_3.99-0.17               grid_4.4.1                 
##  [99] tidyr_1.3.1                 edgeR_4.3.14               
## [101] colorspace_2.1-1            AnnotationDbi_1.67.0       
## [103] GenomeInfoDbData_1.2.12     cli_3.6.3                  
## [105] rappdirs_0.3.3              fansi_1.0.6                
## [107] S4Arrays_1.5.7              dplyr_1.1.4                
## [109] Rgraphviz_2.49.0            gtable_0.3.5               
## [111] sass_0.4.9                  digest_0.6.37              
## [113] BiocGenerics_0.51.1         SparseArray_1.5.31         
## [115] paintmap_1.0                htmlwidgets_1.6.4          
## [117] memoise_2.0.1               htmltools_0.5.8.1          
## [119] lifecycle_1.0.4             httr_1.4.7                 
## [121] statmod_1.5.0               mime_0.12                  
## [123] bit64_4.0.5

xCell 2.0: Cell Type Enrichment Analysis

2024-09-10
