The reproducibility crisis:
Enhancer-gene maps (Sakaue et al., 2024)

Akshita^{1, 2}, Ethan^{1, 2}, Noah^{1, 3}, and Rayo^{1, 2}

BMI 206: Statistical Methods

November 24, 2025

¹ University of California, San Francisco

² Biological and Medical Informatics Graduate Program

³ Bioengineering Graduate Program

What is SCENT?

SCENT Overview

Single-Cell ENhancer Target gene mapping

SCENT uses single-cell multimodal data (e.g., 10X Multiome RNA/ATAC) and links ATAC-seq peaks (putative enhancers) to their target genes by modeling association between chromatin accessibility and gene expression across individual single cells.

SCENT Usage

library(SCENT)

SCENT_obj <- CreateSCENTObj(rna = mrna, 
                            atac = atac, 
                            meta.data = meta,
                            peak.info = gene_peak,
                            covariates = c("log(nUMI)","percent.mito",
                                           "sample", "batch"), 
                            celltypes = "CD4")

library(SCENT)

SCENT_obj <- CreateSCENTObj(rna = mrna, 
                            atac = atac, 
                            meta.data = meta,
                            peak.info = gene_peak,
                            covariates = c("log(nUMI)","percent.mito",
                                           "sample", "batch"), 
                            celltypes = "CD4")

#	Argument name (format)	Descriptions
1	rna (sparse matrix)	A gene-by-cell count matrix from multimodal RNA-seq data. This is a raw count matrix without any normalization. The row names should be the gene names used in the `peak.info` file. The column names are the cell names which should be the same names used in the `cell` column of the dataframe specified for `meta.data`. Sparse matrix format is required.

library(SCENT)

SCENT_obj <- CreateSCENTObj(rna = mrna, 
                            atac = atac, 
                            meta.data = meta,
                            peak.info = gene_peak,
                            covariates = c("log(nUMI)","percent.mito",
                                           "sample", "batch"), 
                            celltypes = "CD4")

#	Argument name (format)	Descriptions
2	atac (sparse matrix)	A peak-by-cell count matrix from multimodal ATAC-seq data. This is a raw count matrix without any normalization. The row names should be the peak names used in the `peak.info` file. The column names are the cell names which should be the same names used in `rna` and the `cell`column of dataframe specified for `meta.data`. The matrix may not be binarized while it will be binarized within the function. Sparse matrix format is required.

library(SCENT)

SCENT_obj <- CreateSCENTObj(rna = mrna, 
                            atac = atac, 
                            meta.data = meta,
                            peak.info = gene_peak,
                            covariates = c("log(nUMI)","percent.mito",
                                           "sample", "batch"), 
                            celltypes = "CD4")

#	Argument name (format)	Descriptions
3	meta.data (dataframe)	A meta data frame for cells (rows are cells, and cell names should be in the column named as “cell”; see below example). Additionally, this text should include covariates to use in the model. Examples include: % mitochondrial reads, log(nUMI), sample, and batch as covariates. Dataframe format is required.

library(SCENT)

SCENT_obj <- CreateSCENTObj(rna = mrna, 
                            atac = atac, 
                            meta.data = meta,
                            peak.info = gene_peak,
                            covariates = c("log(nUMI)","percent.mito",
                                           "sample", "batch"), 
                            celltypes = "CD4")

#	Argument name (format)	Descriptions
5	covariates (a vector of character)	A vector of character fields that denote the covariates listed in the meta.data. For example, a set of covariates can be: %mitochondrial reads, log_nUMI, sample, and batch. Additionally the user can specify transformations to the covariates such as log transformation on nUMI counts for direct usage in the SCENT algorithm invoking poisson glm. We recommend users to at least use log(number_of_total_RNA_UMI_count_per_cell) as the base model is Poisson regression and we do not include the offset term into the default model.

library(SCENT)

SCENT_obj <- CreateSCENTObj(rna = mrna, 
                            atac = atac, 
                            meta.data = meta,
                            peak.info = gene_peak,
                            covariates = c("log(nUMI)","percent.mito",
                                           "sample", "batch"), 
                            celltypes = "CD4")

#	Argument name (format)	Descriptions
6	celltypes (character)	User specified naming of the celltype column in the meta.data file. This column should contain the names of the celltypes you want to test in this association analysis.

library(SCENT)

SCENT_obj <- CreateSCENTObj(rna = mrna, 
                            atac = atac, 
                            meta.data = meta,
                            peak.info = gene_peak,
                            covariates = c("log(nUMI)","percent.mito",
                                           "sample", "batch"), 
                            celltypes = "CD4")

#	Argument name (format)	Descriptions
4	peak.info (dataframe)	A table with two columns indicating which gene-peak pairs you want to test in this chunk (see below example) genes should be in the 1st column and peaks in the 2nd column. We highly recommend splitting gene-peak pairs into many chunks to increase computational efficiency (See Parallelized Jobs Info in Section 2). List(Dataframe) format which is a list of multiple data frames for parallelization is required. *

What to choose for peak.info?

README.md

“We usually only select peaks of which the center falls within 500 kb from the target gene (cis analysis).” (Github, 2024)

“Since many SCENT peaks are close to TSS regions, we considered if the enrichment was driven by TSS proximity.” (Sakaue et al., 2024)

mrna <- readRDS("data/NEAT-Seq/CD4/GSM5396333_CD4_RNA_counts.rds")
atac <- readRDS("data/NEAT-Seq/CD4/GSM5396336_CD4_Peak_matrix.rds")
meta <- read.csv("data/NEAT-Seq/CD4/GSM5396332_CD4cells.csv")

mrna |> rownames() |> head(1)

[1] "ENSG00000243485"

atac |> colnames() |> head(1)

[1] "chr1:817095-817595"

meta |> colnames()

 [1] "Sample"           "TSSEnrichment"    "ReadsInTSS"       "ReadsInPromoter" 
 [5] "ReadsInBlacklist" "PromoterRatio"    "PassQC"           "NucleosomeRatio" 
 [9] "nMultiFrags"      "nMonoFrags"       "nFrags"           "nDiFrags"        
[13] "BlacklistRatio"   "Clusters"         "ReadsInPeaks"     "FRIP"

library(biomaRt)
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

# These correspond to IFNG, IL2, BCL11B, IL4, and FOXP3 
gene_list <- (c("ENSG00000111537", "ENSG00000109471", "ENSG00000127152", "ENSG00000113520", "ENSG00000049768"))

gene_coords <- getBM(
  attributes = c("ensembl_gene_id", "chromosome_name", "start_position", "end_position"),
  filters = "ensembl_gene_id",
  values = gene_list,
  mart = ensembl
) |> mutate(chromosome_name = paste0("chr", chromosome_name))

atac_peak_coords <- tibble(peak = colnames(atac)) |>
  separate(peak, into = c("chr", "pos"), sep = ":") |>
  separate(pos, into = c("start", "end"), sep = "-", convert = TRUE)

distance <- 500000

gene_peak_candidates <- gene_coords |>
  rowwise() |>
  mutate(peaks = list({
    peaks_df <- atac_peak_coords |>
      filter(
        chr == chromosome_name &
        start >= start_position - distance &
        start <= start_position + distance 
      )
    if(nrow(peaks_df) == 0) tibble(chr = character(), start = integer(), end = integer()) else peaks_df
  }
  )) |>
  ungroup() |>
  unnest(peaks) |>
  mutate(peak = paste0(chr, ":", start, "-", end)) |>
  dplyr::select(ensembl_gene_id, peak)

gene_peak_candidates |> head()

# A tibble: 6 × 2
  ensembl_gene_id peak                  
  <chr>           <chr>                 
1 ENSG00000049768 chrX:48801154-48801654
2 ENSG00000049768 chrX:48801740-48802240
3 ENSG00000049768 chrX:48834850-48835350
4 ENSG00000049768 chrX:48835572-48836072
5 ENSG00000049768 chrX:48848794-48849294
6 ENSG00000049768 chrX:48891008-48891508

library(Matrix)
meta$celltype <- "CD4"  # or assign actual types if you have them

SCENT_obj <- CreateSCENTObj(rna = mrna, 
                            atac = atac |> t(), 
                            meta.data = meta,
                            peak.info = gene_peak_candidates,
                            covariates = c("TSSEnrichment", "PromoterRatio", "nFrags"),
                            celltypes = "CD4")

[1]. Github. (2024). SCENT: Single‑cell ENhancer target gene mapping using multimodal data with ATAC + RNA. https://github.com/immunogenomics/SCENT.

[2]. Sakaue, S., Weinand, K., Isaac, S., Dey, K. K., Jagadeesh, K., Kanai, M., Watts, G. F., Zhu, Z., Brenner, M. B., & others. (2024). Tissue-specific enhancer–gene maps from multimodal single-cell data identify causal disease alleles. Nature Genetics, 56(4), 615–626.