Last updated: 2024-02-06

Checks: 6 passed, 1 warning

Knit directory: multigroup_ctwas_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20231112)

The command set.seed(20231112) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: b0324db

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version b0324db. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    results/

Unstaged changes:
    Deleted:    analysis/CopyOfmulti_tissue_ldmerge_validation.Rmd
    Modified:   analysis/multi_tissue_ldmerge_validation.Rmd
    Modified:   analysis/multi_tissue_ldmerge_validation_xgboost_fit.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/multi_tissue_ldmerge_validation.Rmd) and HTML (docs/multi_tissue_ldmerge_validation.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	b63fc71	XSun	2024-01-30	update
html	b63fc71	XSun	2024-01-30	update
Rmd	9650d4c	sq-96	2024-01-27	update
html	9650d4c	sq-96	2024-01-27	update
Rmd	c937f2b	XSun	2024-01-25	update
html	c937f2b	XSun	2024-01-25	update
Rmd	251cdd1	XSun	2024-01-15	update
html	251cdd1	XSun	2024-01-15	update
Rmd	3a372ba	XSun	2024-01-15	update
html	3a372ba	XSun	2024-01-15	update
Rmd	8651160	XSun	2024-01-15	update
html	8651160	XSun	2024-01-15	update
Rmd	19da0ec	XSun	2024-01-12	update
html	19da0ec	XSun	2024-01-12	update
Rmd	fbfef8f	XSun	2024-01-11	update
html	fbfef8f	XSun	2024-01-11	update

Overview

Here we validate genes with SuSiE PIP > 0.8.

The basic idea is:

Some biological pathways are related to the traits. Genes within these pathways are more likely to be associated with these traits. Our approach involves aggregating these genes into a collective group. This allows us to assess whether the genes identified by cTWAS are overrepresented in this group.

However, the presence of common genes across multiple pathways presents a challenge to this straightforward aggregation approach. To address this, we propose weighting the pathways, assigning a unique score to each gene. By selecting genes that meet a specific score threshold, we can form a more refined group. We can then evaluate the enrichment of cTWAS-identified genes within this selectively grouped set.

Model

The model is \(y = Xw\), where w is the vector of pathway weights (one weight per pathway).

y is an n-dimensional vector representing gene-trait associations (n = number of genes), which can be:

z-scores computed by MAGMA
a binary vector indicating gene-trait relationships (genes with FDR < 0.05 as per MAGMA are marked 1).

X is an n×m matrix (m = number of pathways) indicating gene membership in specific pathways.
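
As a minimal sketch of how such a membership matrix and the resulting gene scores could be formed, consider the toy example below. The gene names, pathway names, and weights are hypothetical and are not taken from this analysis; in the analysis the weights come from model fitting.

```r
# Toy pathway gene sets (hypothetical names, for illustration only)
pathways <- list(
  pathway_A = c("GENE1", "GENE2", "GENE3"),
  pathway_B = c("GENE2", "GENE4"),
  pathway_C = c("GENE3", "GENE4", "GENE5")
)
genes <- sort(unique(unlist(pathways)))

# X[i, j] = 1 if gene i belongs to pathway j, else 0 (n genes x m pathways)
X <- sapply(pathways, function(members) as.numeric(genes %in% members))
rownames(X) <- genes

# Hypothetical pathway weights w (assumed values, not fitted)
w <- c(pathway_A = 1.5, pathway_B = 0, pathway_C = -0.5)

# Gene-level scores: y_hat = X w, one score per gene
y_hat <- drop(X %*% w)
print(y_hat)
```

A gene that belongs to several pathways (for example GENE3 here) receives the sum of the corresponding weights, which is how shared genes end up with a single score.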

We fitted this model using several methods.

If y is a z-score vector, it can be fitted using

linear Susie
XGBoost: regression with squared loss

If y is a binarized vector, the model can be fitted using

logistic Susie
XGBoost: logistic regression for binary classification, output probability

Benchmarks

The model fitting results in pathway weights, from which we predict gene labels \(\hat{y}\). We then categorize genes based on these new labels.

For the z-score model, we compute p-values from the new labels (predicted z-scores), then compute the FDR. We then test different FDR cutoffs for gene selection: 0.05, 0.1, and 0.2.
For the binarized model, genes with labels > 0.5, 0.6, 0.7, or 0.8 are considered benchmarks.
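
The selection step can be sketched as follows. This is an illustration under assumptions that the text does not state: two-sided p-values from a standard normal and Benjamini-Hochberg FDR. The predicted values are made up.

```r
# Hypothetical predicted z-scores for six genes (not real results)
z_pred <- c(GENE1 = 4.2, GENE2 = -3.1, GENE3 = 0.4,
            GENE4 = 2.4, GENE5 = -0.9, GENE6 = 1.7)

# Assumption: two-sided p-values from a standard normal distribution
p_pred <- 2 * pnorm(-abs(z_pred))

# Assumption: Benjamini-Hochberg adjustment
fdr_pred <- p.adjust(p_pred, method = "BH")

# Benchmark genes at each FDR cutoff used for the z-score model
fdr_cutoffs <- c(0.05, 0.1, 0.2)
benchmarks_z <- lapply(fdr_cutoffs, function(thr) names(fdr_pred)[fdr_pred < thr])
names(benchmarks_z) <- paste0("fdrcutoff_", fdr_cutoffs)

# Binarized model: predicted probabilities are thresholded directly
prob_pred <- c(GENE1 = 0.91, GENE2 = 0.72, GENE3 = 0.15, GENE4 = 0.55)
prob_cutoffs <- c(0.5, 0.6, 0.7, 0.8)
benchmarks_b <- lapply(prob_cutoffs, function(thr) names(prob_pred)[prob_pred > thr])
names(benchmarks_b) <- paste0("probcutoff_", prob_cutoffs)

print(benchmarks_z)
print(benchmarks_b)
```

Looser cutoffs (larger FDR, smaller probability) give larger benchmark sets, so the enrichment results should be read alongside the number of benchmark genes at each cutoff.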

Testing genes

Genes from the cTWAS results are divided into different groups based on their SuSiE PIPs:

high (>0.8)
moderate (0.8 > PIP > 0.5)
low (<0.5)
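
One way to express this grouping in R is with `cut()`. The PIP values below are hypothetical, and the handling of values exactly equal to 0.5 or 0.8 is an assumption, since the text leaves the boundaries open.

```r
# Hypothetical SuSiE PIPs for five genes (illustration only)
pip <- c(GENE1 = 0.95, GENE2 = 0.62, GENE3 = 0.10, GENE4 = 0.85, GENE5 = 0.45)

# Assumption: a PIP exactly equal to 0.5 or 0.8 falls into the lower group
pip_group <- cut(pip,
                 breaks = c(-Inf, 0.5, 0.8, Inf),
                 labels = c("low", "moderate", "high"),
                 right = TRUE)
names(pip_group) <- names(pip)

table(pip_group)
```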

Fisher exact test

We assess whether high-PIP genes are more enriched in our benchmarks than other groups using Fisher exact tests.

The testing matrix is:

fisher_matrix <- matrix(c("n1","n2","n3","n4"),nrow = 2,ncol = 2)
rownames(fisher_matrix) <- c("#included","#notincluded")
colnames(fisher_matrix) <- c("pip08","other group")

print(fisher_matrix)

             pip08 other group
#included    "n1"  "n3"       
#notincluded "n2"  "n4"
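
To make the layout concrete, the same matrix can be filled with counts and passed to `fisher.test()`. The counts below are hypothetical, and the one-sided alternative is an assumption (the text only says "more enriched"); a two-sided test would drop the `alternative` argument.

```r
# Hypothetical counts (not from the analysis)
n1 <- 12   # high-PIP genes included in the benchmark set
n2 <- 28   # high-PIP genes not included
n3 <- 30   # comparison-group genes included
n4 <- 330  # comparison-group genes not included

# matrix() fills column by column, matching the layout printed above
fisher_counts <- matrix(c(n1, n2, n3, n4), nrow = 2, ncol = 2,
                        dimnames = list(c("#included", "#notincluded"),
                                        c("pip08", "other group")))

# Assumption: one-sided test for odds ratio > 1 (high-PIP genes more enriched)
res <- fisher.test(fisher_counts, alternative = "greater")
res$p.value
res$estimate  # conditional maximum likelihood estimate of the odds ratio
```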

Pathways

The pathways are from GO Biological Process (gobp), GO Molecular Function (gomf), GO Cellular Component (gocc) and KEGG.

Functions

# Function to compute and display benchmark genes and overlaps
compute_gene_overlap_z <- function(threshold = 0.05) {
  traits <- c("IBD-ebi-a-GCST004131", "LDL-ukb-d-30780_irnt", "SBP-ukb-a-360", "SCZ-ieu-b-5102", "WBC-ieu-b-30", "aFib-ebi-a-GCST006414")
  folder_xgboost <- "/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/xgboost/"
  folder_xgboost_jointly <- "/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/xgboost_jointly/"
  folder_susie <- "/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/susie/"

  sum <- matrix(nrow = length(traits), ncol = 7)
  for (i in seq_along(traits)) {
    xg <- get(load(paste0(folder_xgboost, "gene_labels_xgboost_", traits[i], ".rdata")))
    xg_pass <- xg$SYMBOL[xg$pred_fdr_xgb < threshold]
    xg_pass <- xg_pass[!duplicated(xg_pass)]
    
    xg_pass_gobp <- xg$SYMBOL[xg$pred_fdr_xgb < threshold & xg$db == "gobp"]
    xg_pass_gobp <- xg_pass_gobp[!duplicated(xg_pass_gobp)]
    
    xg_joint <- get(load(paste0(folder_xgboost_jointly, "gene_labels_xgboost_", traits[i], ".rdata")))
    xg_pass_joint <- xg_joint$SYMBOL[xg_joint$pred_fdr_xgb < threshold]
    xg_pass_joint <- xg_pass_joint[!duplicated(xg_pass_joint)]
    
    susie <- get(load(paste0(folder_susie, "gene_labels_susie_", traits[i], ".rdata")))
    susie_pass <- susie$SYMBOL[susie$fdr_pred_linsusie < threshold]
    susie_pass <- susie_pass[!duplicated(susie_pass)]

    overlap_xgboost_susie <- sum(xg_pass %in% susie_pass)
    overlap_xgboost_gobp_susie <- sum(xg_pass_gobp %in% susie_pass)
    overlap_xgboost_sep_joint <- sum(xg_pass %in% xg_pass_joint)
    sum[i, ] <- c(length(susie_pass),length(xg_pass),length(xg_pass_joint),length(xg_pass_gobp),overlap_xgboost_susie,overlap_xgboost_gobp_susie,overlap_xgboost_sep_joint)
  }

  rownames(sum) <- traits
  colnames(sum) <- c("#ofbenchmarkgene_susie", "#ofbenchmarkgene_xgboost_sep","#ofbenchmarkgene_xgboost_joint","#ofbenchmarkgene_xgboost_gobp", "#ofoverlap of xgboost_all and susie_all", "#ofoverlap of xgboost_gobp and susie_all","#ofoverlap of xgboost_sep and xgboost_joint")

  DT::datatable(sum, caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;', '#of benchmark genes and the overlaps'), options = list(pageLength = 10))
}

compute_gene_overlap_b <- function(threshold = 0.5) {
  traits <- c("IBD-ebi-a-GCST004131", "LDL-ukb-d-30780_irnt", "SBP-ukb-a-360", "SCZ-ieu-b-5102", "WBC-ieu-b-30", "aFib-ebi-a-GCST006414")
  folder_xgboost <- "/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/xgboost/"
  folder_xgboost_jointly <- "/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/xgboost_jointly/"
  folder_susie <- "/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/susie/"

  sum <- matrix(nrow = length(traits), ncol = 7)
  for (i in seq_along(traits)) {
    xg <- get(load(paste0(folder_xgboost, "gene_labels_xgboost_", traits[i], ".rdata")))
    xg_pass <- xg$SYMBOL[xg$pred_y_logi_xgb > threshold]
    xg_pass <- xg_pass[!duplicated(xg_pass)]

    xg_pass_gobp <- xg$SYMBOL[xg$pred_y_logi_xgb > threshold & xg$db =="gobp"]
    xg_pass_gobp <- xg_pass_gobp[!duplicated(xg_pass_gobp)]
    
    xg_joint <- get(load(paste0(folder_xgboost_jointly, "gene_labels_xgboost_", traits[i], ".rdata")))
    xg_pass_joint <- xg_joint$SYMBOL[xg_joint$pred_y_logi_xgb > threshold]
    xg_pass_joint <- xg_pass_joint[!duplicated(xg_pass_joint)]
    
    susie <- get(load(paste0(folder_susie, "gene_labels_susie_", traits[i], ".rdata")))
    susie_pass <- susie$SYMBOL[susie$y_pred_logi > threshold]
    susie_pass <- susie_pass[!duplicated(susie_pass)]

    overlap_xgboost_susie <- sum(xg_pass %in% susie_pass)
    overlap_xgboost_gobp_susie <- sum(xg_pass_gobp %in% susie_pass)
    overlap_xgboost_sep_joint <- sum(xg_pass %in% xg_pass_joint)
    sum[i, ] <- c(length(susie_pass),length(xg_pass),length(xg_pass_joint),length(xg_pass_gobp),overlap_xgboost_susie,overlap_xgboost_gobp_susie,overlap_xgboost_sep_joint)
  }

  rownames(sum) <- traits
  colnames(sum) <- c("#ofbenchmarkgene_susie", "#ofbenchmarkgene_xgboost_sep","#ofbenchmarkgene_xgboost_joint","#ofbenchmarkgene_xgboost_gobp", "#ofoverlap of xgboost_all and susie_all", "#ofoverlap of xgboost_gobp and susie_all","#ofoverlap of xgboost_sep and xgboost_joint")

  DT::datatable(sum, caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;', '#of benchmark genes and the overlaps'), options = list(pageLength = 10))
}
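
The counting logic shared by both functions reduces to de-duplicating gene symbols and counting the shared entries. A toy sketch with hypothetical symbols is shown here; `unique()` is used as an equivalent shorthand for `x[!duplicated(x)]`.

```r
# Hypothetical gene symbols passing the threshold under each method
xg_symbols    <- c("GENE1", "GENE2", "GENE2", "GENE3")
susie_symbols <- c("GENE2", "GENE3", "GENE4", "GENE4")

# De-duplicate, as done with !duplicated() in the functions above
xg_pass    <- unique(xg_symbols)
susie_pass <- unique(susie_symbols)

# Number of benchmark genes shared by the two methods
n_overlap <- sum(xg_pass %in% susie_pass)

# The same count via set intersection, as a cross-check
identical(n_overlap, length(intersect(xg_pass, susie_pass)))
```

De-duplicating first matters because a symbol can appear once per database in the label files; without it the overlap would count the same gene several times.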

Fitting results

Susie

link to the results

xgboost

link to the results

Enrichment results (Fisher exact test)

summary table

Susie - modelling z-scores - benchmarks from 4 databases, run susie separately

FDR cutoff for selecting benchmark genes = 0.05

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/fisher_zscore_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/num_benchmark_genes_zscore_allcutoff.rdata")
# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.05)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.05)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.05) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))
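
The same merge-and-rename steps recur for every cutoff and method on this page, so they could be wrapped in a small helper. The following is only a sketch: `build_fisher_table` is a hypothetical name, and the toy inputs merely mimic the assumed shape of `fisher_zscore_p` and `num_benchmark_genes_zscore` (a list keyed by cutoff, holding a traits x 3 matrix of p-values and a named vector of counts).

```r
# Hypothetical helper: merge benchmark counts and Fisher p-values for one cutoff
build_fisher_table <- function(p_list, n_list, cutoff_name) {
  n_vec <- n_list[[cutoff_name]]
  n_df <- data.frame(id = names(n_vec), n_benchmark = unname(n_vec))

  p_df <- as.data.frame(p_list[[cutoff_name]])
  p_df$id <- rownames(p_df)

  # Merge on trait id; the id column comes first, then counts, then p-values
  merged <- merge(n_df, p_df, by = "id")
  colnames(merged) <- c("traits", "#of benchmark genes",
                        "pip>0.8/pip<0.8", "pip>0.8/pip<0.5", "pip>0.8/pip0.5~0.8")
  merged
}

# Toy inputs with made-up values (assumed structure, not real results)
toy_p <- list(fdrcutoff_0.05 = matrix(c(0.01, 0.20, 0.02, 0.30, 0.50, 0.60), nrow = 2,
                                      dimnames = list(c("traitA", "traitB"), NULL)))
toy_n <- list(fdrcutoff_0.05 = c(traitA = 120, traitB = 85))

tab <- build_fisher_table(toy_p, toy_n, "fdrcutoff_0.05")
print(tab)
```

The returned data frame can be passed to `DT::datatable()` exactly as in the chunks on this page; keeping the column labels in one place also avoids the labels drifting apart between chunks.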

FDR cutoff for selecting benchmark genes = 0.1

# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.1)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.1)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.1) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

FDR cutoff for selecting benchmark genes = 0.2

# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.2)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.2)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.2) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Susie - modelling binary y - benchmarks from 4 databases, run susie separately

Probability cutoff for selecting benchmark genes = 0.8

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/fisher_biny_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/num_benchmark_genes_biny_allcutoff.rdata")
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.8)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.8)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.7

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/fisher_biny_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/num_benchmark_genes_biny_allcutoff.rdata")
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.7)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.7)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.6

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/fisher_biny_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/num_benchmark_genes_biny_allcutoff.rdata")
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.6)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.6)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.5

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/fisher_biny_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_susie/num_benchmark_genes_biny_allcutoff.rdata")
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.5)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.5)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

xgboost - modelling z-scores - benchmarks from 4 databases, run xgboost separately

FDR cutoff for selecting benchmark genes = 0.05

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost/fisher_zscore_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost/num_benchmark_genes_zscore_allcutoff.rdata")
# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.05)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.05)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.05) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

FDR cutoff for selecting benchmark genes = 0.1

# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.1)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.1)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.1) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

FDR cutoff for selecting benchmark genes = 0.2

# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.2)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.2)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.2) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

xgboost - modelling binary y - benchmarks from all databases, run xgboost separately

Probability cutoff for selecting benchmark genes = 0.8

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost/fisher_biny_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost/num_benchmark_genes_biny_allcutoff.rdata")
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.8)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.8)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.7

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.7)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.7)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.6

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.6)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.6)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.5

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.5)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.5)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

xgboost - modelling z-scores - benchmarks from 4 databases, run xgboost jointly

FDR cutoff for selecting benchmark genes = 0.05

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost_jointly/fisher_zscore_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost_jointly/num_benchmark_genes_zscore_allcutoff.rdata")
# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.05)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.05)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.05) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

FDR cutoff for selecting benchmark genes = 0.1

# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.1)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.1)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.1) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

FDR cutoff for selecting benchmark genes = 0.2

# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.2)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.2)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_z_df,by = "id")
#colnames(merged_df_0.2) <- c("traits","#of benchmark genes","fisher_z_p_pip08+/pip08-","fisher_z_p_pip08+/pip05-","fisher_z_p_pip08+/pip05~08")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

xgboost - modelling binary y - benchmarks from all databases, run xgboost jointly

Probability cutoff for selecting benchmark genes = 0.8

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost_jointly/fisher_biny_allcutoff.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost_jointly/num_benchmark_genes_biny_allcutoff.rdata")
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.8)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.8)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.7

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.7)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.7)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.6

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.6)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.6)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.5

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.5)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.5)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

xgboost - modelling z-scores - benchmarks from GOBP

FDR cutoff for selecting benchmark genes = 0.05

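# Load Fisher exact test p-values and benchmark gene counts (GOBP benchmarks);
# these files are expected to provide fisher_zscore_p and num_benchmark_genes_zscore
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost/fisher_zscore_allcutoff_gobp.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost/num_benchmark_genes_zscore_allcutoff_gobp.rdata")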
# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.05)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.05)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df, fisher_z_df, by = "id")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))
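
Each p-value column in these tables compares two PIP groups (for example pip>0.8 against pip<0.8) for enrichment of benchmark genes. As a rough sketch of how one such p-value could be obtained from a 2x2 table with `fisher.test()` from base R: the counts below are invented for illustration and are not results from this analysis. The default two-sided alternative is also an assumption, since this section does not show how the test was run.

```r
# Toy 2x2 table, invented purely for illustration (NOT results from this analysis).
# Rows: PIP group; columns: whether the gene is a benchmark gene.
toy_counts <- matrix(c(12,  38,
                       40, 910),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(group = c("pip>0.8", "pip<0.8"),
                                     benchmark = c("yes", "no")))

# Fisher exact test on the table; two-sided by default (an assumption here)
toy_test <- fisher.test(toy_counts)

toy_test$p.value   # p-value of the kind tabulated above
toy_test$estimate  # conditional maximum-likelihood estimate of the odds ratio
```

A one-sided test (`alternative = "greater"`) may be more appropriate if only enrichment in the high-PIP group is of interest.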

FDR cutoff for selecting benchmark genes = 0.1

# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.1)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.1)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df, fisher_z_df, by = "id")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

FDR cutoff for selecting benchmark genes = 0.2

# Convert named vectors to data frames
fisher_z_df <- as.data.frame(fisher_zscore_p$fdrcutoff_0.2)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_zscore$fdrcutoff_0.2)

# Set row names as a new column
fisher_z_df$id <- row.names(fisher_z_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df, fisher_z_df, by = "id")
colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))
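
The three tables above differ only in the FDR cutoff used to pick benchmark genes. As a hedged illustration of how the cutoff changes the size of the benchmark set, the sketch below applies Benjamini-Hochberg adjustment with base R `p.adjust()` to toy p-values. Both the p-values and the choice of the BH method are assumptions for illustration. This section does not show how the FDR values were computed.

```r
# Toy p-values, invented for illustration only
toy_pvals <- c(G1 = 0.001, G2 = 0.01, G3 = 0.04, G4 = 0.3, G5 = 0.8)

# Benjamini-Hochberg adjustment (assumed method, see the note above)
toy_fdr <- p.adjust(toy_pvals, method = "BH")

# Number of genes passing each cutoff used in this section
cutoffs <- c(0.05, 0.1, 0.2)
n_selected <- sapply(cutoffs, function(cut) sum(toy_fdr < cut))
names(n_selected) <- paste0("fdrcutoff_", cutoffs)
print(n_selected)
```

A looser cutoff can only keep or enlarge the benchmark set, which is worth keeping in mind when comparing the "#of benchmark genes" column across the tables.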

xgboost - modelling binary y - benchmarks from GOBP

Probability cutoff for selecting benchmark genes = 0.8

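# Load Fisher exact test p-values and benchmark gene counts (GOBP benchmarks);
# these files are expected to provide fisher_biny_p and num_benchmark_genes_biny
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost/fisher_biny_allcutoff_gobp.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost/num_benchmark_genes_biny_allcutoff_gobp.rdata")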
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.8)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.8)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.7

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.7)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.7)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.6

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.6)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.6)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.5

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.5)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.5)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

xgboost - modelling binary y - model trained on the top 500 genes with the lowest MAGMA FDR

Probability cutoff for selecting benchmark genes = 0.8

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost_trunc/fisher_biny_allcutoff_numtrunc500.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost_trunc/num_benchmark_genes_biny_allcutoff500.rdata")
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.8)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.8)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))
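
This section restricts training to the genes with the lowest MAGMA FDR. A minimal sketch of that kind of truncation is given below. The function name `select_top_genes`, the column names `gene` and `fdr`, and the toy data are all hypothetical. The original selection code is not shown in this section.

```r
# Hypothetical helper: keep the n_top rows with the smallest FDR.
# Assumes `magma_df` has a numeric column named `fdr` (an assumed name).
select_top_genes <- function(magma_df, n_top = 500) {
  ordered <- magma_df[order(magma_df$fdr), , drop = FALSE]
  head(ordered, n_top)
}

# Toy data, invented for illustration only
toy_magma <- data.frame(gene = paste0("G", 1:6),
                        fdr  = c(0.30, 0.01, 0.20, 0.001, 0.50, 0.05))
toy_top <- select_top_genes(toy_magma, n_top = 3)
print(toy_top)
```

`order()` sorts in ascending order and breaks ties by original position, so genes with identical FDR values keep their input order.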

Probability cutoff for selecting benchmark genes = 0.7

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.7)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.7)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.6

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.6)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.6)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.5

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.5)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.5)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

xgboost - modelling binary y - model trained on the top 1000 genes with the lowest MAGMA FDR

Probability cutoff for selecting benchmark genes = 0.8

load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost_trunc/fisher_biny_allcutoff_numtrunc1000.rdata")
load("/project/xinhe/xsun/ctwas/4.multi_tissue_process/results/fisher_xgboost_trunc/num_benchmark_genes_biny_allcutoff1000.rdata")
# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.8)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.8)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.7

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.7)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.7)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.6

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.6)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.6)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Probability cutoff for selecting benchmark genes = 0.5

# Convert named vectors to data frames
fisher_b_df <- as.data.frame(fisher_biny_p$probcutoff_0.5)
num_benchmark_genes_df <- as.data.frame(num_benchmark_genes_biny$probcutoff_0.5)

# Set row names as a new column
fisher_b_df$id <- row.names(fisher_b_df)
num_benchmark_genes_df$id <- row.names(num_benchmark_genes_df)

# Merge data frames by the new column
merged_df <- merge(num_benchmark_genes_df,fisher_b_df,by = "id")

colnames(merged_df) <- c("traits","#of benchmark genes","pip>0.8/pip<0.8","pip>0.8/pip<0.5","pip>0.8/pip0.5~0.8")

DT::datatable(merged_df,
             caption = htmltools::tags$caption(style = 'caption-side: left; text-align: left; color:black; font-size:150%;',
                                                  'Fisher exact test p values for different groups'),
             options = list(pageLength = 6))

Benchmark gene comparison

Modelling z-scores

FDR cutoff for selecting benchmark genes = 0.05

compute_gene_overlap_z(0.05)

FDR cutoff for selecting benchmark genes = 0.1

compute_gene_overlap_z(0.1)

FDR cutoff for selecting benchmark genes = 0.2

compute_gene_overlap_z(0.2)

Modelling binarized y

Probability cutoff for selecting benchmark genes = 0.8

compute_gene_overlap_b(0.8)

Probability cutoff for selecting benchmark genes = 0.7

compute_gene_overlap_b(0.7)

Probability cutoff for selecting benchmark genes = 0.6

compute_gene_overlap_b(0.6)

Probability cutoff for selecting benchmark genes = 0.5

compute_gene_overlap_b(0.5)
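
`compute_gene_overlap_z()` and `compute_gene_overlap_b()` are defined elsewhere in this document and their bodies are not shown here. As an illustration of the kind of comparison involved, the sketch below computes the overlap between two benchmark gene sets. The function `gene_overlap` and the toy gene IDs are hypothetical. It is not a reconstruction of the original functions.

```r
# Hypothetical sketch: summarise the overlap between two gene sets
gene_overlap <- function(set_a, set_b) {
  set_a <- unique(set_a)
  set_b <- unique(set_b)
  shared <- intersect(set_a, set_b)
  c(n_a = length(set_a),
    n_b = length(set_b),
    n_shared = length(shared),
    jaccard = length(shared) / length(union(set_a, set_b)))
}

# Toy gene sets, invented for illustration only
toy_overlap <- gene_overlap(c("G1", "G2", "G3"),
                            c("G2", "G3", "G4", "G5"))
print(toy_overlap)
```

The Jaccard index is only one possible summary. Raw shared counts may be easier to read when the two benchmark sets differ greatly in size.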

sessionInfo()

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3      pillar_1.9.0      compiler_4.2.0    bslib_0.3.1      
 [5] later_1.3.0       jquerylib_0.1.4   git2r_0.30.1      workflowr_1.7.0  
 [9] tools_4.2.0       digest_0.6.29     jsonlite_1.8.0    evaluate_0.15    
[13] lifecycle_1.0.4   tibble_3.2.1      pkgconfig_2.0.3   rlang_1.1.2      
[17] cli_3.6.1         rstudioapi_0.13   crosstalk_1.2.0   yaml_2.3.5       
[21] xfun_0.41         fastmap_1.1.0     stringr_1.5.1     knitr_1.39       
[25] fs_1.5.2          vctrs_0.6.5       sass_0.4.1        htmlwidgets_1.5.4
[29] rprojroot_2.0.3   DT_0.22           glue_1.6.2        R6_2.5.1         
[33] fansi_1.0.3       rmarkdown_2.25    magrittr_2.0.3    whisker_0.4      
[37] promises_1.2.0.1  htmltools_0.5.2   httpuv_1.6.5      utf8_1.2.2       
[41] stringi_1.7.6

Validation for multi-tissue results

XSun

2024-01-11
