Last updated: 2024-10-08

Checks: 6 1

Knit directory: multigroup_ctwas_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20231112)

The command set.seed(20231112) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: f65699e

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f65699e. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  analysis/matching_tissue_v2_preL.Rmd

Unstaged changes:
    Modified:   analysis/index.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.

library(lattice)
library(gridExtra)
library(ggplot2)

num_top <- 10

load("/project/xinhe/xsun/multi_group_ctwas/10.single_tissue_1007/summary/para_pip08.rdata")
sum_all <- para_sum
colnames(sum_all)[ncol(sum_all)] <-"PVE(%)"
sum_all$`PVE(%)` <- as.numeric(sum_all$`PVE(%)`)*100


get_correlation <- function(tissue1, tissue2, cor_matrix) {
  # Check if tissues are in the matrix and find their positions
  pos1 <- match(tissue1, colnames(cor_matrix))
  pos2 <- match(tissue2, rownames(cor_matrix))
  
  return(cor_matrix[pos2, pos1]) 
}

r2cutoff <- 0.8
filter_tissues <- function(tissue_list, cor_matrix) {
  filtered_list <- c(tissue_list[1]) # Start with the first tissue
  
  for (i in 2:length(tissue_list)) {
    high_correlation_found <- FALSE
    for (j in 1:length(filtered_list)) {
      if (!is.na(get_correlation(filtered_list[j], tissue_list[i], cor_matrix)) &&
          get_correlation(filtered_list[j], tissue_list[i], cor_matrix) > r2cutoff) {
        high_correlation_found <- TRUE
        break # Break the inner loop as high correlation is found
      }
    }
    
    if (!high_correlation_found) {
      filtered_list <- c(filtered_list, tissue_list[i])
    }
  }
  return(filtered_list)
}

fill_upper_triangle <- function(cor_matrix) {
    for (i in 1:nrow(cor_matrix)) {
        for (j in 1:ncol(cor_matrix)) {
            # Check if the upper triangle element is NA and the lower triangle element is not NA
            if (is.na(cor_matrix[i, j]) && !is.na(cor_matrix[j, i])) {
                # Copy the value from the lower triangle to the upper triangle
                cor_matrix[i, j] <- cor_matrix[j, i]
            }
        }
    }
    cor_matrix[lower.tri(cor_matrix)] <- NA
    return(cor_matrix)
}

combine_vectors <- function(vec1, vec_cor1, vec2, vec_cor2, pad_with_na = TRUE) {
  # Check if padding with NAs is required
  if (pad_with_na) {
    # Pad the shorter primary vector and its corresponding vector with NAs
    if (length(vec1) < length(vec2)) {
      extra_length <- length(vec2) - length(vec1)
      vec1 <- c(vec1, rep(NA, extra_length))
      vec_cor1 <- c(vec_cor1, rep(NA, extra_length))
    } else {
      extra_length <- length(vec1) - length(vec2)
      vec2 <- c(vec2, rep(NA, extra_length))
      vec_cor2 <- c(vec_cor2, rep(NA, extra_length))
    }
    # Combine the vectors into a matrix/data frame
    return(cbind(vec1, vec_cor1, vec2, vec_cor2))
  } else {
    # Combine the vectors into a list
    return(list(vec1=vec1, vec_cor1=vec_cor1, vec2=vec2, vec_cor2=vec_cor2))
  }
}

Analysis settings

parameters

Weight processing:

PredictDB eQTLs

drop_strand_ambig = TRUE,
scale_by_ld_variance = T
load_predictdb_LD = F,

Parameter estimation and fine-mapping

group_prior_var_structure = “shared_type”,
filter_L = TRUE,
filter_nonSNP_PIP = FALSE,
min_nonSNP_PIP = 0.5,
min_abs_corr = 0.1,

(min_gene = 0 in parameter estimation, min_gene = 1 in screening region)

mem: 100g 10cores

Traits

aFib, IBD, LDL, SBP, SCZ, WBC

details

Weights

PredictDB, 32 tissues with sample size (RNASeq.and.Genotyped.samples below) >= 200

weights <- read.csv("/project/xinhe/xsun/ctwas/1.matching_tissue/results_data/gtex_samplesize.csv")
DT::datatable(weights,options = list(pageLength = 5))

num_top <- 10

Tissue-tissue correlation data

From MASH paper, PMID30478440

The MASH paper provides a matrix indicating the shared magnitude of eQTLs among tissues by computing the proportion of effects significant in either tissue that are within 2-fold magnitude of one another.

data download link (MASH matrix)

load("/project/xinhe/xsun/ctwas/1.matching_tissue/data/tissue_cor.rdata")
clrs <- colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
                               "#E0F3F8","#91BFDB","#4575B4")))(64)

We filtered the correlation using 0.8 as cutoff.

aFib-ebi-a-GCST006414

sum_cat <- sum_all[sum_all$trait =="aFib-ebi-a-GCST006414",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )

tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

IBD-ebi-a-GCST004131

sum_cat <- sum_all[sum_all$trait =="IBD-ebi-a-GCST004131",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )

tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

LDL-ukb-d-30780_irnt

sum_cat <- sum_all[sum_all$trait =="LDL-ukb-d-30780_irnt",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )

tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

SBP-ukb-a-360

sum_cat <- sum_all[sum_all$trait =="SBP-ukb-a-360",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )

tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

SCZ-ieu-b-5102

sum_cat <- sum_all[sum_all$trait =="SCZ-ieu-b-5102",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )

tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

WBC-ieu-b-30

sum_cat <- sum_all[sum_all$trait =="WBC-ieu-b-30",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

num_top <- 15

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )

tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

Selected Tissues

Trait	Number of Tissues	Tissue	# of high pip genes (in cs)	PVE (if same in number of high pip genes)
aFib	5	Heart_Atrial_Appendage	24
		Artery_Tibial	16
		Muscle_Skeletal	12
		Stomach	9
		Thyroid	8
IBD	6	Whole_Blood	11
		Adipose_Subcutaneous	10
		Cells_Cultured_fibroblasts	9
		Spleen	8	0.0304
		Testis	8	0.0234
		Pituitary	8	0.0209
LDL	6	Liver	31
		Spleen	22
		Esophagus_Mucosa	20
		Esophagus_Gastroesophageal_Junction	19
		Skin_Not_Sun_Exposed_Suprapubic	17	0.0063
		Adipose_Subcutaneous	17	0.0054
SBP	5	Artery_Tibial	29
		Heart_Atrial_Appendage	17
		Adipose_Subcutaneous	16
		Brain_Cortex	16
		Skin_Sun_Exposed_Lower_leg	16
SCZ	6	Brain_Hippocampus	14
		Adrenal_Gland	13
		Brain_Spinal_cord_cervical_c-1	12
		Spleen	10	0.0154
		Heart_Left_Ventricle	10	0.0145
		Brain_Substantia_nigra	10	0.0132
WBC	6	Whole_Blood	81
		Adipose_Subcutaneous	52
		Esophagus_Muscularis	45
		Cells_Cultured_fibroblasts	44
		Thyroid	40	0.0074
		Colon_Transverse	40	0.0067

sessionInfo()

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.5.1   gridExtra_2.3   lattice_0.20-45

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.12       highr_0.9         pillar_1.9.0      compiler_4.2.0   
 [5] bslib_0.3.1       later_1.3.0       jquerylib_0.1.4   git2r_0.30.1     
 [9] workflowr_1.7.0   tools_4.2.0       digest_0.6.29     jsonlite_1.8.0   
[13] evaluate_0.15     lifecycle_1.0.4   tibble_3.2.1      gtable_0.3.0     
[17] pkgconfig_2.0.3   rlang_1.1.2       cli_3.6.1         rstudioapi_0.13  
[21] crosstalk_1.2.0   yaml_2.3.5        xfun_0.41         fastmap_1.1.0    
[25] withr_2.5.0       dplyr_1.1.4       stringr_1.5.1     knitr_1.39       
[29] htmlwidgets_1.5.4 generics_0.1.2    fs_1.5.2          vctrs_0.6.5      
[33] sass_0.4.1        DT_0.22           tidyselect_1.2.0  rprojroot_2.0.3  
[37] grid_4.2.0        glue_1.6.2        R6_2.5.1          fansi_1.0.3      
[41] rmarkdown_2.25    magrittr_2.0.3    scales_1.3.0      promises_1.2.0.1 
[45] htmltools_0.5.2   colorspace_2.0-3  httpuv_1.6.5      utf8_1.2.2       
[49] stringi_1.7.6     munsell_0.5.0

Deciding the matching tissue – pre-estimate L

XSun

2024-10-08

Analysis settings

parameters

Traits

Weights

Tissue-tissue correlation data

aFib-ebi-a-GCST006414

Filtering based on the number of high PIP genes

IBD-ebi-a-GCST004131

Filtering based on the number of high PIP genes

LDL-ukb-d-30780_irnt

Filtering based on the number of high PIP genes

SBP-ukb-a-360

Filtering based on the number of high PIP genes

SCZ-ieu-b-5102

Filtering based on the number of high PIP genes

WBC-ieu-b-30

Filtering based on the number of high PIP genes

Selected Tissues