Last updated: 2024-10-08

Checks: 6 1

Knit directory: multigroup_ctwas_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20231112) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f65699e. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  analysis/matching_tissue_v2_preL.Rmd

Unstaged changes:
    Modified:   analysis/index.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


library(lattice)
library(gridExtra)
library(ggplot2)

num_top <- 10

load("/project/xinhe/xsun/multi_group_ctwas/10.single_tissue_1007/summary/para_pip08.rdata")
sum_all <- para_sum
colnames(sum_all)[ncol(sum_all)] <-"PVE(%)"
sum_all$`PVE(%)` <- as.numeric(sum_all$`PVE(%)`)*100


get_correlation <- function(tissue1, tissue2, cor_matrix) {
  # Check if tissues are in the matrix and find their positions
  pos1 <- match(tissue1, colnames(cor_matrix))
  pos2 <- match(tissue2, rownames(cor_matrix))
  
  return(cor_matrix[pos2, pos1]) 
}

r2cutoff <- 0.8
filter_tissues <- function(tissue_list, cor_matrix) {
  filtered_list <- c(tissue_list[1]) # Start with the first tissue
  
  for (i in 2:length(tissue_list)) {
    high_correlation_found <- FALSE
    for (j in 1:length(filtered_list)) {
      if (!is.na(get_correlation(filtered_list[j], tissue_list[i], cor_matrix)) &&
          get_correlation(filtered_list[j], tissue_list[i], cor_matrix) > r2cutoff) {
        high_correlation_found <- TRUE
        break # Break the inner loop as high correlation is found
      }
    }
    
    if (!high_correlation_found) {
      filtered_list <- c(filtered_list, tissue_list[i])
    }
  }
  return(filtered_list)
}

fill_upper_triangle <- function(cor_matrix) {
    for (i in 1:nrow(cor_matrix)) {
        for (j in 1:ncol(cor_matrix)) {
            # Check if the upper triangle element is NA and the lower triangle element is not NA
            if (is.na(cor_matrix[i, j]) && !is.na(cor_matrix[j, i])) {
                # Copy the value from the lower triangle to the upper triangle
                cor_matrix[i, j] <- cor_matrix[j, i]
            }
        }
    }
    cor_matrix[lower.tri(cor_matrix)] <- NA
    return(cor_matrix)
}

combine_vectors <- function(vec1, vec_cor1, vec2, vec_cor2, pad_with_na = TRUE) {
  # Check if padding with NAs is required
  if (pad_with_na) {
    # Pad the shorter primary vector and its corresponding vector with NAs
    if (length(vec1) < length(vec2)) {
      extra_length <- length(vec2) - length(vec1)
      vec1 <- c(vec1, rep(NA, extra_length))
      vec_cor1 <- c(vec_cor1, rep(NA, extra_length))
    } else {
      extra_length <- length(vec1) - length(vec2)
      vec2 <- c(vec2, rep(NA, extra_length))
      vec_cor2 <- c(vec_cor2, rep(NA, extra_length))
    }
    # Combine the vectors into a matrix/data frame
    return(cbind(vec1, vec_cor1, vec2, vec_cor2))
  } else {
    # Combine the vectors into a list
    return(list(vec1=vec1, vec_cor1=vec_cor1, vec2=vec2, vec_cor2=vec_cor2))
  }
}

Analysis settings

parameters

  1. Weight processing:

PredictDB eQTLs

  • drop_strand_ambig = TRUE,
  • scale_by_ld_variance = T
  • load_predictdb_LD = F,
  1. Parameter estimation and fine-mapping
  • group_prior_var_structure = “shared_type”,
  • filter_L = TRUE,
  • filter_nonSNP_PIP = FALSE,
  • min_nonSNP_PIP = 0.5,
  • min_abs_corr = 0.1,

(min_gene = 0 in parameter estimation, min_gene = 1 in screening region)

mem: 100g 10cores

Traits

aFib, IBD, LDL, SBP, SCZ, WBC

details

Weights

PredictDB, 32 tissues with sample size (RNASeq.and.Genotyped.samples below) >= 200

weights <- read.csv("/project/xinhe/xsun/ctwas/1.matching_tissue/results_data/gtex_samplesize.csv")
DT::datatable(weights,options = list(pageLength = 5))
num_top <- 10

Tissue-tissue correlation data

From MASH paper, PMID30478440

The MASH paper provides a matrix indicating the shared magnitude of eQTLs among tissues by computing the proportion of effects significant in either tissue that are within 2-fold magnitude of one another.

data download link (MASH matrix)

load("/project/xinhe/xsun/ctwas/1.matching_tissue/data/tissue_cor.rdata")
clrs <- colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
                               "#E0F3F8","#91BFDB","#4575B4")))(64)

We filtered the correlation using 0.8 as cutoff.

aFib-ebi-a-GCST006414

sum_cat <- sum_all[sum_all$trait =="aFib-ebi-a-GCST006414",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )
tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

IBD-ebi-a-GCST004131

sum_cat <- sum_all[sum_all$trait =="IBD-ebi-a-GCST004131",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )
tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

LDL-ukb-d-30780_irnt

sum_cat <- sum_all[sum_all$trait =="LDL-ukb-d-30780_irnt",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )
tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

SBP-ukb-a-360

sum_cat <- sum_all[sum_all$trait =="SBP-ukb-a-360",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )
tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

SCZ-ieu-b-5102

sum_cat <- sum_all[sum_all$trait =="SCZ-ieu-b-5102",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )
tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

WBC-ieu-b-30

sum_cat <- sum_all[sum_all$trait =="WBC-ieu-b-30",]

DT::datatable(sum_cat,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','summary for ctwas parameters and high pip genes (all tissues analyzed)'),options = list(pageLength = 5) )

Filtering based on the number of high PIP genes

num_top <- 15

sum_cat <- sum_cat[order(as.numeric(sum_cat$`#pip>0.8&incs`),decreasing = T),]

tissue_select <- sum_cat$tissue[num_top:1]
tissue_select <- tissue_select[tissue_select%in%rownames(lat)]

##heatmap
cor <- lat[tissue_select,tissue_select]
cor <- fill_upper_triangle(cor)
print(levelplot(cor,col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,main = "heatmap showing the tissue-tissue correlation"))

##cor matrix
cor <- cor[rev(tissue_select),rev(tissue_select)]
DT::datatable(cor,caption = htmltools::tags$caption( style = 'caption-side: topleft; text-align = left; color:black;  font-size:150% ;','tissue-tissue correlation matrix '),options = list(pageLength = 10) )
tissue_select_f1 <- sum_cat$tissue[1:num_top]
filtered_tissues <- filter_tissues(tissue_select_f1, cor)

tissue_select_f1_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% tissue_select_f1 ]
filtered_tissues_cor <- sum_cat$`#pip>0.8&incs`[sum_cat$tissue %in% filtered_tissues ]
comb <- combine_vectors(tissue_select_f1, tissue_select_f1_cor, filtered_tissues, filtered_tissues_cor, pad_with_na = TRUE)
colnames(comb) <- c("without_filtering","#highpip_gene","filtered","#highpip_gene")


DT::datatable(comb,caption = htmltools::tags$caption( style = 'caption-side: left; text-align: left; color:black;  font-size:150% ;','Comparing the top tissue lists (with/without filtering by tissue correlation) '),options = list(pageLength = 10) )

Selected Tissues

Trait Number of Tissues Tissue # of high pip genes (in cs) PVE (if same in number of high pip genes)
aFib 5 Heart_Atrial_Appendage 24
Artery_Tibial 16
Muscle_Skeletal 12
Stomach 9
Thyroid 8
IBD 6 Whole_Blood 11
Adipose_Subcutaneous 10
Cells_Cultured_fibroblasts 9
Spleen 8 0.0304
Testis 8 0.0234
Pituitary 8 0.0209
LDL 6 Liver 31
Spleen 22
Esophagus_Mucosa 20
Esophagus_Gastroesophageal_Junction 19
Skin_Not_Sun_Exposed_Suprapubic 17 0.0063
Adipose_Subcutaneous 17 0.0054
SBP 5 Artery_Tibial 29
Heart_Atrial_Appendage 17
Adipose_Subcutaneous 16
Brain_Cortex 16
Skin_Sun_Exposed_Lower_leg 16
SCZ 6 Brain_Hippocampus 14
Adrenal_Gland 13
Brain_Spinal_cord_cervical_c-1 12
Spleen 10 0.0154
Heart_Left_Ventricle 10 0.0145
Brain_Substantia_nigra 10 0.0132
WBC 6 Whole_Blood 81
Adipose_Subcutaneous 52
Esophagus_Muscularis 45
Cells_Cultured_fibroblasts 44
Thyroid 40 0.0074
Colon_Transverse 40 0.0067

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.5.1   gridExtra_2.3   lattice_0.20-45

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.12       highr_0.9         pillar_1.9.0      compiler_4.2.0   
 [5] bslib_0.3.1       later_1.3.0       jquerylib_0.1.4   git2r_0.30.1     
 [9] workflowr_1.7.0   tools_4.2.0       digest_0.6.29     jsonlite_1.8.0   
[13] evaluate_0.15     lifecycle_1.0.4   tibble_3.2.1      gtable_0.3.0     
[17] pkgconfig_2.0.3   rlang_1.1.2       cli_3.6.1         rstudioapi_0.13  
[21] crosstalk_1.2.0   yaml_2.3.5        xfun_0.41         fastmap_1.1.0    
[25] withr_2.5.0       dplyr_1.1.4       stringr_1.5.1     knitr_1.39       
[29] htmlwidgets_1.5.4 generics_0.1.2    fs_1.5.2          vctrs_0.6.5      
[33] sass_0.4.1        DT_0.22           tidyselect_1.2.0  rprojroot_2.0.3  
[37] grid_4.2.0        glue_1.6.2        R6_2.5.1          fansi_1.0.3      
[41] rmarkdown_2.25    magrittr_2.0.3    scales_1.3.0      promises_1.2.0.1 
[45] htmltools_0.5.2   colorspace_2.0-3  httpuv_1.6.5      utf8_1.2.2       
[49] stringi_1.7.6     munsell_0.5.0