Publications

Structural variants involved in high-altitude adaptation detected using single-molecule long-read sequencing

Published in Nature Communications, 2023

Structural variants (SVs), accounting for a larger fraction of the genome than SNPs/InDels, are an important pool of genetic variation, enabling environmental adaptations. Here, we perform long-read sequencing of 320 Tibetan and Han samples and show that SVs are highly involved in high-altitude adaptation. We expand the landscape of global SVs, apply robust models of selection and population differentiation combining SVs, SNPs and InDels, and use epigenomic analyses to predict enhancers, target genes and biological functions. We reveal diverse Tibetan-specific SVs affecting the regulatory circuitry of biological functions, including the hypoxia response, energy metabolism and pulmonary function. Using enhancer reporter, cellular knock-out and DNA pull-down assays, we find that a Tibetan-specific deletion disrupts a super-enhancer and downregulates EPAS1. Our study expands the global SV landscape, reveals the role of gene-regulatory circuitry rewiring in human adaptation, and illustrates the diverse functional roles of SVs in human biology.

Recommended citation: Shi, J., Jia, Z., Sun, J. et al. Structural variants involved in high-altitude adaptation detected using single-molecule long-read sequencing. Nat Commun 14, 8282 (2023). https://doi.org/10.1038/s41467-023-44034-z http://zhilongjia.github.io/files/2023-ONT_tibetan.pdf

Low-dose of caffeine alleviates high altitude pulmonary edema via regulating mitochondrial quality control process in AT1 cells

Published in Frontiers in Pharmacology, 2023

Background: High-altitude pulmonary edema (HAPE) is a life-threatening disease without effective drugs. Caffeine is a small-molecule compound with antioxidant biological activity used to treat respiratory distress syndrome. However, it is unclear whether caffeine plays a role in alleviating HAPE. Methods: We combined a series of biological experiments and label-free quantitative proteomics analysis to detect the effect of caffeine on treating HAPE and explore its mechanism in vivo and in vitro. Results: The dry and wet weight ratio and HE staining of pulmonary tissues showed that the HAPE model was constructed successfully, and that caffeine relieved pulmonary edema. The proteomic results of mouse lungs indicated that regulating mitochondria might be the mechanism by which caffeine reduced HAPE. We found that caffeine blocked the reduction of ATP production and oxygen consumption rate, decreased ROS accumulation, and stabilized mitochondrial membrane potential to protect AT1 cells from oxidative stress damage under hypoxia. Caffeine promoted PINK1/parkin-dependent mitophagy and enhanced mitochondrial fission to maintain the mitochondrial quality control process. Conclusion: Low-dose caffeine alleviated HAPE by promoting PINK1/parkin-dependent mitophagy and mitochondrial fission to control mitochondrial quality. Therefore, caffeine could be a potential treatment for HAPE.

Recommended citation: Tian, L., Jia, Z., Yan, Y., Jia, Q., Shi, W., Cui, S., Chen, H., Han, Y., Zhao, X., & He, K. (2023). Low-dose of caffeine alleviates high altitude pulmonary edema via regulating mitochondrial quality control process in AT1 cells. Frontiers in pharmacology, 14, 1155414. https://doi.org/10.3389/fphar.2023.1155414 http://zhilongjia.github.io/files/2023_caffeine.pdf

Proteomic and clinical biomarkers for acute mountain sickness in a longitudinal cohort

Published in Communications Biology, 2022

Ascent to high altitude by non-high-altitude natives is a well-suited model for studying acclimatization to extreme environments. Acute mountain sickness (AMS) is frequently experienced by visitors. The diagnosis of AMS mainly depends on a self-questionnaire, revealing the need for reliable biomarkers for AMS. Here, we profiled 22 AMS symptom phenotypes, 65 clinical indexes, and plasma proteomic profiles of AMS via a combination of proximity extension assay and multiple reaction monitoring of a longitudinal cohort of 53 individuals. We quantified 1069 proteins and validated 102 proteins. Via differential analysis, machine learning, and functional association analyses, we found and validated that RET played an important role in the pathogenesis of AMS. With high accuracies (AUCs > 0.9) of XGBoost-based models, we prioritized ADAM15, PHGDH, and TRAF2 as protective, predictive, and diagnostic biomarkers, respectively. Our findings shed light on precision medicine for AMS and the understanding of acclimatization to high-altitude environments.

Recommended citation: Yang, J., Jia, Z., Song, X. et al. Proteomic and clinical biomarkers for acute mountain sickness in a longitudinal cohort. Commun Biol 5, 548 (2022). https://doi.org/10.1038/s42003-022-03514-6 http://zhilongjia.github.io/files/2022_AMS_proteomic_biomarker.pdf

The active lung microbiota landscape of COVID-19 patients through the metatranscriptome data analysis

Published in BioImpacts, 2021

Introduction: With the outbreak of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the interaction between the host and SARS-CoV-2 was widely studied. However, it is unclear whether and how SARS-CoV-2 infection affects lung microflora, which contribute to COVID-19 complications. Methods: Here, we analyzed the metatranscriptomic data of bronchoalveolar lavage fluid (BALF) of 19 COVID-19 patients and 23 healthy controls from 6 independent projects and detailed the active microbiota landscape in both healthy individuals and COVID-19 patients. Results: The infection of SARS-CoV-2 could deeply change the lung microbiota, as evidenced by the α-diversity, β-diversity, and species composition analysis based on bacterial microbiota and virome. Pathogens (e.g., Klebsiella oxytoca, which also causes pneumonia), immunomodulatory probiotics (e.g., lactic acid bacteria and Faecalibacterium prausnitzii, a butyrate producer), and Tobacco mosaic virus (TMV) were enriched in the COVID-19 group, suggesting a severe microbiota dysbiosis. The significant correlation between Rothia mucilaginosa, TMV, and SARS-CoV-2 revealed drastic inflammatory battles between the host, SARS-CoV-2, and other microbes in the lungs. Notably, TMV only existed in the COVID-19 group, while human respirovirus 3 (HRV 3) only existed in the healthy group. Our study provides insights into the active microbiota in the lungs of COVID-19 patients and would contribute to the understanding of the infection mechanism of SARS-CoV-2 and the treatment of the disease and complications. Conclusion: SARS-CoV-2 infection deeply altered the lung microbiota of COVID-19 patients. The enrichment of several other pathogens, immunomodulatory probiotics (lactic acid or butyrate producers), and TMV in the COVID-19 group suggests a complex and active lung microbiota disorder.

Recommended citation: Han Y, Jia Z, Shi J, Wang W, He K. 2021. The active lung microbiota landscape of COVID-19 patients through the metatranscriptome data analysis. BioImpacts 10.34172/bi.2021.23378 http://zhilongjia.github.io/files/2021_lung_microbiota_COVID19.pdf

Transcriptional landscape in rat intestines under hypobaric hypoxia

Published in PeerJ, 2021

Oxygen metabolism is closely related to intestinal homeostasis, and many intestinal diseases result from the destruction of oxygen gradients. The hypobaric hypoxic environment of the plateau can cause intestinal dysfunction in humans, such as inflammation. The compensatory response of small intestine cells to this harsh environment changes their gene expression. How small intestine cells respond to the hypobaric hypoxic environment is still unclear. We studied the rat small intestine to explore the transcriptional changes under acute and chronic hypobaric hypoxic conditions. We randomly divided rats into three groups: a normal control group (S), an acute hypobaric hypoxia group exposed to hypobaric hypoxic conditions for 2 weeks (W2S), and a chronic hypobaric hypoxia group exposed to hypobaric hypoxic conditions for 4 weeks (W4S). RNA sequencing was performed on the small intestine tissues of the three groups of rats. Principal component analysis showed that the W4S and W2S groups were quite different from the control group. We identified a total of 636 differentially expressed genes, such as ATP binding cassette, Ace2 and Fabp. KEGG pathway analysis identified several metabolic and digestive pathways, such as PPAR signaling pathway, glycerolipid metabolism, fat metabolism, mineral absorption and vitamin metabolism. Cogena analysis found that up-regulation of digestive and metabolic functions began from the second week of high altitude exposure. Our study highlights the critical role of metabolic and digestive pathways of the intestine in response to the hypobaric hypoxic environment, provides new perspectives on the molecular effects of the hypobaric hypoxic environment on the intestine, and raises further questions about the relationship between lipid metabolism disorders and inflammation.

Recommended citation: Tian L, Jia Z, Xu Z, Shi J, Zhao X, He K. 2021. Transcriptional landscape in rat intestines under hypobaric hypoxia. PeerJ 9:e11823 http://zhilongjia.github.io/files/2020_drpCOVID19.pdf

Transcriptome-based drug repositioning for coronavirus disease 2019 (COVID-19)

Published in Pathogens and Disease, 2020

The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) around the world has led to a pandemic with high morbidity and mortality. However, there are no effective drugs to prevent and treat the disease. Transcriptome-based drug repositioning, identifying new indications for old drugs, is a powerful tool for drug development. Using bronchoalveolar lavage fluid transcriptome data of COVID-19 patients, we found that the endocytosis and lysosome pathways are highly involved in the disease and that the regulation of genes involved in neutrophil degranulation was disrupted, suggesting an intense battle between SARS-CoV-2 and humans. Furthermore, we implemented a coexpression drug repositioning analysis, cogena, and identified two antiviral drugs (saquinavir and ribavirin) and several other candidate drugs (such as dinoprost, dipivefrine, dexamethasone and (-)-isoprenaline). Notably, the two antiviral drugs have also previously been identified using molecular docking methods, and ribavirin is a recommended drug in the diagnosis and treatment protocol for COVID pneumonia (trial version 5-7) published by the National Health Commission of the P.R. of China. Our study demonstrates the value of the cogena-based drug repositioning method for emerging infectious diseases, improves our understanding of SARS-CoV-2-induced disease, and provides potential drugs for the prevention and treatment of COVID-19 pneumonia.

Recommended citation: Jia, Z., Song, X., Shi, J., Wang, W., & He, K. (2020). Transcriptome-based drug repositioning for coronavirus disease 2019 (COVID-19). Pathogens and Disease, 78(4), ftaa036. http://zhilongjia.github.io/files/2020_drpCOVID19.pdf

Heightened innate immune responses in the respiratory tract of COVID-19 patients

Published in Cell Host & Microbe, 2020

The outbreaks of 2019 novel coronavirus disease (COVID-19) caused by SARS-CoV-2 infection have posed a severe threat to global public health. It is unclear how the human immune system responds to this infection. Here, we used metatranscriptomic sequencing to profile immune signatures in the bronchoalveolar lavage fluid of eight COVID-19 cases. The expression of proinflammatory genes, especially chemokines, was markedly elevated in COVID-19 cases compared to community-acquired pneumonia patients and healthy controls, suggesting that SARS-CoV-2 infection causes hypercytokinemia. Compared to SARS-CoV, which is thought to induce inadequate interferon (IFN) responses, SARS-CoV-2 robustly triggered expression of numerous IFN-stimulated genes (ISGs). These ISGs exhibit immunopathogenic potential, with overrepresentation of genes involved in inflammation. The transcriptome data was also used to estimate immune cell populations, revealing increases in activated dendritic cells and neutrophils. Collectively, these host responses to SARS-CoV-2 infection could further our understanding of disease pathogenesis and point toward antiviral strategies.

Recommended citation: Zhou, Zhuo, Lili Ren, Li Zhang, Jiaxin Zhong, Yan Xiao, Zhilong Jia, Li Guo et al. "Heightened innate immune responses in the respiratory tract of COVID-19 patients." Cell Host & Microbe (2020). http://zhilongjia.github.io/files/2020_COVID19.pdf

Impacts of the plateau environment on the gut microbiota and blood clinical indexes in Han and Tibetan individuals

Published in mSystems, 2020

The intestinal microbiota is significantly affected by the external environment, but our understanding of the effects of extreme environments such as plateaus is far from adequate. In this study, we systematically analyzed the variation in the intestinal microbiota and 76 blood clinical indexes among 393 healthy adults with different plateau living durations (Han individuals with no plateau living, with plateau living for 4 to 6 days, with plateau living for >3 months, and who returned to the plain for 3 months, as well as plateau-living Tibetans). The results showed that the high-altitude environment rapidly (4 days) and continually (more than 3 months) shaped both the intestinal microbiota and clinical indexes of the Han population. With prolongation of plateau living, the general characteristics of the intestinal microbiota and clinical indexes of the Han population were increasingly similar to those of the Tibetan population. The intestinal microbiota of the Han population that returned to the plain area for 3 months still resembled that of the plateau-living Han population rather than that of the Han population on the plain. Moreover, clinical indexes such as blood glucose were significantly lower in the plateau groups than in the nonplateau groups, while the opposite result was obtained for testosterone. Interestingly, there were Tibetan-specific correlations between glucose levels and Succinivibrio and Sarcina abundance in the intestine. The results of this study suggest that a hypoxic environment could rapidly and lastingly affect both the human intestinal microbiota and blood clinical indexes, providing new insights for the study of plateau adaptability.

Recommended citation: Jia Z, Zhao X, Liu X, Zhao L, Jia Q, Shi J, Xu X, Hao L, Xu Z, Zhong Q, Yu K, Cui S, Chen H, Guo J, Li X, Han Y, Song X, Zhao C, Bo X, Tian Y, Wang W, Xie G, Feng Q, He K. 2020. Impacts of the plateau environment on the gut microbiota and blood clinical indexes in Han and Tibetan individuals. mSystems 5:e00660-19. https://doi.org/10.1128/mSystems.00660-19. http://zhilongjia.github.io/files/2020_highaltitudemicrobiota.pdf

Fusobacterium nucleatum Facilitates Apoptosis, ROS Generation, and Inflammatory Cytokine Production by Activating AKT/MAPK and NF-kappa B Signaling Pathways in Human Gingival Fibroblasts

Published in Oxidative Medicine and Cellular Longevity, 2019

Fusobacterium nucleatum (F. nucleatum) plays key roles in the initiation and progression of periodontitis. However, the pathogenic effect of F. nucleatum on human oral tissues and cells has not been fully evaluated. In this study, we aimed to analyze the pathogenic effects of F. nucleatum on human gingival fibroblasts (GFs) and clarify the potential mechanisms. RNA-sequencing analysis confirmed that F. nucleatum significantly altered the gene expression of GF as the stimulation time increased. Cell counting and EdU-labeling assays indicated that F. nucleatum inhibited GF proliferation and promoted cell apoptosis in a time- and dose-dependent manner. In addition, cell apoptosis, intracellular reactive oxygen species (ROS) generation, and proinflammatory cytokine production were dramatically elevated after F. nucleatum stimulation. Furthermore, we found that the AKT/MAPK and NF-κB signaling pathways were significantly activated by F. nucleatum infection and that a large number of genes related to cellular proliferation, apoptosis, ROS, and inflammatory cytokine production downstream of AKT/MAPK and NF-κB signaling pathways were significantly altered in F. nucleatum-stimulated GFs. These findings suggest that F. nucleatum inhibits GF proliferation and promotes cell apoptosis, ROS generation, and inflammatory cytokine production partly by activating the AKT/MAPK and NF-κB signaling pathways. Our study opens a new window for understanding the pathogenic effects of periodontal pathogens on the host oral system.

Recommended citation: Kang, W., Jia, Z., Tang, D., Zhang, Z., Gao, H., He, K., & Feng, Q. (2019). Fusobacterium nucleatum Facilitates Apoptosis, ROS Generation, and Inflammatory Cytokine Production by Activating AKT/MAPK and NF-kappa B Signaling Pathways in Human Gingival Fibroblasts. Oxidative Medicine and Cellular Longevity, 2019, 1681972. http://zhilongjia.github.io/files/2019_drpFN.pdf

Time-Course Transcriptome Analysis for Drug Repositioning in Fusobacterium nucleatum-Infected Human Gingival Fibroblasts

Published in Frontiers in Cell and Developmental Biology, 2019

Fusobacterium nucleatum (F. nucleatum) is a crucial periodontal pathogen and human gingival fibroblasts (GFs) are the first line of defense against oral pathogens. However, the research on potential molecular mechanisms of host defense and effective treatment of F. nucleatum infection in GFs remains scarce. In this study, we undertook a time-series experiment and performed an RNA-seq analysis to explore gene expression profiles during the process of F. nucleatum infection in GFs. Differentially expressed genes (DEGs) could be divided into three coexpression clusters. Functional analysis revealed that the immune-related signaling pathways were more overrepresented at the early stage, while metabolic pathways were mainly enriched at the late stage. We computationally identified several U.S. Food and Drug Administration (FDA)-approved drugs that could protect the F. nucleatum infected GFs via a coexpression-based drug repositioning approach. Biologically, we confirmed that six drugs (etravirine, zalcitabine, wortmannin, calcium D-pantothenate, ellipticine, and tanespimycin) could significantly decrease F. nucleatum-induced reactive oxygen species (ROS) generation and block the Protein Kinase B (PKB/AKT)/mitogen-activated protein kinase signaling pathways. Our study provides more detailed molecular mechanisms of the process by which F. nucleatum infects GFs and illustrates the value of the cogena-based drug repositioning method and the potential therapeutic application of these tested drugs in the treatment of F. nucleatum infection.

Recommended citation: Kang, W., Jia, Z., Tang, D., Zhao, X., Shi, J., Jia, Q., …& Feng, Q. (2019). Time-Course Transcriptome Analysis for Drug Repositioning in Fusobacterium nucleatum-Infected Human Gingival Fibroblasts. Frontiers in Cell and Developmental Biology, 7, UNSP 204. http://zhilongjia.github.io/files/2019_drpFN.pdf

Transcriptional profiling in the livers of rats after hypobaric hypoxia exposure

Published in PeerJ, 2019

Ascent to high altitude feels uncomfortable in part because of a decreased partial pressure of oxygen due to the decrease in barometric pressure. The molecular mechanisms causing injury in liver tissue after exposure to a hypoxic environment are largely unknown. The liver must physiologically and metabolically change to improve tolerance to altitude-induced hypoxia. Since the liver is the largest metabolic organ and regulates many physiological and metabolic processes, it plays an important part in high altitude adaptation. The cellular response to hypoxia results in changes in the gene expression profile. The present study explores these changes in a rat model. To comprehensively investigate the gene expression and physiological changes under hypobaric hypoxia, we used genome-wide transcription profiling. Little is known about the genome-wide transcriptional response to acute and chronic hypobaric hypoxia in the livers of rats. In this study, we carried out RNA-Sequencing (RNA-Seq) of liver tissue from rats in three groups, normal control rats (L), rats exposed to acute hypobaric hypoxia for 2 weeks (W2L) and rats chronically exposed to hypobaric hypoxia for 4 weeks (W4L), to explore the transcriptional profile of acute and chronic mountain sickness in a mammal under a controlled time-course. We identified 497 differentially expressed genes among the three groups. A principal component analysis revealed large differences between the acute and chronic hypobaric hypoxia groups and the control group. Several immune-related and metabolic pathways, such as cytokine-cytokine receptor interaction and galactose metabolism, were highly enriched in the KEGG pathway analysis. Similar results were found in the Gene Ontology analysis. Cogena analysis showed that the immune-related pathways were mainly upregulated and enriched in the acute hypobaric hypoxia group.

Recommended citation: Xu, Z., Jia, Z., Shi, J., Zhang, Z., Gao, X., Jia, Q., …& He, K. (2019). Transcriptional profiling in the livers of rats after hypobaric hypoxia exposure. PeerJ, 7, e6499. http://zhilongjia.github.io/files/2019_AMS.pdf

The biological impact of blood pressure-associated genetic variants in the natriuretic peptide receptor C gene on human vascular smooth muscle

Published in Human Molecular Genetics, 2017

Elevated blood pressure (BP) is a major global risk factor for cardiovascular disease. Genome-wide association studies have identified several genetic variants at the NPR3 locus associated with BP, but the functional impact of these variants remains to be determined. Here we confirmed, by a genome-wide association study within UK Biobank, the existence of two independent BP-related signals within the NPR3 locus. Using human primary vascular smooth muscle cells (VSMCs) and endothelial cells (ECs) from different individuals, we found that the BP-elevating alleles within one linkage disequilibrium block identified by the sentinel variant rs1173771 were associated with lower endogenous NPR3 mRNA and protein levels in VSMCs, together with reduced levels in open chromatin and nuclear protein binding. The BP-elevating alleles also increased VSMC proliferation, angiotensin II-induced calcium flux and cell contraction. However, an analogous genotype-dependent association was not observed in vascular ECs. Our study identifies novel, putative mechanisms for BP-associated variants at the NPR3 locus to elevate BP, further strengthening the case for targeting NPR-C as a therapeutic approach for hypertension and cardiovascular disease prevention.

Recommended citation: Ren, M., Ng, F.L., Warren, H.R., Witkowska, K., Baron, M., Jia, Z., …& Caulfield, M.J. (2018). The biological impact of blood pressure-associated genetic variants in the natriuretic peptide receptor C gene on human vascular smooth muscle. Hum Mol Genet, 27(1), 199-210. http://zhilongjia.github.io/files/2017_BP.pdf

High Expression of CPT1A Predicts Adverse Outcomes: A Potential Therapeutic Target for Acute Myeloid Leukemia

Published in EBioMedicine, 2016

Identification of prognostic biomarkers is essential for choosing therapies for AML. This study presents direct evidence that high expression of CPT1A is significantly associated with poor outcomes and abnormal genomic and epigenomic patterns in AML patients. CPT1A is an important catalyst in the fatty-acid oxidation pathway, which may provide an alternative carbon source for leukemia proliferation. The findings of this study may indicate the significance of fat metabolism in leukemogenesis.

Recommended citation: Shi, J., Fu, H., Jia, Z., He, K., Fu, L., & Wang, W. (2016). High Expression of CPT1A Predicts Adverse Outcomes: A Potential Therapeutic Target for Acute Myeloid Leukemia. EBioMedicine, 14, 55-64. http://zhilongjia.github.io/files/2016_AML.pdf

Cogena, a novel tool for co-expressed gene-set enrichment analysis, applied to drug repositioning and drug mode of action discovery

Published in BMC Genomics, 2016

Drug repositioning, finding new indications for existing drugs, has gained much recent attention as a potentially efficient and economical strategy for accelerating new therapies into the clinic. Although improvement in the sensitivity of computational drug repositioning methods has identified numerous credible repositioning opportunities, few have been progressed. Arguably the “black box” nature of drug action in a new indication is one of the main blocks to progression, highlighting the need for methods that inform on the broader target mechanism in the disease context. We demonstrate that the analysis of co-expressed genes may be a critical first step towards illumination of both disease pathology and mode of drug action. We achieve this using a novel framework, co-expressed gene-set enrichment analysis (cogena) for co-expression analysis of gene expression signatures and gene set enrichment analysis of co-expressed genes. The cogena framework enables simultaneous, pathway driven, disease and drug repositioning analysis. Cogena can be used to illuminate coordinated changes within disease transcriptomes and identify drugs acting mechanistically within this framework. We illustrate this using a psoriatic skin transcriptome, as an exemplar, and recover two widely used Psoriasis drugs (Methotrexate and Ciclosporin) with distinct modes of action. Cogena out-performs the results of Connectivity Map and NFFinder webservers in similar disease transcriptome analyses. Furthermore, we investigated the literature support for the other top-ranked compounds to treat psoriasis and showed how the outputs of cogena analysis can contribute new insight to support the progression of drugs into the clinic. We have made cogena freely available within Bioconductor or https://github.com/zhilongjia/cogena. 
In conclusion, by targeting co-expressed genes within disease transcriptomes, cogena offers novel biological insight, which can be effectively harnessed for drug discovery and repositioning, allowing the grouping and prioritisation of drug repositioning candidates on the basis of putative mode of action.

Recommended citation: Jia, Z., Liu, Y., Guan, N., Bo, X., Luo, Z., & Barnes, M.R. (2016). Cogena, a novel tool for co-expressed gene-set enrichment analysis, applied to drug repositioning and drug mode of action discovery. BMC Genomics, 17, 414. http://zhilongjia.github.io/files/2016_cogena.pdf

Health and population effects of rare gene knockouts in adult humans with related parents

Published in Science, 2016

Examining complete gene knockouts within a viable organism can inform on gene function. We sequenced the exomes of 3222 British adults of Pakistani heritage with high parental relatedness, discovering 1111 rare-variant homozygous genotypes with predicted loss of function (knockouts) in 781 genes. We observed 13.7% fewer homozygous knockout genotypes than we expected, implying an average load of 1.6 recessive-lethal-equivalent loss-of-function (LOF) variants per adult. When genetic data were linked to the individuals’ lifelong health records, we observed no significant relationship between gene knockouts and clinical consultation or prescription rate. In this data set, we identified a healthy PRDM9-knockout mother and performed phased genome sequencing on her, her child, and control individuals. Our results show that meiotic recombination sites are localized away from PRDM9-dependent hotspots. Thus, natural LOF variants inform on essential genetic loci and demonstrate PRDM9 redundancy in humans.

Recommended citation: Narasimhan, V.M., Hunt, K.A., Mason, D., Baker, C.L., Karczewski, K.J., Barnes, M.R., …& van Heel, D.A. (2016). Health and population effects of rare gene knockouts in adult humans with related parents. Science, 352(6284), 474-477. http://zhilongjia.github.io/files/2016_lof.pdf

Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification

Published in PLoS One, 2015

Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.

Recommended citation: Zhang, X., Guan, N., Jia, Z., Qiu, X., & Luo, Z. (2015). Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification. PLoS One, 10(9), e0138814. http://zhilongjia.github.io/files/2015_semiPNMF.pdf

Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization

Published in PLoS One, 2015

RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked in descending order according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.

Recommended citation: Jia, Z., Zhang, X., Guan, N., Bo, X., Barnes, M.R., & Luo, Z. (2015). Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization. PLoS One, 10(9), e0137782. http://zhilongjia.github.io/files/2015_DNMF.pdf

Application of ribosome profiling

Published in Progress in Biochemistry and Biophysics, 2013

Ribosome profiling, based on deep sequencing of ribosome-protected mRNA fragments, makes it possible to study genome-wide protein translation and its regulation in depth and with precision. This review introduces the development, construction and analysis of ribosome profiling and focuses on its wide range of applications. These include discovering novel ORFs (such as uORFs and sORFs), deciphering the mechanisms of microRNA function, characterizing translation efficiency and differential translation, predicting protein abundance and illustrating the mechanism of protein translation. Finally, we discuss the broad prospects for the development of ribosome profiling. http://www.pibb.ac.cn/pibbcn/ch/reader/view_abstract.aspx?file_no=20120395&flag=1

Recommended citation: Jia, Z.L., Qu, W.B., Lu, Y.M., Luo, Z.G., & Zhang, C.G. (2013). Application of ribosome profiling 核糖体谱技术及其应用. Progress in Biochemistry and Biophysics, 40(1), 30-32. http://zhilongjia.github.io/files/2013_ribosomeprofiling.pdf

A genetically synthetic protein-based cationic polymer for siRNA delivery

Published in Medical Hypotheses, 2011

In recent years, small interfering RNA (siRNA) has attracted considerable attention following the advent of RNA interference technology, which has been harnessed as an efficient means of sequence-specific gene silencing in gene therapy and enables the elucidation of gene functions and the identification of new drug targets. Although tremendous progress has been made in novel delivery systems and vectors through the formulation of polyplexes and conjugates, such as cationic polymers (LPEI, BPEI), cationic liposomes (DOTAP) and peptides (CPP), unmet needs remain. Many cationic agents used to condense siRNA exhibit severe cytotoxicity, which limits their clinical application and must be addressed. This has spurred great interest in novel and sophisticated polymeric vectors. Here we propose a genetically synthetic protein-based polymer, also referred to as an elastin-like polypeptide (ELP), derived from the highly repetitive sequence of human tropoelastin, Val-Pro-Gly-Xaa-Gly, where the “guest residue” Xaa is any amino acid except Pro. If the guest residue Xaa is changed to Lys or Arg, the polymer could serve as a powerful cationic carrier for siRNA delivery, and we hope it will be put into practice in the near future.

Recommended citation: Liu, Y., Jia, Z., Li, L., & Chen, F. (2011). A genetically synthetic protein-based cationic polymer for siRNA delivery. Medical Hypotheses, 76(2), 239-240. http://zhilongjia.github.io/files/2011_ELP.pdf
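
As a small illustration of the repeat motif described in the abstract above, the sketch below builds a Val-Pro-Gly-Xaa-Gly polymer in one-letter code and counts its Lys/Arg residues as a rough proxy for nominal positive charge. The function names `build_elp` and `cationic_count` and the repeat count are hypothetical; the paper does not specify a chain length, and counting residues ignores real pKa and folding effects.

```python
# One-letter codes of the 20 standard amino acids, minus proline (P),
# which the motif excludes as a guest residue.
ALLOWED_GUESTS = "ACDEFGHIKLMNQRSTVWY"

def build_elp(guest: str, n_repeats: int) -> str:
    """Return n_repeats copies of the pentapeptide V-P-G-<guest>-G."""
    if len(guest) != 1 or guest not in ALLOWED_GUESTS:
        raise ValueError("guest must be one standard amino acid other than P")
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return ("VPG" + guest + "G") * n_repeats

def cationic_count(sequence: str) -> int:
    """Count Lys (K) and Arg (R) residues, a crude measure of positive charge."""
    return sum(sequence.count(residue) for residue in "KR")

# Example: a hypothetical 3-repeat ELP with lysine as the guest residue.
elp = build_elp("K", 3)
print(elp, cationic_count(elp))
```

With lysine or arginine as the guest, every pentapeptide repeat contributes one cationic residue, which is the property the hypothesis relies on for condensing siRNA.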