INFORMATION ABOUT PROJECT,
SUPPORTED BY RUSSIAN SCIENCE FOUNDATION

The information is prepared on the basis of data from the RSF information-analytical system; the descriptive part is presented in the authors' wording. All rights belong to the authors; use or reprinting of the materials is permitted only with the authors' prior consent.

 

COMMON PART


Project Number 18-14-00240

Project title Genome and transcriptome-wide detection of long non-coding RNAs function in epigenetic regulation

Project Lead Medvedeva Yulia

Affiliation Federal State Institution "Federal Research Centre "Fundamentals of Biotechnology" of the Russian Academy of Sciences"

Implementation period 2018 - 2020

Research area 04 - BIOLOGY AND LIFE SCIENCES, 04-207 - Systems biology; bioinformatics

Keywords Long non-coding RNAs, DNA methylation, histone modifications, chromatin modifications, RNA-chromatin interactions, lncRNA-DNA triplexes, lncRNA-RNA duplexes


 

PROJECT CONTENT


Annotation
It is now widely accepted that the complexity of higher organisms correlates not with the number of protein-coding genes but with the complexity of their regulatory programs. Epigenetic regulation of gene expression, one of the most important classes of regulatory mechanisms, has been intensively studied over the past few decades, laying the methodological and conceptual groundwork for an integrated understanding of these mechanisms. Nevertheless, for most epigenetic modifications, the details of how they are established remain unknown; it is especially unclear how the enzymes that set these modifications are targeted to specific genomic locations. Recently, several examples of lncRNAs targeting epigenetic complexes to specific genomic loci have been reported. High-throughput sequencing-based experimental techniques (such as ChIRP-seq, CHART, MARGI, GRID-seq, etc.) demonstrate that a number of lncRNAs interact with multiple chromatin sites, possibly changing their epigenetic state. It has also been shown that the total number of different lncRNAs in the human genome is comparable to the number of protein-coding genes. However, the functions of most of these RNAs are unknown. In this project, we propose to study epigenetic regulation together with regulation by lncRNAs, with the aim of creating a comprehensive map of epigenetic regulation controlled by lncRNAs. This project will determine the role and possible mechanism of action of many lncRNAs. The software and database to be developed will be an important source of information on the regulation of human transcription through epigenetic mechanisms and lncRNAs. The novelty and originality of the proposed project are reflected in its aims and objectives, as well as in the proposed methodological approach. Long non-coding RNAs are a rapidly growing class of RNAs with poorly annotated functions.
Collaboration with the international FANTOM6 consortium gives our team access to unique experimental data that systematically evaluate the functions of a wide range of lncRNAs and RNA interactions with chromatin. Our team, which has extensive experience in the bioinformatic analysis of RNA-RNA and RNA-DNA interactions, proposes a new approach to modeling such interactions: moving from traditional free-energy-based estimates to modeling that incorporates the statistical significance of an interaction, which allows genome- and transcriptome-wide comparison of different interactions. In addition, our team is developing an integrated multi-omics pipeline that combines information from several types of genome-wide experiments (RNA-seq, ChIP-seq) with the results of RNA-RNA and RNA-DNA modeling. Our approach improves the quality and reliability of lncRNA target predictions and enables transcriptome-wide analysis of lncRNAs to identify those that participate in chromatin regulation. This project will contribute to answering cutting-edge fundamental questions about the cross-talk between lncRNAs and epigenetic regulatory mechanisms. It may also contribute to applied research. It is well known that the disruption of epigenetic mechanisms contributes to the development of many diseases, including cancer and metabolic and neurodegenerative disorders. Several epigenetic drugs that affect DNA methylation and histone acetylation are already used in clinics around the world. However, these drugs are not locus-specific and cannot modify the epigenetic status of a particular gene or genomic region, which reduces their effectiveness and produces side effects. On the other hand, experimental methods of genome editing using CRISPR/Cas, or of expression knockdown with short RNAs (siRNA, ASO), allow very specific genomic regions or transcripts in the cell to be targeted.
A detailed understanding of how lncRNAs function, and of the mechanisms by which they target epigenetic complexes to specific genomic loci, may in the future contribute to the development of new technologies for highly specific epigenetic editing. Such technologies would undoubtedly advance biomedical research and therefore have a social impact.

Expected results
Main results of the project:
- New methods for the statistical estimation of RNA-DNA and RNA-RNA interactions genome- and transcriptome-wide;
- An algorithm for the genome- and transcriptome-wide search for lncRNAs interacting with chromatin and for their interaction sites;
- Computational pipelines for the identification of statistically significant and potentially biologically functional RNA-RNA duplexes and RNA-DNA triplexes, which may be involved in targeting epigenetic modifiers to specific genomic locations; the pipelines will incorporate both the new statistical methods for RNA-RNA and RNA-DNA interactions and a multi-omics analysis of RNA expression data and data on correlated epigenetic modifications;
- A collection of lncRNAs potentially involved in targeting epigenetic modifiers to specific genomic locations, confirmed by public and proprietary data (obtained through a collaboration with the FANTOM6 consortium) on RNA-chromatin and RNA-RNA interactions obtained by ChIRP-seq, CHART, PARIS or similar methods;
- A web server with user-friendly interfaces to the developed tools;
- A database with a web interface representing the lncRNAs and their involvement in targeting epigenetic modifications to specific genomic locations; the database will include information on specific lncRNA-genomic locus combinations in which a particular RNA can facilitate the establishment or alteration of the epigenetic profile;
- A set of experimentally confirmed lncRNAs that contribute to the local epigenetic state of chromatin;
- Comparative genomic analysis of lncRNA-genomic locus pairs for studying the evolution of these interactions.
The results of the proposed project lie at the cutting edge of epigenetic research and will contribute to our understanding of the establishment and maintenance of epigenetic profiles. The role of lncRNAs in epigenetic regulation has been demonstrated, but so far only for very few examples.
The proposed large-scale project will be able to discover many new candidate lncRNAs involved in epigenetic regulation and to facilitate an understanding of the underlying mechanisms. The most promising lncRNAs will be confirmed either in our laboratory or in collaboration with the international FANTOM6 consortium. The innovative nature of the proposed objectives, the interdisciplinary approach, and the qualifications of the team make the project highly competitive at the international level. The results of the project will be published in leading scientific journals. The developed programs and databases will be used by a broad scientific community. In the future, the tools developed during the project may also contribute to medical research and create a basis for developing well-targeted epigenetic drugs. Disruption of epigenetic mechanisms contributes to the pathogenesis of many complex and socially significant diseases, including cancer and metabolic and neurodegenerative disorders. Although several epigenetic drugs that alter DNA methylation or histone acetylation are currently used in the clinic, they are not targeted to specific genomic regions and are toxic at high doses. At the same time, methods of targeted DNA and RNA editing based on CRISPR/Cas systems, as well as RNA knockdown methods using small RNAs (siRNA), have proven to be highly effective in vivo. Understanding the role of lncRNAs in targeting epigenetic modifiers to specific regions of the genome, in combination with bioengineering methods for local editing, provides a basis for the development of highly specific epigenetic drugs.


 

REPORTS


Annotation of the results obtained in 2020
In 2020, as part of the project, we completed a pipeline for analyzing the role of long non-coding RNAs in chromatin regulation. The main improvement made this year is the independent analysis of different ASOs, which allows more reliable identification of lncRNAs involved in epigenetic regulation. Using this pipeline, we analyzed ten epigenetic modifications and regions of variable DNA methylation. Based on these results, we developed the HiMoRNA database (himorna.fbras.ru), containing 5,640,250 significantly correlated lncRNA-genomic region pairs for 10 histone modifications and 4,145 lncRNAs. To characterize each lncRNA-genomic locus pair, we also collected a set of metadata for each interaction. Our pipeline revealed 48 lncRNAs potentially involved in the regulation of at least one epigenetic modification associated with functional changes in gene expression. More than half of these RNAs showed consistent results in biological replicates and were also validated using RNA-chromatin interaction data (iMARGI). We found an extremely non-uniform chromosomal distribution of the loci in which an epigenetic modification was correlated with lncRNA expression. In particular, our pipeline shows a strong association between FTX and DNA methylation and between JPX and H3K9me3. Both DNA methylation and H3K9me3 are repressive marks and are known to contribute to different stages of X inactivation. We have shown that almost all of the JPX-associated H3K9me3 peaks are located on the X chromosome, indicating that JPX may contribute not only to the regulation of XIST per se but also directly to the repression of the X chromosome. Investigating the mechanisms by which lncRNAs interact with their target regions, we also showed that about 30% of all knocked-down lncRNAs may regulate target genes through the formation of lncRNA:RNA duplexes.
Most of these lncRNAs are likely to function in the nucleus, where they can interact with target transcripts during transcription (co-transcriptionally). This observation was also confirmed by experimental data on lncRNA-chromatin interactions. The lncRNA CHASERR is the most likely to form lncRNA:RNA duplexes with its regulated genes co-transcriptionally. We have proposed a model in which the helicase CHD2, interacting with CHASERR, is guided to the target promoters and activates genes. We identified conserved regions of this lncRNA that are most likely involved in the function of CHASERR. We also identified 14 lncRNAs (CTD-3131K8.2, CTD-2587H24.5, AC124789.1, TUG1, RAB30-AS1, RP13-463N16.6, LINC00886, LINC00886, ZNF37BP, FTX, EMX2OS, FGD5-AS1, RP11-417E7.1, AC016747.3) that can form RNA:DNA triplexes with the promoters of regulated genes, some of which are also involved in epigenetic regulation. In summary, we identified 17 lncRNAs potentially involved in the regulation of at least one epigenetic modification. These include both lncRNAs previously described in the literature and lncRNAs whose role had not been shown before. Each lncRNA was knocked down again with at least two antisense oligonucleotides (ASOs) for targeted validation. In addition to the previously planned work, we have shown the role of the lncRNA ecCEBPA in the regulation of DNA methylation in acute myeloid leukemia. We have proposed a model for the regulation of DNA methylation through the formation of triple helices by ecCEBPA with the promoters of regulated genes. It has also been shown that NR2F1-AS1 can function as a sponge for microRNAs targeting the differentially expressed genes (DEGs) that respond to knockout of this lncRNA. We have also shown that chimeric transcripts, which play an important role in carcinogenesis, form a long hairpin, possibly playing a role similar to that of small interfering RNAs (siRNAs) and downregulating gene expression.
Our study emphasizes the functional importance of lncRNAs regardless of their level of expression, localization, and conservation, including their specific role in the epigenetic regulation of gene expression. Molecular phenotyping combined with the integration of multi-omics data allows high-quality functional annotation of lncRNAs, including the detection of their role in the onset and progression of socially significant diseases, in particular in carcinogenesis. Our results reveal new targets for drug therapy of these diseases.

 

Publications

1. Alessandro Bonetti, Federico Agostini, Ana Maria Suzuki, Kosuke Hashimoto, ... Yulia A. Medvedeva., ... RADICL-seq identifies general and cell type–specific principles of genome-wide RNA-chromatin interactions Nature Communications, 11, Article number: 1018 (year - 2020)

2. Ivan Antonov and Yulia Medvedeva Direct Interactions with Nascent Transcripts Is Potentially a Common Targeting Mechanism of Long Non-Coding RNAs MDPI GENES, Genes 2020, 11(12), 1483 (year - 2020) https://doi.org/10.3390/genes00010005

3. Jordan A Ramilowski, Chi Wai Yip, Saumya Agrawal, Jen-Chien Chang, Yari Ciani, ... Ivan Antonov, ... Yulia A. Medvedeva, etc. Functional annotation of human long noncoding RNAs via molecular phenotyping Genome Research, 30: 1060-1072 (year - 2020) https://doi.org/10.1101/gr.254219.119

4. Elena Matveishina, Ivan Antonov, Yulia Medvedeva Practical guidance in genome-wide RNA:DNA triple helix prediction F1000research, International Society for Computational Biology Community Journal (year - 2020) https://doi.org/10.7490/f1000research.1118137.1

5. - A new method will show how RNA affects genome activity Naked Science (naked-science.ru), 16.03.2020 (year - )

6. - A new method will show how RNA affects genome activity Kolibri.press, 16.03.2020 (year - )

7. - A new method will show how RNA affects genome activity Seldon.News, 16.03.2020 (year - )

8. - How RNA interacts with DNA Za Nauku, 16.03.2020 (year - )

9. - Medicine of the future: RNA research will make it possible to control the activity of specific genes Poisk, 16.03.2020 (year - )

10. - How RNA affects genome activity has been shown Indicator, 16.03.2020 (year - )

11. - A new method will show how RNA affects genome activity Gazeta.ru, 17.03.2020 (year - )

12. - How RNA affects genome activity has been shown Biotech 2020, 19.03.2020 (year - )

13. - Searching for contacts between regulatory RNAs and DNA Nauka i Zhizn, 27.03.2020 (year - )


Annotation of the results obtained in 2018
In 2018, we developed statistical methods for analyzing the role of long non-coding RNAs in establishing and maintaining the chromatin profile of the cell, and implemented these methods in analysis pipelines. First, we developed a probabilistic model of RNA-DNA interaction through triplex formation according to the Hoogsteen rules. As the base algorithm for predicting triplex-forming RNA-DNA interactions we used Triplexator, which we substantially improved in the current work. We developed a probabilistic model of the distribution of the numerical values (scores) produced by the original algorithm. This model takes into account the lengths of the interacting RNA and DNA regions, their nucleotide composition, and the specifics of the triplexes determined by Triplexator. The resulting model allows genome- and transcriptome-wide comparisons, which in turn allow the selection of candidate RNAs most likely to form a real triplex. We have previously shown that, although Triplexator outperforms other software on experimental data, it still produces many false-positive predictions. To compensate for this, in addition to the work planned for this year, we tested how incorporating RNA secondary structure affects triplex prediction. Specifically, unpaired RNA regions are potentially more likely to bind DNA. Predictions were validated against experimental data on the genome-wide binding of the lncRNA MEG3. We showed that using only the unpaired RNA segments identified by RNAplfold (ViennaRNA) does indeed increase the specificity of triplex prediction without reducing sensitivity. Further, we developed a beta version of software that searches for lncRNAs that could potentially affect the chromatin state by forming triplexes with their target regions.
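The accessibility filter described above can be illustrated with a minimal sketch. It assumes that per-nucleotide unpaired probabilities have already been extracted from RNAplfold output (parsing is not shown) and that candidate triplex-forming RNA regions come from a predictor such as Triplexator. The function name, the half-open coordinate convention, and the 0.5 threshold are hypothetical choices made for illustration, not the project's actual parameters.

```python
from statistics import mean


def filter_unpaired_regions(unpaired_prob, regions, min_mean_prob=0.5):
    """Keep candidate RNA regions whose mean unpaired probability is high enough.

    unpaired_prob: per-nucleotide probabilities (0..1) of being unpaired,
                   assumed to be pre-computed, e.g. with RNAplfold.
    regions:       iterable of (start, end) intervals, 0-based, half-open.
    min_mean_prob: hypothetical accessibility threshold.
    """
    kept = []
    for start, end in regions:
        if not 0 <= start < end <= len(unpaired_prob):
            raise ValueError(f"region ({start}, {end}) lies outside the RNA")
        if mean(unpaired_prob[start:end]) >= min_mean_prob:
            kept.append((start, end))
    return kept


# Toy example: a 10-nt RNA with an accessible 5' end and a paired middle.
probs = [0.9, 0.8, 0.9, 0.7, 0.1, 0.2, 0.1, 0.1, 0.8, 0.9]
candidates = [(0, 4), (4, 8), (6, 10)]
accessible = filter_unpaired_regions(probs, candidates)
print(accessible)
```

Only regions that pass the filter would then be passed on to triplex prediction; averaging over the region is one possible design choice, and a stricter variant could require every position to exceed the threshold.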
The software pipeline consists of the following main blocks: (1) a standardized search for histone modification sites in dozens of samples; (2) normalization of lncRNA expression data in the same samples; (3) correlation of the numerical value of each histone modification peak with lncRNA expression; (4) prediction of triplex formation between candidate DNA-RNA pairs and evaluation of its statistical significance. The beta version of the software was tested on the H3K27me3 histone mark, since it is known to be established in part by the lncRNA MEG3, which can also bind DNA by forming a triplex. In addition, we analyzed experimental data on RNA-chromatin interactions obtained in a collaboration, as well as public GRID-seq data. We showed that although the absolute number of interchromosomal contacts (where an RNA is bound to the DNA of another chromosome) is small, the relative frequency of triplexes in such contacts is quite high. We also found several RNAs that form triplexes with a large number of genomic regions, which has not been reported for these RNAs before. We also performed a pilot analysis of data obtained by the FANTOM6 consortium, which reflect changes in expression in response to knockdown of various RNAs. We analyzed the promoters of genes that were significantly differentially expressed in response to knockdown and identified a number of RNAs that form triplexes with these promoters. These data indicate a possible mechanism by which these RNAs regulate gene expression, but further experimental validation is needed.
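Step (3) of the pipeline could look roughly like the sketch below. The source does not state which correlation measure is used, so Pearson correlation is an assumption here, as are the function names, the toy values, and the 0.7 cutoff. A real analysis would also need a significance estimate and multiple-testing correction.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def correlated_peaks(lncrna_expression, peak_signals, min_abs_r=0.7):
    """Return {peak_id: r} for peaks whose signal tracks lncRNA expression.

    lncrna_expression: normalized expression of one lncRNA across samples.
    peak_signals:      {peak_id: signal across the same samples, same order}.
    min_abs_r:         hypothetical cutoff on the absolute correlation.
    """
    hits = {}
    for peak_id, signal in peak_signals.items():
        r = pearson(lncrna_expression, signal)
        if abs(r) >= min_abs_r:
            hits[peak_id] = r
    return hits


# Toy data: one lncRNA and three peaks measured in the same five samples.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
peaks = {
    "peak_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with expression
    "peak_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with expression
    "peak_C": [3.0, 1.0, 4.0, 1.0, 5.0],   # weak relationship
}
hits = correlated_peaks(expression, peaks)
for peak_id, r in sorted(hits.items()):
    print(peak_id, round(r, 3))
```

The sign of each retained correlation is kept because a later step can use it to separate candidate pairs by direction.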

 

Publications


Annotation of the results obtained in 2019
In 2019, we finalized the development of statistical methods for analyzing the role of long non-coding RNAs in the formation of the chromatin profile, as well as the software pipeline incorporating these methods. The pipeline was improved and refined: sampling was refined, most of the calculations were vectorized to increase processing speed, methods for the statistical processing of knockdown data were incorporated, RNA-DNA colocalization data were included, and DNA methylation data were also processed. Using this pipeline, 10 epigenetic modifications and variable DNA methylation regions were analyzed. The results of the correlation analysis between lncRNA expression and the presence of a histone modification at a specific locus were processed in more depth: correlated histone ChIP-seq peaks were associated with the genes they regulate. The resulting lists of genes correlated with specific lncRNAs were then split by the sign of the correlation. We suggest that the sign of the correlation may indicate the functional role of the lncRNA-modification-gene relationship: a positive correlation means that the lncRNA may be involved in establishing a histone modification, while a negative correlation means that the lncRNA may be involved in its removal. Next, we incorporated functional analysis using lncRNA knockdown data obtained by the FANTOM6 consortium. For each knocked-down lncRNA, the list of differentially expressed genes was split into those that increase and those that decrease their expression. For the lncRNAs present in the FANTOM6 knockdowns, we estimated the significance of the overlap between the list of genes with modification peaks correlated with the lncRNA with a given sign and the list of genes from the knockdown experiments that change expression in a given direction. Thus, 4 groups of lncRNAs were distinguished according to their functional role: lncRNAs that deposit repressive marks, lncRNAs that remove repressive marks, lncRNAs that deposit activating marks, and lncRNAs that remove activating marks.
An analysis of the colocalization of lncRNAs with correlated histone ChIP-seq peaks was also performed using iMARGI data. The significance of the colocalization of genomic annotations was evaluated with the GenometricCorr package. As a result, for each modification, we obtained a collection of lncRNAs that were potentially involved in targeting histone modifiers to specific genomic loci and were classified into one of the 4 groups according to their possible type of influence on the histone modification. A list of lncRNAs that potentially affect methylation at specific locations in the genome was also obtained. In total, more than 50 lncRNAs were identified as potentially involved in changes in histone modifications and DNA methylation. The identified lncRNAs were tested for the ability to form lncRNA-RNA duplexes and lncRNA-DNA triplexes using the ASSA, Triplexator, and TDF programs. The lncRNAs found by our pipeline include both those previously reported in the literature and lncRNAs whose functional role had not been shown before. The functional role of one lncRNA was experimentally validated.
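The gene-list overlap test and the four-way grouping described above can be sketched as follows. The source does not name the statistical test, so the one-sided hypergeometric test used here is an assumption, and the function names, gene identifiers, and group labels are hypothetical.

```python
from math import comb


def overlap_pvalue(universe_size, n_correlated, n_knockdown, n_overlap):
    """Upper-tail hypergeometric probability of >= n_overlap shared genes."""
    if not (0 <= n_correlated <= universe_size
            and 0 <= n_knockdown <= universe_size):
        raise ValueError("gene lists cannot be larger than the universe")
    max_overlap = min(n_correlated, n_knockdown)
    if not 0 <= n_overlap <= max_overlap:
        raise ValueError("overlap is inconsistent with the list sizes")
    total = comb(universe_size, n_knockdown)
    tail = sum(
        comb(n_correlated, k)
        * comb(universe_size - n_correlated, n_knockdown - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def functional_group(mark_type, correlation_sign):
    """Label an lncRNA by mark type and by the sign of its correlation.

    Positive correlation is read as 'deposits the mark' and negative as
    'removes the mark', following the interpretation given in the text.
    """
    if mark_type not in ("repressive", "activating"):
        raise ValueError("mark_type must be 'repressive' or 'activating'")
    if correlation_sign == 0:
        raise ValueError("correlation_sign must be non-zero")
    action = "deposits" if correlation_sign > 0 else "removes"
    return f"{action} {mark_type} mark"


# Toy example with hypothetical gene identifiers.
background = {f"gene{i}" for i in range(20)}
correlated = {f"gene{i}" for i in range(0, 5)}   # genes with correlated peaks
knockdown = {f"gene{i}" for i in range(1, 6)}    # genes changed on knockdown
shared = correlated & knockdown

p_value = overlap_pvalue(
    len(background), len(correlated), len(knockdown), len(shared)
)
group = functional_group("repressive", +1)
print(len(shared), p_value, group)
```

Exact integer binomial coefficients keep the sketch dependency-free; for genome-scale gene lists a statistics library would be the more practical choice.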

 

Publications

1. Elena Matveishina, Ivan Antonov, and Yulia A. Medvedeva Practical Guidance in Genome-Wide RNA:DNA Triple Helix Prediction International Journal of Molecular Sciences, 2020, 21(3), 830 (year - 2020) https://doi.org/10.3390/ijms21030830

2. Matveishina E., Medvedeva YA. Predicting DNA:RNA triplexes based on RNA secondary structure Proceedings of 9th Moscow Conference on Computational Molecular Biology MCCMB'19; Moscow: IITP RAS, 2019, 224 (year - 2019)

3. Mazurov E., Antonov I., Matveishina E., Medvedeva YA. Long non-coding RNA in chromatin formation Proceedings of the Workshop “Epigenetics of infectious and non-communicable diseases” 16 – 19 September 2019, Cape Town, South Africa, p.37 (year - 2019)

4. Medvedeva YA Genome-wide triplex formation by long non-coding RNAs VII Congress of the Vavilov Society of Geneticists and Breeders, dedicated to the 100th anniversary of the Department of Genetics of St. Petersburg State University, and associated symposia. Book of abstracts of the International Congress. Publisher: OOO "Izdatelstvo VVM" (St. Petersburg), 124 (year - 2019)