INFORMATION ABOUT PROJECT,
SUPPORTED BY RUSSIAN SCIENCE FOUNDATION

The information is prepared on the basis of data from the information-analytical system RSF, informative part is represented in the author's edition. All rights belong to the authors, the use or reprinting of materials is permitted only with the prior consent of the authors.

 

COMMON PART


Project Number 17-43-00003

Project title Chemoinformatics approaches to organic and metabolic reactions: from empirical to predictive chemistry

Project Lead Varnek Alexandre

Affiliation Kazan (Volga region) Federal University, Kazan University, KFU

Implementation period 2017 - 2018 

Research area 03 - CHEMISTRY AND MATERIAL SCIENCES, 03-705 - Chemical informatics

Keywords Chemoinformatics, molecular modeling, expert systems, organic and metabolic reactions, chemical databases, QSAR / QSPR


 

PROJECT CONTENT


Annotation
This project aims to develop new chemoinformatics approaches to quantitative assessment of kinetic and thermodynamic parameters of chemical reactions, as well as to the determination of optimal reaction conditions. Another objective of the project is the development of algorithms for the automated standardization and correction of incomplete reactions data massively presented in modern databases, as well as analysis and visualization of large amount (millions) of reaction data. Current approaches in chemoinformatics are mostly oriented to processing information on individual chemical compounds encoded by molecular descriptors. In this context, chemical reaction is a complex object for the modeling, since its description includes both specification of two types of species (reactants and products) and reaction conditions (catalyst, solvent and various additives). Thus, it is not clear for which molecular species descriptors should be generated and how reaction conditions could be described. The key element of the developed methodology is the Condensed Graph of Reaction (CGR) approach, in which chemical reactions are considered as pseudomolecules. With this considerable simplification, it is possible to handle chemical reactions using current powerful chemoinformatics tools designed originally to work with individual molecules. Thus, special fragment descriptors computed for CGRs can be used both for building Quantitative Structure-Reactivity Relationships (QSRR) models as well as for similarity searching in reaction databases. The fact that any chemical reaction can be represented by a set of fragment descriptors opens a way to build a multi-dimensional space of chemical reactions, which can be analyzed with dimensionality reduction methods. 
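The idea of treating a reaction as a single pseudomolecule can be illustrated with a minimal sketch. It assumes a toy representation that is not taken from the project software: each side of the reaction is a table that maps a pair of atom-map numbers to a bond order, and a CGR edge carries the pair (order before, order after). The function names and the example reaction are hypothetical and serve only as an illustration.

```python
from collections import Counter

def condensed_graph(reactant_bonds, product_bonds):
    """Merge reactant and product bond tables into one CGR bond table.

    Each input maps a frozenset of two atom-map numbers to a bond order.
    Each CGR edge is labeled (order_before, order_after); 0 means "no bond".
    """
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        cgr[pair] = (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
    return cgr

def dynamic_bonds(cgr):
    """Keep only edges whose order changes, i.e. the reaction center."""
    return {pair: orders for pair, orders in cgr.items() if orders[0] != orders[1]}

def fragment_counts(cgr, atoms):
    """Count atom-bond-atom fragments, a toy stand-in for fragment descriptors."""
    counts = Counter()
    for pair, (before, after) in cgr.items():
        a, b = sorted(atoms[i] for i in pair)
        counts[f"{a}({before}>{after}){b}"] += 1
    return counts

# Hypothetical toy reaction (heavy atoms only): CH3CH2Br + OH- -> CH3CH2OH + Br-
atoms = {1: "C", 2: "C", 3: "Br", 4: "O"}
reactant_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 1}
product_bonds = {frozenset({1, 2}): 1, frozenset({2, 4}): 1}

cgr = condensed_graph(reactant_bonds, product_bonds)
center = dynamic_bonds(cgr)
descriptors = fragment_counts(cgr, atoms)
print(sorted(descriptors.items()))
```

In this sketch the broken C-Br bond and the formed C-O bond become ordinary labeled edges, so the same fragment-counting step that works for a molecule also yields a descriptor vector for the reaction.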
These advantages of CGRs were used in the Project 2014, where significant work was undertaken to create a unique database of thermodynamic and kinetic parameters of the reactions, predictive models linking reagents and products structure with reaction rate or equilibrium constants were built, and an expert system capable of predicting the optimal reaction conditions for deprotection (removal of protecting groups) was created. This work will be continued in the Project 2017 using advanced information technologies. In the new project we will significantly expand the methodology for analysis and modeling of reactions, as well as the fields of its application. Specifically, the method of Generative Topographic Mapping (GTM) will be used for visualization and analysis of large data sets, whereas two types of neural networks - Hopfield Nets and Restricted Boltzmann Machines - will be used to classify reactions in databases. Special attention will be paid to the reactions actively used in medicinal chemistry for the synthesis of new drug molecules. For each of them, an ensemble of special hash codes – “Reaction Signatures” – uniquely identifying given chemical transformation, will be developed. They will be used for statistical analysis of 1.1 million of the reactions extracted from patents. The expert system for assessing the optimal experimental conditions for catalytic hydrogenation reactions involving the protecting groups will be significantly modified by the use of advanced machine learning methods, the Hopfield Nets and the Restricted Boltzmann Machines, and this would lead to enhanced predictive performance. As part of the Project 2017, we will continue to collect the reaction data from the literature, master's and doctoral theses for the database with information on the kinetic and thermodynamic parameters of reactions. The number of records in the database will be doubled and will reach 40000 records. 
The database and the expert system, including the predictive models for thermodynamic and kinetic parameters of different classes of reactions and for their optimal conditions (catalyst, solvent, temperature, etc.), will be available to users on a dedicated Web server. We hope that the theoretical approaches and software tools developed in the project will significantly reduce human and material costs and facilitate the selection of the best reagents and optimal reaction conditions. The international research group involves leading specialists in chemoinformatics from France, Germany, the USA, and the Czech Republic. The Russian team includes scientists from the Kazan Federal University and Moscow State University. All members of the research team will be involved in teaching in the framework of a new master's program in chemoinformatics opened at the Kazan Federal University in 2012. Students of this master's program will also take part in the studies conducted in the framework of our project. These activities will support the preparation of the first Russian textbook on cheminformatics. The first four volumes of this textbook have been published by some of the project participants at the Kazan Federal University. Finally, the team members from the Kazan Federal University and the University of Strasbourg have co-organized several international conferences and summer schools on chemoinformatics in Russia and in France. This activity will be continued in the framework of this project.

Expected results
At the beginning of the XXI century, the development of synthetic chemistry has led to the accumulation of large amounts of data, the volume of which grows exponentially with time. Thanks to modern synthesis technologies implemented in parallel, combinatorial, microwave and in-line reactors, the amount of data on chemical reactions in the major databases CAS and Reaxys has doubled over the last 3 years and exceeded 100 million reactions. This explosive accumulation of experimental information creates favorable prospects for chemoinformatics approaches that implement artificial intelligence methods, whose purpose is to extract from the "raw" experimental data both qualitative and quantitative relationships that suggest to chemists a synthesis strategy and the choice of optimal experimental conditions. However, the application of artificial intelligence techniques to reactions is very limited because of the very low quality of the "raw" data recorded in the databases and the complexity of a chemical reaction as an object for modeling. This explains the unsatisfactory development of methods for the analysis and modeling of reaction data, as well as the lack of software tools to assess important parameters of new reactions. Consequently, despite the phenomenal development of information technologies, synthetic organic chemistry is still based mainly on trial and error, resulting in huge losses of material. The aim of our project is to develop new approaches for reaction data mining, as well as to create software products that would help synthetic chemists in their routine work. All these approaches are based on the original concept of the Condensed Graph of Reaction (CGR), in the frame of which any chemical reaction involving reactants and products is regarded as a single pseudo-molecule. 
Due to this considerable simplification, it is possible to handle chemical reactions using the effective methods of data analysis, visualization and "structure-property" modeling developed originally in chemoinformatics for individual molecules. Some of these tasks were carried out in the Project 2014. In the Project 2017, we focus on the methods of analyzing, processing and extracting knowledge useful for the chemist from large volumes of "raw" reaction data. We expect to significantly expand the applicability domains of the approaches and tools developed in the Project 2014 through the use of recently developed advanced information technologies. We will continue to create predictive models for the kinetic and thermodynamic parameters of new types of reactions, as well as for assessing their optimal conditions. A completely new direction is the use of the CGR concept for predicting specific transformations of molecules leading either to significant (activity cliffs) or insignificant (bioisosteres) changes in their biological activity. Particular attention will be paid to the availability and ease of use of the software developed in the project. Note that despite the obvious relevance of the topic, publications on the application of artificial intelligence techniques to the problems of chemical reactions are very rare. We hope that the results of the Project 2017 will be in demand by the chemical community. In the framework of the Project 2017, we have set ourselves the following objectives:
1. Development of the methodology for the standardization, completion and analysis of "raw" data on chemical reactions. We plan to create a software product which will standardize chemical reactions, identify missing reactants and products in reaction equations, create the most correct Atom-to-Atom Mapping (AAM), identify the type of reaction, and search for duplicates. 
The following approaches will be developed for this purpose: (a) a fundamentally new method of creating AAM based on machine learning, which has no analogues in the literature; unlike other approaches, this method is able to learn how to create a correct AAM; (b) a technology for auto-completing missing information on reactants and products in reaction equations; (c) a set of rules for proper standardization of reaction equations, since currently existing cheminformatics tools work correctly with molecules but often make mistakes when processing chemical reactions; (d) a common protocol for processing information on reactions, which will include the tools developed in the Projects 2014 and 2017. This protocol will be implemented as a software tool that takes "raw" reactions as input and gives standardized reactions with quality labels as output. We believe that this methodology and the related software products will allow transforming the "raw" reaction data into information that is easy to analyze using artificial intelligence techniques. We hope that this new information technology will be in demand when working with current databases of chemical reactions. The cooperation agreement signed in 2016 between the Cheminformatics Laboratory of KFU and the RELX Group (the owner of the Reaxys database) shows the interest of industry in our developments.
2. Analysis and visualization of large amounts of data on chemical reactions. In the Project 2014, a set of tools was developed for the visualization of reactions in the form of two-dimensional GTM maps. However, it has become clear that visualizing large amounts of reaction data in a high-dimensional descriptor space requires a substantial revision of the underlying algorithm. 
In the frame of the Project 2017, the incremental GTM algorithm will be significantly modified and then applied to the analysis of 1.1 million reactions extracted from patents. We believe that this tool will be in demand when working with current reaction databases, as it will allow mapping all registered reactions onto a two-dimensional map, where the areas populated by reactions of different types are highlighted with different colors. Such a map is itself a tool for analyzing the contents of reaction databases. Comparison of the maps built on the data registered in different years would show the evolution of the database content over time. The latter will be demonstrated on the database of reactions extracted from patents.
3. Automated analysis of the content of chemical reaction databases. Typically, all reactions in databases are annotated manually. However, the huge number of new reactions registered daily in databases requires the use of an automated reaction classification procedure. Two alternative approaches to determining the type of reaction will be developed in the project. The first is based on reaction signatures, i.e. structural motifs in the CGR that uniquely identify reaction types. The basics of the methodology for building reaction signatures were developed in the Project 2014. If a database contains types of reactions for which signatures cannot be directly constructed, such types will be identified using an alternative approach, recurrent neural networks, which will be applied to the analysis of "big reaction data" for the first time. For unannotated reactions identified in this way, reaction signatures will subsequently be deduced. In this project, we plan to build several dozen signatures for the reactions most used in medicinal chemistry. These signatures in combination with recurrent neural networks will be used for the analysis of 1.1 million reactions extracted from patents. 
Thus, it is expected to provide users with a methodology for automatic classification of reaction data, along with the patent database fully annotated by reaction types.
4. Modeling of the optimal reaction conditions for catalytic hydrogenation reactions. In the Project 2014, a prototype expert system was created to predict the optimal conditions for deprotection reactions. The methodology implemented in the system was based on searching for similar reactions. In the framework of the Project 2017, recurrent neural networks (Hopfield Nets and Restricted Boltzmann Machines) will be used to assess the optimal experimental conditions, and this is expected to substantially improve the quality of predictions. In addition to creating systems for making predictions, we will create a system for the analysis of the reactivity of functional groups under a variety of conditions. This will result in a publication. We expect to develop electronic tables analogous to the Reactivity Charts in the book "Greene's Protective Groups in Organic Synthesis". These tables will allow chemists to choose the optimum catalyst for their transformations of protecting and functional groups under hydrogenation conditions. A special feature of our tables is that they will result from the analysis of hundreds of thousands of hydrogenation reactions in the Reaxys database on the basis of clear quantitative criteria for assessing the reactivity, and they will be updated continually as new data become available. At the end of 2016, an agreement with the company RELX Group, Switzerland was signed for the development of this approach. It should be noted that the proposed methodology for estimating the optimal reaction conditions is unique and, to date, has no analogues in the world.
5. Building predictive models for "bioisosteres". 
We have shown that the CGR approach can be efficiently used to describe transformations of molecules which either lead to abrupt changes in biological activity (activity cliffs) or do not alter the activity (bioisosteres). In the framework of this project, it is planned to develop an approach which would allow predicting whether the biological activity will change upon replacing one structural group by another for a given biotarget. Such an analysis will be performed using the ChEMBL database for 10 different biotargets. Using this approach, a model will be developed, which will then be published on the site. An algorithm for predicting allowable bioisosteric replacements for a given molecule will be created. This approach is unique and may be in demand in medicinal chemistry for the development of new drug molecules.
6. Creating a database and predictive models for the kinetic and thermodynamic parameters of chemical reactions. Collection of data for this unique database of chemical reaction characteristics, QSRR DB, will be continued in the Project 2017. This database is based on the results of the Project 2014 and currently contains 14,000 kinetic and 15,000 thermodynamic characteristics of chemical reactions. It is planned to double the database through the acquisition of new thermodynamic and kinetic parameters for hydrolysis, dipolar cycloaddition, deprotonation, bimolecular and unimolecular nucleophilic substitution and elimination, and Diels-Alder reactions. For the collected reactions and properties, predictive models will be constructed for different characteristics: the rate constants, equilibrium constants, activation barriers and pre-exponential factors in the Arrhenius equation, and the selectivity of the reactions. Models for predicting ratios of reactants and products based on reaction mechanisms will also be developed. 
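The Arrhenius parameters mentioned among the modeled characteristics are related to the rate constant by k = A * exp(-Ea / (R * T)). The following minimal sketch shows how a rate constant follows from a pre-exponential factor and an activation barrier, and how both parameters can be recovered from two measurements. The function names and the numerical values are hypothetical and are not taken from the project database.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def arrhenius_rate(pre_exponential, activation_energy, temperature):
    """Rate constant k = A * exp(-Ea / (R * T)); Ea in J/mol, T in K."""
    return pre_exponential * math.exp(-activation_energy / (R * temperature))

def fit_arrhenius(t1, k1, t2, k2):
    """Recover (A, Ea) from rate constants measured at two temperatures.

    Uses ln(k2 / k1) = (Ea / R) * (1 / t1 - 1 / t2).
    """
    ea = R * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2)
    a = k1 * math.exp(ea / (R * t1))
    return a, ea

# Hypothetical values chosen only to demonstrate the round trip.
A_true, Ea_true = 1.0e10, 80_000.0
k_298 = arrhenius_rate(A_true, Ea_true, 298.15)
k_318 = arrhenius_rate(A_true, Ea_true, 318.15)
A_fit, Ea_fit = fit_arrhenius(298.15, k_298, 318.15, k_318)
print(f"Ea = {Ea_fit:.1f} J/mol, A = {A_fit:.3e}")
```

A predictive model of the kind described above would supply A and Ea (or log k directly) from the reaction structure; the sketch only covers the arithmetic that connects those quantities.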
Both the database containing information about important parameters of popular reactions and appropriate predictive models are a unique information product that will be in demand by chemists to design new reactions. The software created under the Project 2017 will be made available to users through a web interface. Thus, chemists will be able to theoretically evaluate the most important characteristics of reactions, which will significantly reduce the financial and time costs for the development of strategies for the synthesis and optimization of conditions. Note that the realization of the Project 2014 has already led to significant development of the community of scientists and experts in the field of chemoinformatics in Russia. In the frame of the Project 2017, the tradition of the Summer Schools on cheminformatics, which are held every odd year in KFU (to be held 5-7 July 2017), as well as the school-seminar "From the empirical to predictive chemistry" (to be held in May 2018), will be continued. These events will gather young scientists, undergraduate and graduate students from all over Russia, so they will meet with the most famous experts in this field. The project has had a significant impact on the development of university education in the field of chemoinformatics. Almost all participants were involved in the educational process. Four project participants (A. Varnek, I.S. Antipin, I.I. Baskin, and T.I. Madzhidov) have formed a team of authors for the first chemoinformatics textbook in Russia, some chapters of which have already been published in the form of teacher’s aids in KFU. Four manuals (> 1000 pages) have already been published, and this work is scheduled to continue in the framework of cooperation created by the project. After the establishment in 2012 of the first graduate program on cheminformatics in KFU, two more (in MIPT and ITMO) programs were opened with the involvement of the project participants. 
It is known that similar programs will be opened at 1-2 other universities in Russia, so there is a large potential demand for sharing teaching experience, textbooks and tutorials in chemoinformatics. It can be argued that the implementation of this project will create and strengthen ties in the community of professionals working in the fields of chemoinformatics, drug design and materials. Graduate students of the Master's program in chemoinformatics at KFU will be actively involved in the realization of the Project 2017. Research in the field of cheminformatics has attracted great interest from companies with projects on the analysis of information on chemical reactions. Cooperation was started with the ChemSpider Reactions database, UK, and the Science Data Software company, USA, and a cooperation agreement was signed with RELX Group, Switzerland, the owner of the Reaxys database. The latter gives us the ability to obtain large sets of reactions through API functions opened for us with the aim of developing the part of the project devoted to analyzing and predicting the conditions of deprotection reactions. We plan to continue to attract the attention of companies to the project, in which tools for processing large reaction databases are developed. It is planned to further support the mobility of young professionals in the field of chemoinformatics participating in this project: fairly long internships (1-2 months) of project participants at the University of Strasbourg and the Helmholtz Center in Munich for joint research and training are planned. In addition, we plan to actively promote the results of the project by participating in Russian and international conferences. As a result of the project, it is planned to prepare 12 articles in international journals and 12 publications in journals and proceedings indexed in RISC.


 

REPORTS


Annotation of the results obtained in 2018
This project is a continuation of the project “Modeling of organic and metabolic reactions by chemoinformatics methods: from empirical to predictive chemistry” started in 2014. At this stage of the project, we set a goal to complete all the studies started in the project, as well as the development of the corresponding software. First of all, we completed the development of the CGR-DB database cartridge, which supports storage of chemical information, data management (adding, removal, merging, separating records) and searching by molecules and reactions. The cartridge can easily be combined with the PostgreSQL DBMS via the object-relational model (ORM) to create a full-fledged database. CGR-DB provides various types of search for molecules and reactions: structure search, substructure search and similarity search. Furthermore, using the structure, substructure and similarity search for molecule, it is possible to search for reactions by molecule (for example, searching reactions with a naphthyl fragment in the product). Besides, a web front-end application was developed, which is available on the laboratory website cimm.kpfu.ru. It should be noted that the regular collection of reaction data is performed with using this tool; there is an appropriate data entry tool for this. The reaction standardization technologies developed in 2017 were improved and merged into a common protocol for standardization and purification of reactions. This protocol is required for the preparation of chemical reaction data sets for modeling, as well as for the reactions and queries standardization for the high efficiency of a chemical reaction database. Standardization tools were included into the CIMMtools library for modeling. Based on the previously developed methodology for constructing atom-to-atom mappings involving the use of machine learning methods, a new methodology for creating atom-to-atom mapping (AAM) using artificial neural networks was developed. 
It allows building AAMs that are superior in quality not only to the programs based on the use of the naive Bayes classifier (developed at the previous stage of the project), but also comparable or even superior to existing commercial programs. An incremental version of the GTM method optimized for processing large amounts of data was used to analyze and visualize the evolution of synthetic chemistry over time (based on a set of more than 3 million reactions (1976-2016) extracted from patents). The results of the analysis indicate that the modern trend in the development of synthetic chemistry is represented mostly by the development of existing methods of synthesis, rather than the creation of fundamentally new ones. Moreover, enough attention was paid to the development of new methodologies. Using the representation of Matched Molecular Pairs as a Condensed Graph of Reaction, the models that predict the probability of bioisostere group replacement were created for 12 biological targets. An approach was developed that allows for the replacement of functional groups, with the subsequent evaluation of replacements by the developed models. Thus, bioisostere analogues can be generated for any target molecule. The approach was retrospectively validated. A methodology was developed to predict functionally-related properties of chemical compounds by the example of acidity and tautomeric balance. An analytical form of the equation was derived to find the linear regression coefficients predicting the acidity of the tautomeric forms in this way: their difference should be equal to the logarithm of the equilibrium constant. It is shown that using this approach it is possible to build a model that can simultaneously predict the acidity and the tautomeric equilibrium constants with practically no loss of quality. 
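The constraint that links acidity and tautomeric equilibrium can be written out explicitly. The notation below is an assumption introduced for illustration (two tautomers T_1 and T_2 that share one conjugate base A^-, a descriptor vector x, regression weights w and intercept b); it restates the thermodynamic relation behind the approach and is not the analytical equation derived in the project.

```latex
% Assumed notation: T_1, T_2 are tautomers sharing the conjugate base A^-.
K_T = \frac{[T_2]}{[T_1]}, \qquad
K_{a,i} = \frac{[A^-][H^+]}{[T_i]}
\;\Longrightarrow\;
\log K_T = \log\frac{K_{a,1}}{K_{a,2}} = \mathrm{p}K_{a,2} - \mathrm{p}K_{a,1}

% If acidity is modeled as a linear function of descriptors x(T):
\mathrm{p}K_a(T) = \mathbf{w}^{\top}\mathbf{x}(T) + b
\;\Longrightarrow\;
\log K_T = \mathbf{w}^{\top}\bigl(\mathbf{x}(T_2) - \mathbf{x}(T_1)\bigr)
```

Under these assumptions the intercept cancels, so a single weight vector fitted on both acidity data and tautomeric equilibrium data predicts the two properties consistently.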
An approach was developed to predict the optimal hydrogenation reaction conditions, based on ranking the possible conditions so that the actual reaction conditions appear at the top of the list. Using the best variant of this approach and the catalytic hydrogenation data prepared at the previous stage of the project, a model was built to predict the optimal conditions for catalytic hydrogenation reactions. Prospective and retrospective validation of the model was performed. The model predicts the approximate ranges of temperature and pressure, the type of catalyst, and the type of additive necessary to conduct the target reaction. The model was used to develop software with a command-line interface for predicting the optimal conditions for catalytic hydrogenation reactions. It is installed on the laboratory's website and is available at cimm.kpfu.ru. To classify reactions by type, an approach using Restricted Boltzmann Machines (RBM) was developed. It has been shown that this approach may be useful in predicting reaction types; however, using reaction signatures is a more reliable way to classify reactions by type. The collection of data on the kinetic and thermodynamic characteristics of reactions was continued. In 2018, about 12,000 records on the kinetic characteristics of SN1-type reactions, nucleophilic aromatic substitution reactions and hydrolysis, and more than 2,500 records on the equilibrium constants of deprotonation reactions were collected from the literature. Information about reaction conditions was extracted from the database of reactions obtained from patents. In total, information on 1,808,240 reactions was collected. Using the collected data, a model for predicting the rate of ester hydrolysis was built. 
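The ranking formulation used for hydrogenation conditions can be evaluated with a simple top-k measure: for each reaction, candidate conditions are sorted by model score, and the position of the condition actually reported is recorded. The sketch below is a hypothetical illustration; the scores, the condition labels and the function names are assumptions and do not come from the project's model.

```python
def rank_of_true_condition(scores, true_condition):
    """1-based rank of the reported condition among scored candidates."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_condition) + 1

def top_k_accuracy(examples, k):
    """Share of reactions whose reported condition is ranked within the top k."""
    hits = sum(
        1 for scores, true_condition in examples
        if rank_of_true_condition(scores, true_condition) <= k
    )
    return hits / len(examples)

# Hypothetical model scores for three reactions and three candidate catalysts.
examples = [
    ({"Pd/C": 0.7, "PtO2": 0.2, "Raney Ni": 0.1}, "Pd/C"),
    ({"Pd/C": 0.5, "PtO2": 0.3, "Raney Ni": 0.2}, "PtO2"),
    ({"Pd/C": 0.6, "PtO2": 0.3, "Raney Ni": 0.1}, "Raney Ni"),
]

top1 = top_k_accuracy(examples, 1)
top2 = top_k_accuracy(examples, 2)
print(top1, top2)
```

A ranking model is judged as good when the reported conditions land near the top of the list for most reactions, which is what the top-k share measures.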
Because new reaction data were collected, the technologies for the preparation and standardization of reactions were further developed, and requirements arising from the publication of models on the server had to be taken into account, all models of the kinetic and thermodynamic characteristics of reactions were rebuilt. The models that showed good quality are available on the server. To predict the selectivity in SN2 / E2 and SN2 / SN1 reactions, new models were developed based on the previously built models for SN2, E2 and SN1 reactions. On April 5-7, 2018, the traditional Third International School-Seminar "From Empirical to Predictive Chemistry" was held. In total, 96 participants took part in the school, including 13 foreign scientists (5 of them project participants), 29 Russian scientists, and 54 postgraduates and students.
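One way in which selectivity can be derived from separately built rate-constant models, as mentioned above for SN2 / E2 and SN2 / SN1, is to compare the predicted logarithms of the rate constants of the competing channels. The sketch below is an assumption about the arithmetic only: it presumes that both constants are expressed in comparable units (for example, pseudo-first-order constants under the same conditions), and the function name and values are hypothetical.

```python
def selectivity(log_k_a, log_k_b):
    """Compare two competing channels from predicted log10 rate constants.

    Returns (log10 of the rate ratio k_a / k_b, fraction reacting via channel a).
    Assumes both constants are expressed in comparable units.
    """
    delta = log_k_a - log_k_b
    ratio = 10.0 ** delta
    return delta, ratio / (1.0 + ratio)

# Hypothetical predictions: log k(SN2) = -3.0, log k(E2) = -4.0
delta, fraction_sn2 = selectivity(-3.0, -4.0)
print(delta, round(fraction_sn2, 3))
```

Here a difference of one log unit corresponds to about 91 % of the substrate reacting through the faster channel.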

 

Publications

1. M. Glavatskikh, T. Madzhidov, D. Horvath, R. Nugmanov, T. Gimadiev, D. Malakhova, G. Marcou, A. Varnek Predictive Models for Kinetic Parameters of Cycloaddition Reactions Molecular Informatics, V. 37, 1800077 (year - 2018) https://doi.org/10.1002/minf.201800077

2. M. Glavatskikh, T. Madzhidov, I.I. Baskin, D. Horvath, R. Nugmanov, T. Gimadiev, G. Marcou, A. Varnek Visualization and Analysis of Complex Reaction Data: The Case of Tautomeric Equilibria Molecular Informatics, V. 37, Is. 9, 1800056 (year - 2018) https://doi.org/10.1002/minf.201800056

3. T. Gimadiev, T. Madzhidov, I. Tetko, R. Nugmanov, I. Casciuc, O. Klimchuk, A. Bodrov, P. Polishchuk, I. Antipin, A. Varnek Bimolecular Nucleophilic Substitution Reactions: Predictive Models for Rate Constants and Molecular Reaction Pairs Analysis Molecular Informatics, V. 37, 1800104 (year - 2018) https://doi.org/10.1002/minf.201800104

4. T. I. Madzhidov, A. A. Khakimova, R. I. Nugmanov, C. Muller, G. Marcou, A. Varnek Prediction of Aromatic Hydroxylation Sites for Human CYP1A2 Substrates Using Condensed Graph of Reactions BioNanoScience, V. 8, Is. 1, pp. 384–389 (year - 2018) https://doi.org/10.1007/s12668-017-0499-7

5. T. R. Gimadiev, T. I. Madzhidov, R. I. Nugmanov, I. I. Baskin, I. S. Antipin, A. Varnek Assessment of Tautomer Distribution Using the Condensed Reaction Graph Approach Journal of Computer-Aided Molecular Design, V. 32, Is. 3, pp 401-414 (year - 2018) https://doi.org/10.1007/s10822-018-0101-6

6. Zankov D., Madzhidov T., Sattarov B., Gimadiev T., Nugmanov R., Baskin I., Varnek A. Interrelated statistical models for estimating equilibrium constants and acidity of tautomers (in Russian) Butlerov Communications (Butlerovskie soobshcheniya), V. 56, No. 10, pp. 26-37 (year - 2018)


Annotation of the results obtained in 2017
This project is the continuation of the project started in 2014 “Modeling of chemical and metabolic reactions by means of chemoinformatics tools: from empirical to predictive chemistry”. The project is aimed to develop new chemoinformatics approaches to the assessment of chemical reaction thermodynamic and kinetic parameters as well as to the determination of the optimal conditions for chemical reactions. Another task of the project is the development of algorithms for an automatic standardization and deficient data correction, constituting the bulk of modern reaction databases, along with the analysis and visualization of large amount (millions and more) of reaction data. A key element of the developed methodology is the Condensed Graph of Reaction (CGR) approach in the framework of which a chemical reaction is considered as a single pseudomolecule. This allows us to apply the arsenal of chemoinformatics tools originally developed for individual molecules to the treatment of chemical reactions. In the framework of the present stage of the project we set the goals of further development of the approaches, overcoming their limitations and application to new types of tasks. So, we propose an entirely new technology for building atom-atom mapping based on machine learning methods. A correct atom-atom mapping is required for a broad range of tasks dealing with chemical reaction analysis. It has been shown that this technology is characterized by high accuracy and computational efficiency. A usable and convenient storage of chemical information requires standardization, i.e a uniform representation of a reaction. We have developed a standardization protocol for chemical reactions, which includes standardization of all molecules, salts and zwitterions as well as filling in the missing data on reagents and products. Our previous algorithm of filling missing reactants has been substantially revised. 
Besides, a methodology was developed for assessing the quality of the resulting reaction equations. A practical application of this approach to a set of reactions extracted from databases has shown that missing reagents and products can be recovered in more than 95 % of cases. However, in some cases additional information should be provided. Most of the work has been devoted to the improvement of our previously developed in-house software. The incremental algorithm for building GTM maps has been largely revised. The software for the statistical analysis of protective group reactivity has been completely rewritten. It has acquired enhanced logic for defining protective and functional groups as well as their transformations. An interface has been created to access the results in the form of interactive reactivity analysis tables, similar to those in Greene's book "Protective Groups in Organic Synthesis". New approaches to the classification of reactions by type have been created. The first one is based on the GTM dimensionality reduction method and can be used for visualization and analysis of chemical reaction spaces, as well as for visual identification of related reaction groups on the resulting maps. Another approach is based on the use of chemical reaction signatures and the identification of reactions that have a common reaction center. It has been shown that in some cases reactions that chemists consider different may have a common reaction center. Finally, the third approach is based on Hopfield networks, which can be used to verify whether an object belongs to certain classes. This approach can be used for fuzzy classification by reaction types. The collection of data on the kinetic and thermodynamic characteristics of reactions was continued. 
In 2017, data on 15,000 kinetic characteristics were collected from the literature for SN1, SN2, cycloaddition and hydrolysis reactions, and about 2,000 for equilibrium constants and the deprotonation reaction. Furthermore, a data set of almost 40,000 hydrogenation reactions of organic compounds, extracted from the Reaxys database, was prepared for further modeling. The collected data were used to build models predicting the rate constants of mono- and bimolecular nucleophilic substitution and cycloaddition reactions, as well as the parameters of the Arrhenius equation for cycloaddition reactions. In addition, a predictive model for the equilibrium constant of the deprotonation reaction was built. Some technical improvements to the modeling approaches were made. The models developed within the project have been published on the laboratory server cimm.kpfu.ru. The traditional Summer School-Conference on Chemoinformatics at KFU, the third in the series, was held from July 5 to July 7, 2017. A total of 113 participants took part in the school, including 19 foreign scientists (including 4 project participants), 34 Russian scientists (including 22 young scientists), and 57 PhD students and students.
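For reference, the Arrhenius equation relates the rate constant k to temperature T through the pre-exponential factor A and the activation energy Ea: k = A·exp(−Ea/(RT)). The sketch below is only an illustration of how these two parameters determine rate constants and how they can be recovered from two measurements; the function names and the numerical values are hypothetical and do not come from the project's data or models.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def rate_constant(a, ea, temperature):
    """Arrhenius rate constant for pre-exponential factor a and
    activation energy ea (J/mol) at the given temperature (K)."""
    return a * math.exp(-ea / (R * temperature))


def arrhenius_from_two_points(t1, k1, t2, k2):
    """Recover (A, Ea) from rate constants measured at two temperatures.

    Uses ln(k2/k1) = (Ea/R) * (1/t1 - 1/t2).
    """
    ea = R * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2)
    a = k1 * math.exp(ea / (R * t1))
    return a, ea


# Hypothetical parameters, chosen only to demonstrate the round trip
a_true, ea_true = 1.0e7, 60000.0
k_298 = rate_constant(a_true, ea_true, 298.15)
k_323 = rate_constant(a_true, ea_true, 323.15)

a_fit, ea_fit = arrhenius_from_two_points(298.15, k_298, 323.15, k_323)
print(a_fit, ea_fit)
```

With more than two temperatures, A and Ea would normally be obtained by linear regression of ln k against 1/T rather than from a single pair of points.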

 

Publications

1. Khayrullina A.I., Madzhidov T.I., Nugmanov R.I., Afonina V.A., Baskin I.I., Varnek A. An approach to building atom-atom mapping using a naive Bayes classifier Uchenye Zapiski Kazanskogo Universiteta. Seriya Estestvennye Nauki, - (year - 2017)

2. Nugmanov R.I., Madzhidov T.I., Antipin I.S., Varnek A.A. Automatic identification of missing reagents and products in chemical reaction equations Uchenye Zapiski Kazanskogo Universiteta. Seriya Estestvennye Nauki, - (year - 2017)

3. Polishchuk P., Madzhidov T., Gimadiev T., Bodrov A., Nugmanov R., Varnek A. Structure–reactivity modeling using mixture-based representation of chemical reactions Journal of Computer-Aided Molecular Design, Vol. 31, Is. 9, P. 829-839 (year - 2017) https://doi.org/10.1007/s10822-017-0044-3

4. Zhokhova N.I., Baskin I.I. Energy-Based Neural Networks as a Tool for Harmony-Based Virtual Screening Molecular Informatics, Vol. 36, Is. 11, Article No. 1700054 (year - 2017) https://doi.org/10.1002/minf.201700054

5. Baskin I.I., Madzhidov T.I., Antipin I.S., Varnek A. Artificial intelligence in synthetic chemistry: achievements and prospects Russian Chemical Reviews, Vol. 86, Is. 11, P. 1127-1156 (year - 2017) https://doi.org/10.1070/RCR4746

6. Polishchuk P. Interpretation of Quantitative Structure−Activity Relationship Models: Past, Present, and Future Journal of Chemical Information and Modeling, Vol. 57, Is. 11, P. 2618-2639 (year - 2017) https://doi.org/10.1021/acs.jcim.7b00274