Corresponding author: Igor A. Chusov (igrch@mail.ru)
Academic editor: Yury Korovin
© 2019 Igor A. Chusov, Pavel L. Kirillov, Vladimir G. Pronyaev, Nikolay A. Obysov, Grigoriy E. Novikov.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Chusov IA, Kirillov PL, Pronyaev VG, Obysov NA, Novikov GE (2019) Ontologies and databases on thermophysical properties of nuclear reactor materials. Nuclear Energy and Technology 5(2): 145-153. https://doi.org/10.3897/nucet.5.36476
The study addresses information technologies for storing, systematizing and distributing thermophysical data for nuclear power engineering. In fields that rely heavily on scientific data, the general trend is a shift from conventional databases to a consolidated infrastructure able to cope with sharply growing volumes of data whose structure becomes steadily more complex as the range of materials expands. Such an infrastructure ensures interoperability, including data exchange and dissemination. This paper proposes a general principle of data management for thermophysical properties of nuclear reactor materials based on the subject-oriented ReactorThermoOntology (RTO). The ontology comprises a unified glossary of all concepts, extended with logical connections and axioms. RTO combines the terms typical of reactor materials and their characteristics with all types of information entities that define textual, mathematical and computer structures. In coded form, the ontology becomes a control add-in capable of integrating heterogeneous data. Its most important feature is that it can be expanded continuously, which is necessary when new materials and the associated terms are introduced, e.g. characteristics of nanostructures. Besides the description of the ontology of reactor materials, the paper examines possible scenarios for using the ontology during design, operation and the integration of autonomous resources, primarily databases. Big Data technology, which accommodates diverse logical structures of data, is proposed as the most efficient tool for data integration.
The joint use of technologies that were previously applied separately (an exchange standard in the form of structured text documents, ontology-based data control and a platform for working with big data) makes it possible to convert multiple existing primary resources (databases, files, archives, etc.) to the standard JSON text format for subsequent semantic integration.
Thermophysical properties, reactor materials, nuclear fuel, ontology, database, data integration, JSON-format
The development of nuclear power generation is inextricably linked with an extensive infrastructure for storing data and preserving knowledge about technologies, work processes, materials and other aspects. Databases (DB) form the core of this infrastructure. Moreover, as the scope of objects expands and the volume of data grows, operational practice has to draw on multiple heterogeneous resources with markedly different formats, semantics, etc. This complicates the joint use of data in processing and calculations. As a result, the problem arises of integrating heterogeneous resources created with significantly different models and data representations (
The key idea of the future infrastructure is the formation of a space that combines data with different structures originally localized in autonomous DB (
The wide capabilities of ontologies have motivated their development for formalizing many disciplines: chemistry, materials science, Earth sciences, etc. (
The present paper is dedicated to the creation of an ontology for the thermal physics of reactor materials. Because of their significance in modeling, the thermophysical properties of materials impose stricter requirements on the resources that collect and systematize them. A simplified ontology singling out the basic concepts (substance, property, data) and accompanying terms (dimension, state, uncertainty, source) (
This paper proposes implementing the planned project by specifying in detail the hierarchy of classes and the associative relations that determine logical connections between concepts. The following section describes the ontology, which can be continuously supplemented with new concepts, for instance in the list of materials and in the nomenclature of material properties. Supplementary information in the form of diagrams illustrating the taxonomy of objects and their characteristics, as well as the OWL file encoding the contents of the ontology, can be found on the website (
The third section addresses the issue that matters most for practical use: how to integrate several DB on thermophysical properties that differ in format, data model and semantics on the basis of the existing ontology. Integration is understood here as a virtual connection with a common interface and a unified query system. Vast experience in applying ontologies to the integration of DB in materials science (
Designing ontologies (
As can be seen, the root Entity concept/class has two subclasses: Continuant, for entities that preserve their identity over time, and Occurrent, for entities that have temporal parts and unfold in time. The Continuant class has the subclasses Independent Continuant and Dependent Continuant, for objects of the data domain and for their attributes, respectively. The Dependent Continuant class also has two daughter classes: Specifically Dependent Continuant (combining the characteristics of objects) and Generically Dependent Continuant (covering data types and the corresponding concepts), the latter with a single daughter class, Information_content_entity. The Specifically Dependent Continuant class has two daughter classes: Quality (the whole set of properties of objects, processes and documents) and Realizable entity, which defines objects in terms of their possible roles or capabilities. A rigorous and universal structure adaptable to any data domain is thus formed. The next levels of the hierarchy, which include the fundamental concepts of reactor thermal physics, are also shown in the figure. All types of substances and materials are represented by the Material_entity class. The Quality class has the following daughter classes: Material_Quality, Sample, State, Information_content_quality and Process_quality. The first of these covers all types of physical characteristics, such as density, heat conductivity, etc. The Sample class defines the characteristics of a specific sample (manufacturer’s brand, form and structure of the sample, etc.), and the State class singles out the aggregation state (gas, liquid, etc.). The last two classes characterize information entities (document, identifier) and processes.
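The upper-level hierarchy described above can be pictured as a table of parent links. The following is only a minimal sketch under stated assumptions: the dictionary layout and the helper names `ancestors` and `is_subclass_of` are illustrative and are not taken from the RTO OWL file, class names are written with underscores for uniformity, and Material_entity is left out because its parent class is not stated in this passage.

```python
# Parent links for the upper-level classes named in the text.
# The dict representation is an assumption made for illustration;
# the real ontology is encoded in OWL.
PARENT = {
    "Continuant": "Entity",
    "Occurrent": "Entity",
    "Independent_continuant": "Continuant",
    "Dependent_continuant": "Continuant",
    "Specifically_dependent_continuant": "Dependent_continuant",
    "Generically_dependent_continuant": "Dependent_continuant",
    "Information_content_entity": "Generically_dependent_continuant",
    "Quality": "Specifically_dependent_continuant",
    "Realizable_entity": "Specifically_dependent_continuant",
    "Material_Quality": "Quality",
    "Sample": "Quality",
    "State": "Quality",
    "Information_content_quality": "Quality",
    "Process_quality": "Quality",
}


def ancestors(cls):
    """Return the chain of superclasses from cls up to the root."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, candidate):
    """True if candidate appears anywhere above cls in the hierarchy."""
    return candidate in ancestors(cls)


print(ancestors("Material_Quality"))
print(is_subclass_of("Sample", "Continuant"))
```

Walking the parent links is the simplest form of the inference that a taxonomy supports: a query for every Quality would also return Sample, State and the other daughter classes.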
The specifics of reactor thermal physics are determined primarily by the hierarchy of materials that inherit from the Material_entity class (Fig.
The most typical classes are shown in Figure
Investigations of materials used in nuclear engineering often cover objects that do not fall under the five categories specified above. These include U-O systems comprising the oxides UO, UO2, UO3, U3O7, U3O8 and U4O9 in different phase states; solutions and compounds of Np, Am and Cm (the so-called minor actinides); fission products in mixtures with other materials; and numerous compounds and particles (atoms, molecules, isotopes). All of them belong to the hierarchy rooted in the Chemical_entity class, and their possible role or purpose is additionally reflected in the upper-level Role class (see Fig.
The taxonomy represented in Fig.
Along with the Material_entity class, completeness of data representation is ensured by two upper-level classes: Quality and Information_content_entity (Fig.
The Information_content_quality hierarchy adds particular value to the data. It includes the classes Uncertainty, Status_descriptor (which distinguishes data types: experimental, calculated, reference, recommended) and Quality_descriptor, i.e. an overall assessment of data quality that takes into account the uncertainty, the completeness of the description of the specimen and the method, the degree of repeatability, etc.
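One way to picture how the Uncertainty and Status_descriptor classes attach to a data value is a small record type. This is a sketch only: the class and field names (`PropertyValue`, `uncertainty`, `status`) are hypothetical, the uncertainty is assumed to be absolute and expressed in the same unit as the value, and the numbers are placeholders rather than measured data.

```python
from dataclasses import dataclass
from enum import Enum


class StatusDescriptor(Enum):
    """The four data types listed for Status_descriptor in the text."""
    EXPERIMENTAL = "experimental"
    CALCULATED = "calculated"
    REFERENCE = "reference"
    RECOMMENDED = "recommended"


@dataclass(frozen=True)
class PropertyValue:
    material: str
    prop: str
    value: float
    unit: str
    uncertainty: float  # assumption: absolute, same unit as value
    status: StatusDescriptor

    def interval(self):
        """Lower and upper bounds implied by the absolute uncertainty."""
        return (self.value - self.uncertainty, self.value + self.uncertainty)


def with_status(values, status):
    """Select the records that carry the requested status descriptor."""
    return [v for v in values if v.status is status]


# Placeholder numbers chosen for exact arithmetic, not real measurements.
values = [
    PropertyValue("UO2", "density", 1.0, "arbitrary", 0.25,
                  StatusDescriptor.EXPERIMENTAL),
    PropertyValue("UO2", "density", 1.0, "arbitrary", 0.5,
                  StatusDescriptor.RECOMMENDED),
]

print(values[0].interval())
print(len(with_status(values, StatusDescriptor.RECOMMENDED)))
```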
The potential realized here (
Two ontologies from the Ontobee repository (www.ontobee.org), the Semanticscience Integrated Ontology (SIO) and the Chemical Information ontology (ChemInf), were used along with the BFO ontology for importing concepts. SIO offers a wide selection of types and relations for describing scientific activities. ChemInf covers typical concepts of chemistry and materials science, such as molecule, solution, substance, etc. Terms from external ontologies cover almost completely the set of concepts that are daughters of such classes as Chemical_entity, Information_content_entity, Information_content_quality and, partially, Material_Quality. The new concepts and their corresponding classes are intended mainly for entities specific to reactor thermal physics, first of all those included in the Material_by_application hierarchy (see Fig.
The benefit of importing from external ontologies is most fully reflected in the selection of the so-called object properties, or associative relations. The hierarchy of classes described above is in itself merely a glossary in the form of a taxonomy of concepts for objects, their characteristics, processes, documents, etc. Object properties establish logical connections between an object and its characteristic, an object and its part, a document and the object it describes, etc. The already mentioned Semanticscience Integrated Ontology (www.ontobee.org/ontology/SIO) is adopted here as a suitable source of object properties. It includes a set of 207 object properties capable of adequately representing almost all conceivable connections between objects and their attributes. A limited selection from this set, containing the most characteristic connections, is presented in Table
Each property is identified by an ID that forms part of a unique web address (URL). For example, the index SIO_000028 gives the URL http://semanticscience.org/resource/SIO_000028.rdf, which provides a detailed description of the has_part property (Fig.
Limited selection of typical object properties (associative correlations), imported from the SIO ontology
Item no. | Object Properties | ID |
1 | Is_related_to | SIO_000001 |
2 | Denotes | SIO_000020 |
3 | Has_part | SIO_000028 |
4 | Is_denoted_by | SIO_000060 |
5 | Is_part_of | SIO_000068 |
6 | Is_contained_in | SIO_000128 |
7 | Contains | SIO_000202 |
8 | Is_connected_to | SIO_000203 |
9 | Has_quality | SIO_000217 |
10 | Is_quality_of | SIO_000218 |
11 | Is_source_of | SIO_000219 |
12 | Has_role | SIO_000228 |
13 | Is_component_part_of | SIO_000313 |
14 | Is_covalently_connected_to | SIO_000334 |
15 | Is_weakly_interacting_with | SIO_000335 |
16 | Has_component_part | SIO_000362 |
17 | Is_described_by | SIO_000557 |
18 | Describes | SIO_000563 |
19 | Has_identifier | SIO_000671 |
20 | Has_data_item | SIO_001277 |
21 | Is_data_item_in | SIO_001278 |
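The relation between a property ID and its web address, and the way object properties link individuals, can be sketched as follows. The URL pattern is the one quoted in the text for SIO_000028 and the IDs are copied from the table. The individuals (`fuel_pellet_1` and so on) and the helper names `property_url` and `objects` are hypothetical and serve only as an illustration.

```python
SIO_BASE = "http://semanticscience.org/resource/"

# A few rows of the table above: property name -> SIO identifier.
OBJECT_PROPERTIES = {
    "has_part": "SIO_000028",
    "is_part_of": "SIO_000068",
    "has_quality": "SIO_000217",
    "is_quality_of": "SIO_000218",
    "is_described_by": "SIO_000557",
    "describes": "SIO_000563",
}


def property_url(name):
    """Build the address that documents the property, per the quoted pattern."""
    return f"{SIO_BASE}{OBJECT_PROPERTIES[name]}.rdf"


# Hypothetical individuals linked by object properties
# as (subject, property, object) triples.
TRIPLES = [
    ("fuel_pellet_1", "has_quality", "density_value_1"),
    ("fuel_pellet_1", "has_quality", "porosity_value_1"),
    ("fuel_pellet_1", "is_described_by", "report_1"),
]


def objects(subject, prop):
    """Return everything linked to subject through the given property."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


print(property_url("has_part"))
print(objects("fuel_pellet_1", "has_quality"))
```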
As a result, the RTO ontology defines a coordinated glossary of terms and a specification of their meaning for reactor thermal physics, which supports search and logical inference. Importantly, classes can be expanded continuously as new materials or new factors appear, for instance fuel nanostructure. The expandable list of daughter classes of Sample makes it possible to include specimen features that were not introduced originally, such as configuration, impurities, porosity, etc. Finally, the extensive set of textual and mathematical terms allows the data type to be adjusted as required, for instance by replacing a single number with a data set or an interval of values.
Mutual complementarity of ontologies and DB. A new direction has emerged that combines DB with ontologies in order to exploit the advantages of both tools. It rests on a clear correspondence between the elements of a DB and those of an ontology: DB entities correspond to ontology classes, attributes correspond to class properties, and constraints correspond to axioms. The main difference is that an ontology describes the structure of the data domain in a formal language, whereas the conceptual schema of a DB describes one particular DB without any claim to disseminating knowledge. In other words, a DB is focused on the data themselves (numerical, textual, etc.), while an ontology is oriented towards interpreting their meaning and drawing automated inferences. Loss of semantics is the major limitation of DB, because the meaning of the entities is not accessible to people unfamiliar with the DB design.
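The correspondence described above is what allows a query phrased in ontology terms to be rewritten as a DB query. The sketch below illustrates the idea with Python's built-in sqlite3 module. The table `measurement`, its columns and the `MAPPING` dictionary are hypothetical, and the NOT NULL constraints merely stand in for the axiom-to-constraint correspondence.

```python
import sqlite3

# Hypothetical mapping: ontology class -> (table, column) in one particular DB.
MAPPING = {
    "Material_entity": ("measurement", "material"),
    "Material_Quality": ("measurement", "property"),
    "State": ("measurement", "state"),
}


def to_sql(select_term, where):
    """Translate a query phrased in ontology terms into parameterised SQL."""
    table, column = MAPPING[select_term]
    clauses, params = [], []
    for term, value in where.items():
        other_table, other_column = MAPPING[term]
        if other_table != table:
            raise ValueError("cross-table queries are outside this sketch")
        clauses.append(f"{other_column} = ?")
        params.append(value)
    sql = f"SELECT DISTINCT {column} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" ORDER BY {column}"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    "material TEXT NOT NULL, property TEXT NOT NULL, state TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [
        ("UO2", "density", "solid"),
        ("Na", "density", "liquid"),
        ("Na", "heat_conductivity", "liquid"),
    ],
)

# "Which materials have data for the liquid state?" asked in ontology terms.
sql, params = to_sql("Material_entity", {"State": "liquid"})
rows = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(rows)
```

The user never sees the column names: only the mapping knows that State is stored in the `state` column, so the same question could be sent to a differently structured DB by swapping the mapping.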
Another critical drawback of DB is the inability to support evolution of the data schema. Experience of systematizing data by properties shows that a rigid structure is unsuitable for a wide range of substances once the specifics of specimens are taken into account (
At the same time, DB provide the highest performance on complex queries, unattainable with other types of architecture. This is why systems that combine accessible semantics with high-performance data operations are justified. In DB design or integration tasks this is realized through database-to-ontology mapping, in which the capabilities of a DB are augmented by a connection with an ontology, either for queries using common semantics or for the integration of heterogeneous DB. The ontology supplies the semantics, i.e. the glossary, the relations between concepts and the data structure, but contains no instances; that role is played by DB records. Data schemas and the data themselves are combined during integration: the user formulates queries in terms of the ontology, and these are converted into queries to the DB. The ontology thus acts as an efficient intermediary between the user and the data. It was shown in (
Examples of efficient use of ontologies for properties of substances. The chemical DB ChEBI (www.ebi.ac.uk/chebi/) is the most popular among DB on the properties of substances. Each molecular entity (as well as each group and class of entities) included in the DB is assigned a unique identifier, the ChEBI ID (for example, CHEBI:15377 for H2O), which can be cited by a web user or a software agent (entering the ChEBI ID of the required entry in Google is sufficient to reach it). The data set includes basic information about the molecule (chemical formula, mass, charge), structural data and hyperlinks to other DB, in particular the NIST DB (webbook.nist.gov/chemistry/), which contains a wide range of thermodynamic information.
Each entry also contains an ontology fragment that can be navigated through parent and daughter classes. Associative relations of the kind presented in Table
To integrate several DB, an external query formulated in terms of the ontology is mapped onto a set of sub-queries to isolated data sources, without aggregating the sources in a single repository. The central component of such a virtual system is an intermediary with a unified access interface and a unified model based on an ontology or a set of ontologies. A processor within the intermediary splits the query into sub-queries to the data sources. So-called adapters are attached to the data sources to resolve problems caused by the heterogeneity of models. Integration thus rests on an “intermediary-adapter” architecture with a global model in the form of an ontology, and on data federation, i.e. merging data from autonomous DB while a query to the system is executed. The details of the technology for different approaches to integration are discussed in (
The potential of integration based on Matinfo ontology is examined in (
The ontology resolves the problem of data exchange between three DB: those of the Japanese institutes AIST (National Institute of Advanced Industrial Science and Technology) and NIMS (National Institute for Materials Science), and the widely used MatDB (https://odin.jrc.ec.europa.eu), which contains data on material testing. Most of the information in these DB concerns thermal and mechanical properties. Data exchange is implemented through an intermediate level occupied by the Matinfo ontology.
The NIMS data structure is geared to storing measurement data, with the main metadata, thermalConductivity and chemicalFormula, stored in their respective fields. In contrast, metadata in the AIST DB are specified by the user as a character string entered in a field named property, which allows property names to be extended or modified under a fixed schema. An additional difficulty is that the AIST DB can store both scalar and tensor values of heat conductivity, the latter being typical of single crystals or materials with pronounced structural anisotropy. The ontology provides for the same possibility: both a scalar value and an arbitrary matrix can be represented.
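The difference between the two layouts, and the role of adapters in hiding it, can be sketched in a few lines. This is an illustration under assumptions, not the Matinfo implementation: only the field names thermalConductivity, chemicalFormula and property come from the text, while the other field names, the alias table and all values are hypothetical placeholders.

```python
# Hypothetical synonym table; in a real system the ontology would supply it.
PROPERTY_ALIASES = {
    "thermal conductivity": "thermal_conductivity",
    "heat conductivity": "thermal_conductivity",
}


def adapt_fixed_fields(record):
    """Adapter for a source with dedicated fields (NIMS-like layout)."""
    return {
        "material": record["chemicalFormula"],
        "property": "thermal_conductivity",
        "value": record["thermalConductivity"],
    }


def adapt_free_text(record):
    """Adapter for a source whose property name is a user string (AIST-like)."""
    name = record["property"].strip().lower()
    return {
        "material": record["material"],
        "property": PROPERTY_ALIASES.get(name, name),
        "value": record["value"],
    }


def value_kind(value):
    """Distinguish a scalar from a tensor stored as nested lists."""
    return "tensor" if isinstance(value, list) else "scalar"


def federated_query(prop, sources):
    """Send one query to every source through its adapter and merge results."""
    results = []
    for adapter, records in sources:
        for raw in records:
            unified = adapter(raw)
            if unified["property"] == prop:
                results.append(unified)
    return results


# Placeholder records: the numbers are not measured data.
source_a = [{"chemicalFormula": "SiC", "thermalConductivity": 1.0}]
source_b = [
    {"material": "SiC", "property": "Heat conductivity",
     "value": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]},
    {"material": "SiC", "property": "density", "value": 1.0},
]

hits = federated_query(
    "thermal_conductivity",
    [(adapt_fixed_fields, source_a), (adapt_free_text, source_b)],
)
for hit in hits:
    print(hit["material"], value_kind(hit["value"]))
```

Neither source is copied or restructured: each adapter translates records on the fly, which is the sense in which the integration is virtual.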
An example of such integration in nuclear engineering is the system of data exchange between the above-mentioned MatDB, part of the Online Data & Information Network for Energy, and the Gen IV Materials Handbook (https://gen4www.ornl.gov/) prepared at Oak Ridge National Laboratory. Integration was implemented without physically aggregating the resources, by converting two different formats into a unified data format (
Big Data technology in resource integration tasks. Approaches designed for big data open new possibilities for the integration of scientific resources. For materials properties, the source of big data is the flow of publications, whose data volume is determined not so much by the number of objects investigated as by the vast variety of synthesis and measurement conditions, microstructural specifics, etc. Of the three attributes that define “big data” (3V: Volume, Velocity, Variety), it is the last, Variety, that plays the decisive role for data on properties. It is shown in (
The JSON format has proved to be one of the most convenient formats for data exchange: it is simple to read and edit, easy for humans to comprehend and well suited to storing hierarchical structures. JSON is the working format for a number of platforms, Apache Spark in particular, which makes it possible to organize the exchange, storage and querying of distributed data.
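To make the hierarchical character of such a document concrete, the sketch below builds one record, serializes it and reads a nested field back. The key names and the nesting are assumptions made for illustration and do not reproduce the exchange schema used by the authors. The class labels are borrowed from the ontology described earlier, and the numbers are placeholders.

```python
import json

# Hypothetical layout of one exchange record; values are placeholders.
record = {
    "material": {"class": "Material_entity", "name": "UO2"},
    "property": {"class": "Material_Quality", "name": "density"},
    "state": "solid",
    "data": {"value": 1.0, "unit": "arbitrary", "uncertainty": 0.25},
    "status": "experimental",
}


def get_path(document, dotted_path):
    """Follow a dotted path such as 'data.value' through nested objects."""
    node = document
    for key in dotted_path.split("."):
        node = node[key]
    return node


text = json.dumps(record, indent=2, sort_keys=True)  # human-readable form
restored = json.loads(text)

print(text)
print(get_path(restored, "data.value"))
```

Because the document is a tree, a single number under `data` can later be replaced with a list or an interval without changing the rest of the record.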
Scenario developed in (
The Apache Spark platform copes with the growing volume and distributed nature of data on properties thanks to its high performance and its pronounced orientation towards data handling, including storage, processing and analysis in a distributed environment. Among its other technological features, it has built-in libraries for analytical processing, including support for SQL queries, through which the contents of structured JSON documents can be accessed. It is precisely the ability to run SQL queries against massive data arrays that plays the key role in their integration. Another feature that determines the efficiency of Spark for storing and processing data is its ability to interact with numerous types of storage, from HDFS (Hadoop Distributed File System) to conventional DB on local computers.
The advantages of Apache Spark are high computation speed and the handling of data of different kinds (text, semi-structured and structured data) from different sources (files in different formats, DBMS and streaming data). Taken together, the proposed tools open up practically unlimited possibilities in terms of performance and flexibility in handling complex data, including the data on properties of complex compositions with which reactor thermal physics deals.
The expediency of a new approach to handling data on the properties of materials is substantiated; the approach is based on ontologies as a means of integrating heterogeneous data. The rich capabilities of ontologies in areas that involve vast data arrays are confirmed by their wide use for systematization, search and computer-based logical inference. A version of an ontology is proposed for thermophysical data, which place strict requirements on the representation of uncertainty, details of the experiment and specimen, etc. It is shown that editing the ontology makes it possible to overcome two critical problems of data handling: the evolution of a DB in operation as the nomenclature of objects and concepts expands, and the need to account for additional factors influencing material properties (manufacturing technology, dimensions and structure of the specimen, ambient and operating conditions, etc.). Examples are examined of the application of ontologies in chemistry and materials science as a means of integrating autonomous resources with different structures and data formats. Certain “big data” technologies, originally designed for handling multiple heterogeneous sources, are proposed as the most efficient means.
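The idea of issuing SQL queries over a collection of JSON documents can be illustrated on a small scale. Note the substitution: the sketch below does not use Spark. It relies on Python's built-in json and sqlite3 modules as a single-machine stand-in, so that it runs without a cluster. The document keys, the table name `props` and the source labels are hypothetical, and the numeric values are placeholders.

```python
import json
import sqlite3

# One JSON document per line, as they might arrive from two converted sources.
documents = [
    '{"material": "UO2", "property": "density", "source": "db_a", "value": 1.0}',
    '{"material": "UO2", "property": "density", "source": "db_b", "value": 1.0}',
    '{"material": "Na", "property": "density", "source": "db_a", "value": 1.0}',
    '{"material": "Na", "property": "heat_conductivity", "source": "db_b", "value": 1.0}',
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE props (material TEXT, property TEXT, source TEXT, value REAL)"
)
for line in documents:
    doc = json.loads(line)
    # Named placeholders are filled from the parsed JSON object.
    conn.execute(
        "INSERT INTO props VALUES (:material, :property, :source, :value)", doc
    )

# One SQL query spans records that originated in different sources.
rows = conn.execute(
    "SELECT material, COUNT(*) FROM props "
    "WHERE property = ? GROUP BY material ORDER BY material",
    ("density",),
).fetchall()
print(rows)
```

In Spark the same pattern would read the JSON files into a distributed table and run the query across nodes. The stand-in shows only the logical step of querying heterogeneous records through one SQL interface.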