Infological models of the ENDF-format nuclear data

Issues involved in the infological modeling of the ENDF-format nuclear data libraries for the purpose of converting ENDF files into a relational database have been considered. The transfer to a relational format will make it possible to use standard, readily available tools for nuclear data processing, which simplify the conversion and operation of this data array. Infological models have been described using formulas of the “Entity (List of Attributes)” type. The proposed infological formulas are based on the physical nature of data and theoretical relations. This eliminates the need for a special notation to describe the structure and the content of data, which, in turn, facilitates the use of relational formats in codes and the solution of nuclear data evaluation problems. The concept of nuclear informatics has been formulated based on relational DBMS technologies as one of the tools for solving the “big data” problem in modern science and technology. The organizational and technological grounds for the transfer of ENDF libraries to a relational format are presented. Requirements for the nuclear data presentation formats supported by relational DBMS are listed. Peculiarities of the infological model construction, conditioned by the hierarchical nature of nuclear data, are identified. The sequence for saving the ENDF metadata is presented, which can be useful for the verification and validation (testing of the structural and syntactical validity and operability) of both source data and the procedures for the conversion to a relational format. Formulas of infological models are presented for the cross sections file, the secondary neutron energy distributions file, and the nuclear reaction product energy-angle distributions file. A complete array of infological models for ENDF libraries and the generation modules of the respective relational tables are available on a public website.


Introduction
The paper continues the consideration of the issues involved in the conversion of nuclear data files from ENDF libraries to the relational database (RDB) format. The concept, the technology, the programs, and the results of converting the ROSFOND-A and ENDF/B-VII.1 libraries were described in (Yuferov 2011, Yuferov et al. 2013a, Yuferov et al. 2013b). The experience gained shows that certain general principles of nuclear data description should be formulated to enable standardization and unification of data transfer to RDBs. The previously developed algorithms and conversion programs used the ENDF format itself as the initial model. However, for more efficient use of relational formats in design codes, as well as for checking the structural and syntactic correctness of data (verification) and establishing their adequacy for the tasks to be solved (validation), it is advisable to build infological models based on the physical nature of the data and the defining relationships. An infological model here means a description of data in terms of the subject area, its properties and connections, aimed at the subsequent relational (tabular) presentation and storage of the data.

Concept of nuclear informatics
The amount of nuclear data and the diversity of their structures are so great nowadays that they have required the establishment of a new branch of science known as nuclear informatics. The purpose of nuclear informatics is to create computer models of nuclear data based on their physical meaning and theoretical description, with regard for the need to use the data either as an object of analysis or as an array of constants supporting design calculations. Both uses imply that data are stored, searched for, transferred, visualized and processed using modern information technologies. In particular, this helps to achieve a new quality in solving the traditional problems of the physical data evaluator dealing with classification, verification and validation of data (Varlamov and Ishkhanov 2017, Varlamov et al. 2017, Mitropolskiy 2017).
The problem of large amounts of scientific data emerged quite a long time ago. However, for historical and organizational reasons, this problem is solved in a specific way in each field of science and technology. Thus, the GRIB format was proposed for the storage of meteorological data, the netCDF format for the storage of geoinformation data, the FITS format for the storage of astronomical data, etc. (Metadata for scientific information 2017, Gray et al. 2005, Bartunov and Velikhov 2011). The ideology of these formats is similar to the ENDF format style and suggests that one file stores both arrays of data and the respective metainformation in the form of flags, keys, indicators, counters, comments, etc. Such an approach necessitates the creation of a special language and special tools for each format (Plyaskin and Kosilov 2002, Sinitsa and Rineiskiy 1993, Sinitsa 2002, Blokhin and Mitenkova 2012). However, the intensification of inter-industry data exchange and the commercialization of scientific results require further efforts to unify the technology for handling data of any nature. This is especially important for nuclear power, where physical data represent not only the subject of scientific analysis but also the basis of constants for design calculations.
Relational database management systems (DBMS) can be used as the key tool of nuclear informatics since they broadly support the known requirements for scientific data presentation formats:
- capability to represent data of various types and structures;
- support of large amounts of data;
- data mobility, i.e. the ability to process and store data on different hardware and software platforms;
- rapid input/output and visualization;
- web access and web processing;
- expandability of the format.
Two key "consumer" aspects of the relational table technology can be identified which define the efficiency of handling large arrays of nuclear data:
- for the physical data evaluator, this is a visualization tool, a complete analog of "paper" tables, which allows, among other things, instantaneous sampling of the cross sections by table rows and columns, their plotting, online calculation of balances, and parallel analysis of different data versions;
- for the programmer physicist, it enables data addressing and sampling "by name" without tracking the actual location of the data in external or main memory (RAM), which makes the data organization and data processing tasks relatively independent.

Basis for and purposes of transition to relational format
We shall list the organizational and technological foundations for the conversion of ENDF libraries to relational DBMSs. The considerations below are valid for all nuclear physical data.
1. Restrictions are removed for the physical models, types and amounts of data used:
- it is possible to place different versions of data in one table, which makes data easier to compare and use in comparative calculations;
- the structure of data does not impose restrictions on processing algorithms;
- data can be stored simultaneously as text (to ensure and check the identity against primary sources and similar libraries) and as numerical information for processing and transfer to applications.
2. Labor input for data manipulation is minimized since certain typical problems can be solved without programming:
- the tabular form enables visual monitoring of the data accuracy;
- different correlations and dependences can be identified by the evaluator through the online plotting of data, with the capability to select intervals and different columns of the table as arguments or functions;
- relational tables can be imported directly into other applications, e.g., into statistical packages.

3. The data input and data access time is minimized:
- the user can go practically instantaneously to the needed table columns or get the data sample or grouping of interest;
- a paper table can be scanned and imported into the database within several minutes with the initial structure preserved;
- the addition of entries (e.g., due to the need to detail the energy interval) or columns (e.g., a column of uncertainties of the constants) is a standard operation that does not require restructuring the file by modifying or adding any control information.
4. The training process is simplified and user qualification requirements are reduced:
- a relational DB has an instructional function, providing access to tabulated data in meaningful terms;
- work with data is carried out in the natural language of the subject area without the specific syntax of the ENDF format, which is based on pointers to the availability and type of data;
- most of the data analysis problems, including computational ones, are solved through "query by example", that is, by declaratively stating the result that shall be found or computed rather than by describing the search or computation algorithm;
- data storage in an RDB eliminates the need for explicit placement and monitoring of control information (separator entries, navigation parameters, data type indices, counters, flags, etc.). As a result, there is no need to develop the respective data arrangement and formatting guides, or to spend time examining control information and learning to use it correctly in processing programs.

5. The application development process is simplified:
- some problems do not require programming and are solved in a dialog mode;
- there is no need to program procedures for memory allocation, for the transfer of data between external memory and main memory (RAM), or for graphic output;
- data is sampled by meaningful names;
- the sequence of operations performed in a dialog mode can be recorded and saved as a macro for re-execution or can be included in the created application;
- the addition of new columns to a table does not require, as a rule, modification of the query modules that sampled data from the tables with the initial structure.

6. The process of verifying data and validating applications is simplified:
- the data accuracy is checked automatically at the time it is entered in the table;
- visual monitoring of data is simplified;
- newly developed applications do not require development and debugging of specific data access procedures.
7. Interaction with other applications is simplified based on standard solutions oriented towards the use of relational DBMSs:
- remote access to data and computational resources is provided by standard client-server technologies (ADO, ADO.NET, RMI, etc.);
- there is no need for developing specific interfaces when creating reactor codes integrated with CAD/CAE packages;
- data mining technologies become accessible for data analysis.
The module that generates and verifies the input data file for a code can be developed in the relational DBMS language faster and more effectively than with traditional programming languages. Accordingly, calculation results saved in DBMS tables are immediately suitable for visualization, statistical processing and transfer to other applications.

Mapping of ENDF data to relational format
For the considered subject area (nuclear reactions and decays), the ENDF format (Drake 2017, ENDF-6 2009, 2011) already provides a formalized model in terms of logical pointers and physical parameters, for the placement of which 10 fields of the text line are allotted:
P1, P2, P3, P4, P5, P6, Material, File, Section, String. (1)
The problem, therefore, consists in the following.

1. ENDF data must be presented by means of infological formulas
Entity [Attribute_1, Attribute_2, …, Attribute_N], (2)
describing the subject area in terms of objects, processes, phenomena (entities) and their properties (attributes). The infological formula defines the structure (row content) of the relational table in which the columns present the attributes of an entity and the rows present its instances differing in the values of the attributes. The set of infological formulas (2) fully defines the database schema, that is, the list of interconnected tables. The table connections are defined by means of keys, that is, attributes with the same names which are present in different tables.
2. Lines (1) shall be mapped to the relational table records (2). Such conversion is not an elementary operation since ENDF files do not contain start and end pointers for data blocks of a particular semantic content. Besides, the conversion algorithms will evidently be defined by the adopted database schema.
The standard line (1) of an ENDF file can be interpreted as a relational table record of 10 columns. This violates the principle of the homogeneity of data in a relational table column, but the use of such intermediate tables, into which ENDF files are imported in advance, simplifies the conversion procedure. In particular, the heading lines of an ENDF file's sections and subsections contain homogeneous lists of attributes. Therefore, the tables assembled from such lines are already in the first normal form (Date 2004, Gray 1984, Martin 1977, Maier 1983). Saving metadata in these tables enables verification and validation of both the initial ENDF files and the conversion procedures.
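The import of a line of type (1) into a 10-column intermediate table can be sketched as follows. This is a minimal illustration, not the authors' conversion program: it assumes the standard 80-column ENDF layout (six 11-character fields followed by 4-, 2-, 3- and 5-character fields), keeps P1 to P6 as text, and uses hypothetical helper names `endf_float` and `parse_endf_line`. The sample line is built by hand for the example.

```python
import re

FIELD_NAMES = ("P1", "P2", "P3", "P4", "P5", "P6")

def endf_float(text: str) -> float:
    """Convert an ENDF real such as ' 1.001000+3' (written without 'E') to a float."""
    # Insert the missing 'e' before an exponent sign that follows a digit or a dot.
    return float(re.sub(r"(?<=[0-9.])([+-])(?=\d)", r"e\1", text.strip()))

def parse_endf_line(line: str) -> dict:
    """Split one ENDF text line into the ten fields of layout (1)."""
    line = line.rstrip("\r\n").ljust(80)
    # P1..P6 are kept as text so the record stays identical to the source file.
    record = {name: line[i * 11:(i + 1) * 11] for i, name in enumerate(FIELD_NAMES)}
    record["Material"] = int(line[66:70])
    record["File"] = int(line[70:72])
    record["Section"] = int(line[72:75])
    record["String"] = int(line[75:80].strip() or 0)  # the line number may be blank
    return record

# Hand-made heading line (illustrative values): ZA, AWR, four zeros, MAT, MF, MT, NS.
sample = " 1.001000+3" + " 9.991673-1" + "0".rjust(11) * 4 + " 125" + " 3" + "  1" + "    1"
record = parse_endf_line(sample)
za = endf_float(record["P1"])
```

Keeping the six parameter fields as text mirrors the remark above that the intermediate table is not homogeneous by column: whether a field holds a real or an integer is only known once the record type has been identified.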
ENDF metadata are excluded from the final version of the relational database. The structure of the table, that is, the list of its fields (columns), is defined only by the nature of the particular physical information. The relational table column stores homogeneous data, e.g., the total cross section, and each record (table row) is identified (in this case) by the respective energy value. The requirement of the data homogeneity in separate lines is the prerequisite for the following major entities to be identified in the considered domain: materials, interactions (reactions), decays and distributions. A distribution here means any functional dependence. All other information can be treated as a set of properties (attributes) of these entities.
All individual (scalar) properties of similar objects can be saved in one table. Multiple properties are expressed, as a rule, by distributions, that is, functional dependences (on temperature, energy, mass, orbital momentum, etc.). A functional dependence can be arranged in the table either in rows or in columns. Both presentations have conveniences of their own. The arrangement by rows enables rapid sampling for placing the data in arrays. For the arrangement by columns, the DBMS normally provides a plotting capability in a dialog mode.
With a row-wise arrangement of distributions, the values of one of the arguments are taken as the column names. The row contains all values of the distribution for the particular material, the particular reaction and the current combination of the other arguments. By adding the attribute [Distribution type], one can save all distributions (e.g., energy distributions) in one table. A major restriction here is the insufficient number of columns that can be mapped to the existing argument points. Besides, it is typical of evaluated data files that the sets of argument points for different distributions do not coincide and various interpolation schemes are used. The row-wise arrangement of this information is possible, but it violates the principle of the data homogeneity in the column.
With the arrangement by columns, the function and the (scalar) argument are treated as attributes whose values are allotted two columns. It is also necessary to have the columns [Interpolation interval] and [Interpolation type]. All pairs of the columns [Argument] and [Function] can be placed in one table with rows of the "material - reaction - all cross sections" or "reaction - cross section - all materials" type. Manipulating the table in a dialog mode accelerates the selection of columns for plotting or for the calculation of interpolated values. This justifies the presence of a relatively large number of empty fields in the table.
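The calculation of interpolated values from a column-wise table can be sketched as below. The sketch assumes rows of the form (Argument, Function, Interpolation interval, Interpolation type) sorted by a strictly increasing argument, the ENDF interpolation codes 1 to 5 (histogram, lin-lin, lin-log, log-lin, log-log), and, as a further assumption, that the law stored with the upper point of a segment applies to that segment. The function name `interpolate` and the sample values are illustrative only.

```python
import math
from bisect import bisect_right

def interpolate(rows, x):
    """Interpolate the function at x from rows (argument, function, interval, law)."""
    xs = [r[0] for r in rows]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("argument outside the tabulated range")
    i = min(bisect_right(xs, x), len(xs) - 1)  # index of the upper point of the segment
    x0, y0, _, _ = rows[i - 1]
    x1, y1, _, law = rows[i]
    if law == 1:                                # histogram: constant over the segment
        return y0
    if law == 2:                                # y linear in x
        return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    if law == 3:                                # y linear in ln(x)
        return y0 + math.log(x / x0) / math.log(x1 / x0) * (y1 - y0)
    if law == 4:                                # ln(y) linear in x
        return y0 * (y1 / y0) ** ((x - x0) / (x1 - x0))
    if law == 5:                                # ln(y) linear in ln(x)
        return y0 * (y1 / y0) ** (math.log(x / x0) / math.log(x1 / x0))
    raise ValueError(f"unsupported interpolation type: {law}")

# Illustrative table: one lin-lin interval followed by one log-log interval.
table = [
    (1.0, 10.0, 1, 2),
    (2.0, 20.0, 1, 2),
    (4.0, 5.0, 2, 5),
]
midpoint = interpolate(table, 1.5)
```

Because the interpolation type travels with every row, a query that returns two neighbouring rows already carries everything needed to interpolate, with no separate control record to consult.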
The obvious hierarchical structure of nuclear data is due to the fact that a given value of some parameter corresponds, as a rule, to a tuple of values of detailing quantities. A set of tuples is united into a table in which each record (tuple) is identified by the corresponding value of the given parameter. If a given parameter value (a tuple of values) is mapped to some multiple property (for example, an energy distribution), this expresses a one-to-many relationship and defines the next-level table. To associate this table with the parameter in question, it is natural to number the values of the latter by adding a corresponding column, and to compose the table name from the parameter name and the value number. Typically, these tables are combined into one table containing a key field with the parameter number.
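A minimal sketch of such a two-level hierarchy, using SQLite from the Python standard library, is given below. The table and column names (`InitialEnergies`, `Spectra`, `n`, `E_out`, `g`) and all numbers are hypothetical and serve only to show the numbering column acting as the key field of the combined next-level table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Upper level: each value of the parameter E gets a number n.
    CREATE TABLE InitialEnergies (n INTEGER PRIMARY KEY, E REAL);
    -- Next level: all distributions in one table, keyed by the parameter number n.
    CREATE TABLE Spectra (
        n     INTEGER REFERENCES InitialEnergies (n),
        E_out REAL,
        g     REAL,
        PRIMARY KEY (n, E_out)
    );
""")
conn.executemany("INSERT INTO InitialEnergies VALUES (?, ?)",
                 [(1, 1.0e6), (2, 2.0e6)])
conn.executemany("INSERT INTO Spectra VALUES (?, ?, ?)", [
    (1, 0.0, 0.0), (1, 5.0e5, 2.0e-6),
    (2, 0.0, 0.0), (2, 1.0e6, 1.0e-6), (2, 1.5e6, 0.0),
])

# One-to-many: count the distribution points attached to each initial energy.
counts = conn.execute("""
    SELECT p.E, COUNT(*)
    FROM InitialEnergies AS p
    JOIN Spectra AS s ON s.n = p.n
    GROUP BY p.n
    ORDER BY p.n
""").fetchall()
```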
The numbering (indexing) columns are a simple and convenient tool for navigation and search in the hierarchy of tables. Such a hierarchy may, however, be very deep, and the number of indexing columns will then be comparable with the number of columns storing meaningful information. To fold the indices in this case, it is appropriate to use numbering functions similar to the function ZA = 1000 * Z + A (Z is the charge number and A is the mass number of the material) used in the ENDF format to name materials. This function can be interpreted as writing a number in the positional numeral system with the base P = 10^3, which is known to exceed the number of possible values in the indexing columns and makes it possible to combine the values e_1, …, e_M of M key fields into a single address number

$$N = \sum_{m=1}^{M} e_m P^{M-m}.$$
The original suite of indices is restored using the known algorithms (Knuth 1997).
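Folding and restoring the indices can be sketched as follows, assuming, by analogy with ZA = 1000 * Z + A, that the first key field is the most significant digit in base P. The function names `pack_indices` and `unpack_indices` are hypothetical.

```python
def pack_indices(indices, base=1000):
    """Fold a sequence of key-field values into one address number in the given base."""
    number = 0
    for value in indices:
        if not 0 <= value < base:
            raise ValueError(f"index {value} does not fit into base {base}")
        number = number * base + value  # Horner scheme: first index is most significant
    return number

def unpack_indices(number, count, base=1000):
    """Restore the original sequence of 'count' indices by repeated division."""
    values = []
    for _ in range(count):
        number, digit = divmod(number, base)
        values.append(digit)
    return values[::-1]

za = pack_indices([92, 235])            # Z = 92, A = 235
restored = unpack_indices(za, count=2)
```

With two key fields and P = 1000 the packing reproduces the ENDF material designation, and `unpack_indices` recovers Z and A by successive division with remainder.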

Infological model of the cross section file
The ENDF file MF=3 contains only cross sections of reactions and derived quantities as functions of the incident particle energy. This data defines a simple infological model containing three entities: Material, Nuclear reaction, and Cross section. The material MAT is described by two single attributes, ZA (which encodes the charge number and the mass number) and the mass ratio AWR, given in the first heading line of a section of the cross sections file MF=3:
[MAT, 3, MT / ZA, AWR, 0, 0, 0, 0] HEAD
[MAT, 3, MT / QM, QI, 0, LR, NR, NP / E_int / σ(E)] TAB1
[MAT, 3, 0 / 0.0, 0.0, 0, 0, 0, 0] SEND
The single attributes of the reaction MT are here the material MAT, the differences of the masses of the initial and end products QM and QI, and the product nuclear characteristic LR. Multiple attributes include the initial energy (incident particle energy) E, the cross section value σ(E), the interpolation interval, and the interpolation rule. Scalar attributes are placed in one table, which is connected through the key columns [Reaction] and [Material] to the table of multiple properties. (Here and hereinafter, the infological formulas use standard identifiers of the ENDF format for designating the attributes in ENDF libraries.)
The attributes NR and NP in the second heading line of the MF=3 file section are classified as metadata and are not included in the final version of the database. The tables are in the third normal form and, according to the known principles of normalizing relational databases (Date 2004, Gray 1984, Martin 1977, Maier 1983), do not need further restructuring.
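The three-table model for MF=3 (materials, reactions and cross sections linked by the keys MAT and MT, with NR and NP omitted as metadata) can be sketched in SQLite as follows. This is an illustration under stated assumptions rather than the published schema: the SQL column names `sigma`, `interp_interval` and `interp_type` are hypothetical, and all numerical values are placeholders, not evaluated data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Materials (MAT INTEGER PRIMARY KEY, ZA REAL, AWR REAL);
    CREATE TABLE Reactions (
        MT INTEGER, MAT INTEGER REFERENCES Materials (MAT),
        QM REAL, QI REAL, LR INTEGER,
        PRIMARY KEY (MT, MAT)
    );
    CREATE TABLE CrossSections (
        MT INTEGER, MAT INTEGER, E REAL, sigma REAL,
        interp_interval INTEGER, interp_type INTEGER,
        PRIMARY KEY (MT, MAT, E),
        FOREIGN KEY (MT, MAT) REFERENCES Reactions (MT, MAT)
    );
""")
# Placeholder rows for one material and one reaction.
conn.execute("INSERT INTO Materials VALUES (?, ?, ?)", (125, 1001.0, 0.9991673))
conn.execute("INSERT INTO Reactions VALUES (?, ?, ?, ?, ?)",
             (102, 125, 2.224631e6, 2.224631e6, 0))
conn.executemany("INSERT INTO CrossSections VALUES (?, ?, ?, ?, ?, ?)", [
    (102, 125, 1.0e-5, 16.0, 1, 5),
    (102, 125, 0.0253, 0.33, 1, 5),
    (102, 125, 2.0e7, 3.0e-5, 1, 5),
])

# Sampling "by name": the cross section of a given reaction on a given material.
rows = conn.execute("""
    SELECT m.ZA, x.E, x.sigma
    FROM CrossSections AS x
    JOIN Reactions AS r ON r.MT = x.MT AND r.MAT = x.MAT
    JOIN Materials AS m ON m.MAT = r.MAT
    WHERE x.MAT = ? AND x.MT = ?
    ORDER BY x.E
""", (125, 102)).fetchall()
```

The query names only the quantities of interest; no separator records, counters or position arithmetic appear in it, which is the practical effect of dropping the ENDF metadata from the final tables.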

Infological model of the secondary neutron distribution file
The ENDF file MF=5 contains functions of the probability densities for the distribution of secondary neutrons by energy in reactions with neutrons and during fission.The major information block here is the subsection that stores parameters of a particular partial distribution for a range of values of the initial energy E. Due to a difference in the set of attributes in classes of distributions, it would be appropriate to consider exactly these classes as entities.In this case, the entity instance is the distribution of a particular class with the given value of the initial energy E.
All distributions are characterized by the probability of implementation at the particular value of the initial energy E for the given reaction. Besides, certain probability density functions contain parameters that also depend on the initial energy of the incident particle. It is convenient to save all this data in a single table if they are set on one energy grid. Otherwise, a separate table is required for each parameter. In particular, a table with the following structure is introduced for the probabilities of implementing the distributions:
The rest of the parameters depending on the initial energy (nucleus temperature, Watt spectrum parameters and others) are placed in tables with a similar structure.
A single attribute for a series of distributions is the constant U that defines the upper limit of the secondary neutron energy, 0 ≤ E* ≤ (E - U). It would be appropriate to store this constant in the table

MF5_U [Reaction, Material, Distribution type, U].
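Since an infological formula lists the table name and its columns, a table definition can be generated from it mechanically. The sketch below is one possible way to do this with SQLite and is not the generation module mentioned in the paper; the function name `formula_to_ddl` is hypothetical, and the simple comma split assumes that attribute names contain no commas.

```python
import re
import sqlite3

def formula_to_ddl(formula: str) -> str:
    """Turn 'Entity [Attr_1, ..., Attr_N]' into a CREATE TABLE statement."""
    match = re.fullmatch(r"\s*(.+?)\s*\[(.*)\]\s*", formula)
    if match is None:
        raise ValueError(f"not an infological formula: {formula!r}")
    entity, attribute_list = match.groups()
    attributes = [a.strip() for a in attribute_list.split(",") if a.strip()]
    if not attributes:
        raise ValueError("an entity needs at least one attribute")
    # Double quotes allow names with spaces ("Distribution type") to be used as-is.
    columns = ", ".join(f'"{a}"' for a in attributes)
    return f'CREATE TABLE "{entity}" ({columns})'

ddl = formula_to_ddl("MF5_U [Reaction, Material, Distribution type, U]")

# Check that the generated statement is accepted and yields the expected columns.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(MF5_U)")]
```

Column types and key constraints are deliberately left out here; in a fuller version they would have to come from additional information about each attribute.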
For the distribution of the type LF=1, the set of values of the initial energy E and the table of the values of the probabilities g(E → E*) are specified as functions of the secondary neutron energy E*, with interpolation enabled both in the energy E and in the energy E*. Evidently, this data can be stored in one table with the following structure:
Similar tables of interpolation parameters are provided for all distribution types.
The evaporation spectrum of the general form (LF=5) is defined by a set of the temperature values of the nucleus Θ(E) depending on the initial energy E and by a set of the values of the probability density function g(x), where x = E*/Θ(E). This suggests the use of two tables whose structure is described by the following formulas:
[…, Material, x, g(x), Interpolation interval, Interpolation type].
For the Maxwell spectrum (a simple fission spectrum, LF=7) and the evaporation spectrum (LF=9), the ENDF file stores only the temperature distribution for the nucleus Θ(E), so that one table with a structure similar to that of the table MF5_LF5_TETA can be enough for each spectrum. Three tables with a similar structure are also used to store the parameters a(E), b(E) of the Watt spectrum (LF=11) and the maximum temperature of the nucleus TM(E), which is present in the formula of the Madland-Nix spectrum (the energy-dependent spectrum of fission neutrons, LF=12).

Infological model of energy-angle distributions of products
The ENDF file MF=6 stores data on energy-dependent distributions of neutrons, photons, charged particles and residual nuclei emerging as nuclear reaction products. The product is characterized by the production cross section: (3)
This formula defines a possible infological model based on three entities:
The final version of the energy-angle distribution database, which stores the values of distributions on a rather fine grid, can be presented in such a form. This makes it possible to replace calculations with table lookups or to use, by default, linear interpolation in all intervals of the argument.
In practice, the number of entities and respective tables is larger, since the parameters of distributions are stored in different presentations. Individual attributes of the product are selected from the heading line of the file MF=6:
Due to the diversity of the additional product attributes, the product's standard identifiers [ZAP, LIP] adopted in the ENDF format are saved as such in the final database version along with the expanded name. The data in table (4) is enough to describe the energy-angle distributions during isotropic emission of particles with discrete energy values (LAW=3), as well as during elastic scattering of charged particles (LAW=5).
The continuous energy-angle distribution (LAW=1) is characterized by two single attributes [LANG, LEP]
where the distribution parameters are interpreted according to the value of the indicator LANG. However, due to the limited number of parameters in the presentations used (Legendre polynomial expansion or Kalbach-Mann systematics), it would be appropriate to interpret each parameter depending on the energies E, E* as a separate attribute.
In this case, the structure takes the form
The data describing the energy-angle distributions in accordance with the two-body scattering kinematics law (LAW=2) can be presented by infological formulas close to formulas (5), (6). Here, only the dependence on the product energy is absent:
Infological formulas for other files from ENDF libraries are built in the same way.

Conclusion
The substantial labor input in the conversion of ENDF libraries to relational databases noted in (ENDF-6 2009, 2011) may be reduced significantly by the following technology:
- construction of adequate infological formulas for ENDF files;
- use of conversion procedures written in built-in DBMS languages;
- conversion directly in the DBMS environment into which the ENDF files are copied in advance line by line.
Infological models of ENDF files are built in a fairly straightforward manner, which is confirmed by the examples given in the paper. It is reasonable to write infological formulas directly from defining relations of type (3). As a rule, the result is a simple structure of relational tables and a semantically clear database schema. Where required, infological formulas contain attributes of metadata which are present in the initial information arrays. In particular, these may be all attributes gathered in the heading lines of the ENDF file's sections and subsections.
These attributes define the list of the relational table columns storing the values of the scalar properties of materials and reactions, as well as the respective list of tables for the scalar parameters of specific classes of energy-angle distributions.The main entity that requires a special table is a particular parametric presentation of a multiple attribute, the function that describes the dependence on the scalar or vector attribute.Accordingly, an instance of the entity is a suite of the values of the parameters of the functions with the given value of the argument.And the reaction, the material, the reaction product and the function argument act as the key attributes identifying the entity instance and ensuring the connection of data in different tables.
For the MF=3 file, the table of multiple properties contains the columns [Incident particle energy] and [Cross section]. The columns [Interpolation interval] and [Interpolation type] in this table indicate, for each energy point, the number of the interval containing the given point and the interpolation type in this interval. Therefore, the infological model defines the schema of the database for the MF=3 file as three tables whose structure is described by the following formulas:
Materials [MAT, ZA, AWR],
Reactions [MT, MAT, QM, QI, LR, NR, NP],
Cross sections [MT, MAT, E, σ(E), Interpolation interval, Interpolation type].
MF6_LAW1_Line_distributions [Material, Reaction, Product, ZAP, LIP, Initial energy E, Product energy E*, NA, b_0(E, E*), b_1(E, E*), …, b_NA(E, E*)]. (6)
This form reduces the duplication of values in the fields [Initial energy E, Product energy E*]. Saving the attribute NA (the number of angular parameters) simplifies the processing of the array into which the row of parameters is copied.