CONVERTING ENDF LIBRARIES INTO RELATIONAL FORMAT

This paper considers the conversion of nuclear constants systems in the ENDF format into a relational database. Such conversion may be one of the tools that simplify the development and use of factual information, techniques and algorithms in the field of nuclear data and, consequently, increase the efficiency of the corresponding computational codes. The paper briefly reviews the infological model of ENDF libraries and describes a possible structure of the relational tables in the database. The proposed database schema and the form of the tables take into account the presence of both single and multiple properties of the considered isotopes, as well as the different requirements of transferring relational tables into programs and of visual analysis of the tabulated data by a physicist-evaluator. The conversion algorithms and the results of converting the ROSFOND-A and ENDF/B-VII.1 libraries are described. The advantages of performing calculations directly in the DBMS environment are outlined: programming is simplified, and a number of data verification and validation tasks no longer need to be solved. Some possible approaches to using legacy software in conjunction with relational libraries of constants are listed. Some terminological clarifications are proposed to facilitate the construction of infological models for the ENDF format. The conversion programs and the ENDF/B-VII.1 neutron library in relational format are posted on the public site http://178.215.91.20/nd.


Introduction
Much attention has always been paid to organizing machine-readable data for neutronic calculations (see, for example, Kolesov and Nikolaev 1972, Parker 1963, Woll 1968, Drake 1970, Nikolaev et al. 1984, Pronyaev et al. 2001, MacFarlane and Muir 1996, Larson 2007, ENDF-6 2009, Mattoon et al. 2012, Abramovich et al. 2001, Zizin et al. 1974, Sinitsa and Rineiskiy 1993, Koshcheev et al. 2000, Plyaskin and Kosilov 2002, Manturov et al. 2000, Zhuravlev et al. 2009, Koshcheev et al. 2014, Manturov and Nikolaev 2016). The arrangement of data in a library of constants (the data format) significantly affects calculation efficiency, determining the speed and, at times, the accuracy of calculations. Analyses of neutronic calculation programs show that up to 60% of the code implements data management functions. The way data are organized is also important for physicist-evaluators, since it makes prompt sampling, classification, visualization and comparison of data possible.
The most common text file format, that of evaluated nuclear data files (ENDF), became a de facto standard for historical reasons. It reproduces the punch-card and tape organization of data, which determines the order of access to information and imposes corresponding restrictions on the style of programming and data processing. Efforts aimed at developing and standardizing nuclear data formats within traditional text file technologies do not lead to radical reductions in the data management cost, since the emergence of new experimental information and the needs of applied problems necessitate introducing new types of data that require format changes and new specialized software (Manturov et al. 2000, Zhuravlev et al. 2009, Koshcheev et al. 2014, Manturov and Nikolaev 2016).
The volume of factual information, techniques and algorithms in the field of nuclear data is now very large. Therefore, finding means to simplify the development and operation of this body of information is an urgent task. A technology in which the operations of searching, retrieving and updating information are standardized independently of the nature of the data would help reduce the labor of programmers, let them focus on the functional (applied) part of the calculation codes, and improve the readability (self-documentation) and verifiability of programs. Currently, such a technology is provided by relational database management systems (DBMS). One of the main concepts of this approach is the separation of the physical and logical organization of data. Centralized database management provides standard low-level data operations, eliminating the need to program these operations in specific applications.
The paper briefly describes the concept and technology as well as the programs and results of converting the ROSFOND-A and ENDF/B-VII.1 libraries to relational format. The conversion programs and the ENDF/B-VII.1 library in relational format are available on the public site: http://178.215.91.20/nd.

Prospects of using relational DBMS in neutronic calculations
There is positive experience of using relational databases in libraries of nuclear constants (Boboshin et al. 1994, Boboshin et al. 1999, Boboshin et al. 1999a, Varlamov et al. 2001, Varlamov and Ishkhanov 2017). However, it seems that DBMS technology, which provides "standard tools for solving standard tasks" of data management, should be used more widely. The metaphor of a table for representing data in a program is a very powerful tool that determines the style and effectiveness of programming. The tabular form of relational databases is much better suited to organizing variant calculations (as well as verification and validation tasks (Yuferov et al. 2013b)) than traditional data structures.
Storing data in a relational database eliminates the need to allocate and check control information explicitly (record separators, navigation parameters, data type indicators, counters, flags, etc.). Correct data positioning is provided by the information principle of object descriptions, according to which each object is assigned a table row, i.e., a unique tuple of pairs <object property name, property value>. This eliminates the task of developing special manuals for organizing and formatting data and saves the time otherwise spent on studying control information and practicing its correct application in processing programs.
The structure of relational tables is determined only by the nature of the specific physical information. Table columns store homogeneous data, for example, total cross-sections, and each entry (table row) is identified (in this case) by the corresponding energy value. Adding records (for example, when extending or refining an energy interval) is a standard operation that does not require correction or inclusion of any control information. Similarly, adding columns (for example, with data uncertainty estimates) does not necessitate changing the query modules that retrieved data from the table with its original structure.
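The point about adding records and columns can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module only as a convenient stand-in for a relational DBMS; the table name, column names and numerical values are hypothetical placeholders, not data from the libraries discussed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per energy point; column names and numbers are illustrative only.
conn.execute("CREATE TABLE total_xs (energy_ev REAL PRIMARY KEY, sigma_b REAL)")
conn.executemany(
    "INSERT INTO total_xs VALUES (?, ?)",
    [(1.0e-5, 37.2), (0.0253, 20.4), (1.0e3, 11.9)],
)

# A "query module" written against the original table structure.
QUERY = "SELECT sigma_b FROM total_xs WHERE energy_ev = ?"
before = conn.execute(QUERY, (0.0253,)).fetchone()

# Extending the energy interval is an ordinary INSERT; no counters or flags to update.
conn.execute("INSERT INTO total_xs VALUES (?, ?)", (2.0e7, 5.8))

# Adding an uncertainty column does not affect the existing query.
conn.execute("ALTER TABLE total_xs ADD COLUMN d_sigma_b REAL")
after = conn.execute(QUERY, (0.0253,)).fetchone()

n_points = conn.execute("SELECT COUNT(*) FROM total_xs").fetchone()[0]
```

The same query text returns the same value before and after the table is extended, which is the property the paragraph above relies on.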
When performing calculations directly in the DBMS environment, programming is simplified as follows:
- the stages of data input and processing are separated;
- the task of formatting input/output data is eliminated;
- the validity of source data is verified outside the processing modules, at the time of online input into the tables;
- data are addressed "by name", without tracking their actual location in external or random access memory;
- optimized search algorithms in the DBMS significantly speed up data sampling;
- sequences of keystrokes and queries can be recorded and saved for repeated execution or for insertion into the program text;
- database tables for source data and results can be used from modern programming languages through standard data access technologies.
There are no fundamental problems or shortcomings caused by transferring libraries of constants to the environment of relational databases. Back in 2001, the report (Pronyaev et al. 2001) formulated the goals and ways of converting all libraries maintained by world nuclear data centers to relational format. It was assumed that the nuclear data centers would develop standard database schemas, that Java would be adopted as a common programming language, and that in about five years the nuclear data libraries would migrate to the format of relational tables. As a result, both local and remote access to data and processing facilities would be provided through a single interface. However, for various reasons, the declared goals have not been fully achieved to date. Thus, when the seventh version of the ENDF/B library was developed (2006), it was stated that the transition to relational formats would be advisable, but the proposed changes were considered premature due to considerable labor costs (ENDF-6 2009). Therefore, the ENDF format remains the main form of storing information in evaluated neutron data libraries.
Converting ENDFs to relational format is a problem not because of the great complexity or even the complex branched structure of the data in question, but because of the many widely used programs oriented to processing data in ENDF format. Nevertheless, the objective need to support diverse work with very large arrays of textual and numerical information on nuclear physics promotes the development of new formats designed to standardize and unify data access. In particular, the development of a unified nuclear data format (GND, Generalized Nuclear Data (Mattoon et al. 2012)) based on the extensible markup language (XML) has been completed. The CINDA bibliographic databases and the EXFOR experimental databases have been converted to relational format (Varlamov et al. 2001, INDC(NDS)-0614 2012).
Work is underway to transfer the files of ENDF-formatted data and group constants for neutronic calculations to local and client-server DBMS (Fan et al. 2005, Alekseev et al. 2012).

Constructing an infological ENDF model
An infological model describes the subject area in some standard terms and notations for later mapping this description into a relational database schema, i.e., a list of specially structured tables and their relationships.
Describing the database schema for ENDF libraries is rather difficult due to some terminological contradictions with the standard notions of informatics. In the ENDF system, constants are grouped by material into text files whose lines formally belong to two structural levels. The major structural element of such a file is also called a 'file' (in the ENDF system). Therefore, for definiteness, a text file at the library level (at the level of the operating system) will be referred to as a material file, and the top-level section in a material file as an internal file.
An internal file is treated as a section with "data of a particular class". However, the concept of a 'class' (implying a fixed list of predicates, i.e., conditions and attributes characterizing a certain class) is not strict here: the first lines of the section contain some header information (in particular, the MF identifier of the internal file) supplementing the main content of the section. This content is divided into sections, i.e., second-level structural elements, which (in ENDF terminology) "describe a certain type of data". The main content of a section is also preceded by header entries. Thus, there are actually four structural elements: the header and the data of the first level, and the header and the data of the second level. The headers contain identifying or controlling information and some specific material parameters. In the lines of both headers and sections, ten fixed-length fields are allocated for data placement: P1, P2, P3, P4, P5, P6, Material, File, Section, Line. The last four (Material, File, Section, Line) are standard, and the first six store various information content. The semantic content of neighboring lines can be different. This is the difference between the ENDF format and the relational model, in which the meaning of all records (lines) of a table is identical and determined by the list of properties considered for a certain class of objects represented by the table entries. The values of the properties of a particular object are stored in the record fields associated with the corresponding table columns.
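The ten-field line layout can be made concrete with a short sketch. The field widths used below (six 11-character fields followed by 4, 2, 3 and 5 characters for Material, File, Section and Line) follow the 80-column ENDF-6 record layout; the function names, the dictionary keys and the sample line are illustrative assumptions and not part of the conversion programs described in this paper.

```python
import re

FIELD_NAMES = ("P1", "P2", "P3", "P4", "P5", "P6")
# ENDF floats usually omit the exponent letter, e.g. "1.001000+3" for 1.001e3.
_EXPONENT = re.compile(r"(?<=[0-9.])([+-])(?=[0-9])")


def parse_value(text):
    """Return an int, a float, the raw text, or None for a blank field."""
    text = text.strip()
    if not text:
        return None
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(_EXPONENT.sub(r"E\1", text))
    except ValueError:
        return text  # free-text content is kept as is


def split_line(line):
    """Split one 80-character line into the ten fields named in the text."""
    line = line.rstrip("\n").ljust(80)
    row = {name: parse_value(line[11 * i:11 * (i + 1)])
           for i, name in enumerate(FIELD_NAMES)}
    row["Material"] = int(line[66:70])
    row["File"] = int(line[70:72])
    row["Section"] = int(line[72:75])
    row["Line"] = int(line[75:80]) if line[75:80].strip() else None
    return row


# Illustrative header-like line, assembled field by field to guarantee the widths.
sample = "".join(f.rjust(11) for f in
                 ["1.001000+3", "9.991673-1", "0", "0", "0", "0"])
sample += " 125" + " 3" + "  1" + "    1"
row = split_line(sample)
```

Note that the parser only separates the fields; it cannot tell what P1 to P6 mean on a given line, which is exactly the difference from a relational record discussed above.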
This subject area contains three main classes: MATERIALS, INTERACTIONS (REACTIONS), and DECAYS. All other information can be treated as a set of properties of these entities. Introducing other entities in the subject area, i.e., some subclasses such as ISOTOPES or RESONANCE INTERACTIONS, is motivated by the practical convenience of operating relational tables. Thus, the resulting infological model for a file of resonance parameters includes the following entities:
- a class of materials;
- a class of isotopes;
- a class of interactions identified by an energy interval of incoming neutrons;
- a class of interactions, the determining attribute of which is the orbital angular momentum value;
- a class of interactions identified by the target nucleus spin;
- a class of resonant interactions (resonances).
The list of classes defines a database schema. Each class has a corresponding table with recorded scalar characteristics of the class instances, i.e., material, isotope, interaction, and resonance.
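As a rough illustration of how such a list of classes maps onto a schema, the sketch below creates one table per class in an in-memory SQLite database. All table and column names are hypothetical, and the parent-child chain (material, isotope, energy interval, orbital momentum group, spin group, resonance) is an assumption made for the example; the actual schema of the converted libraries may differ.

```python
import sqlite3

# One table per class of the infological model; names are illustrative only.
SCHEMA = """
CREATE TABLE material (
    material_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE isotope (
    isotope_id  INTEGER PRIMARY KEY,
    material_id INTEGER NOT NULL REFERENCES material(material_id),
    abundance   REAL
);
CREATE TABLE energy_interval (
    interval_id INTEGER PRIMARY KEY,
    isotope_id  INTEGER NOT NULL REFERENCES isotope(isotope_id),
    e_low       REAL,
    e_high      REAL
);
CREATE TABLE l_group (
    l_group_id  INTEGER PRIMARY KEY,
    interval_id INTEGER NOT NULL REFERENCES energy_interval(interval_id),
    l           INTEGER
);
CREATE TABLE spin_group (
    spin_group_id INTEGER PRIMARY KEY,
    l_group_id    INTEGER NOT NULL REFERENCES l_group(l_group_id),
    spin          REAL
);
CREATE TABLE resonance (
    resonance_id  INTEGER PRIMARY KEY,
    spin_group_id INTEGER NOT NULL REFERENCES spin_group(spin_group_id),
    energy        REAL,
    neutron_width REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Each table holds only the scalar characteristics of its own class, and the foreign keys carry the nesting that the ENDF format expresses through line order.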

Organizing tables of constants in relational databases
Data are distributed among the tables according to the nature of their properties. A property can be single (scalar, e.g., a mass number) or multiple (e.g., a vector describing the energy dependence of a cross-section). All single properties common to all materials can be placed in one table. Specific properties characteristic of only some materials (e.g., fissile materials) are grouped into separate tables to avoid gaps in the table of general properties.
Multiple properties are expressed by functional dependencies (on temperature, energy). If the values of a function are taken for all materials at the same points of the argument, it is advisable to treat these points as single properties (e.g., a cross-section in a given energy group), assigning a separate table column to each point of the argument (e.g., to each group). As a result, the table is filled as densely as possible. For example, such a table can store all the main group cross-sections. In this case, the first column is reserved for the material name, the second column for the section type, and the remaining columns for the corresponding group values. Currently, the permissible number of table columns can reach several thousand; therefore, such storage is feasible for a multi-group presentation of constants.
The described "horizontal" arrangement of groups (each group in a separate column) is convenient for a one-time sampling of all data on the energy distribution of cross-sections: the contents of the current record are transferred to an array by one COPYTOARRAY instruction. On the other hand, according to the basic concept of arranging tables (one column for one property), "automatic" output to a graph is carried out column by column. Therefore, in order to plot the energy distribution promptly, the groups should be placed "vertically", i.e., the data of a particular group are in a separate record, and sampling the column gives the cross-section values for all the groups.
These two processing tasks are typical; therefore, it is advisable to store the tables in two forms, i.e., with vertical and horizontal records of energy dependencies.In terms of maintaining the databases and the amount of necessary software support, this approach is not burdensome.
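The two layouts and the conversion between them can be sketched as follows. This is only an illustration in Python with SQLite: the table and column names, the four-group structure and the numbers are invented for the example, and fetching a row as a tuple stands in for the one-instruction copy of a record into an array.

```python
import sqlite3

N_GROUPS = 4  # illustrative; real multi-group sets are much larger
conn = sqlite3.connect(":memory:")

# "Vertical" layout: one record per group.
conn.execute(
    "CREATE TABLE xs_vertical (material TEXT, mt INTEGER, grp INTEGER, "
    "sigma REAL, PRIMARY KEY (material, mt, grp))")

# "Horizontal" layout: one column per group.
group_cols = ", ".join(f"g{g} REAL" for g in range(1, N_GROUPS + 1))
conn.execute(
    f"CREATE TABLE xs_horizontal (material TEXT, mt INTEGER, {group_cols}, "
    "PRIMARY KEY (material, mt))")

values = [2.1, 3.4, 5.6, 9.8]  # placeholder group cross-sections
conn.executemany(
    "INSERT INTO xs_vertical VALUES ('Fe-56', 1, ?, ?)",
    list(enumerate(values, start=1)))

# Derive the horizontal table from the vertical one with a pivot query.
pivot = ", ".join(
    f"MAX(CASE WHEN grp = {g} THEN sigma END) AS g{g}"
    for g in range(1, N_GROUPS + 1))
conn.execute(
    f"INSERT INTO xs_horizontal SELECT material, mt, {pivot} "
    "FROM xs_vertical GROUP BY material, mt")

# Horizontal: the whole energy dependence arrives as one record.
record = conn.execute(
    "SELECT * FROM xs_horizontal WHERE material = 'Fe-56' AND mt = 1").fetchone()
spectrum_from_row = list(record[2:])

# Vertical: the same dependence is one column, ready for plotting.
spectrum_from_column = [r[0] for r in conn.execute(
    "SELECT sigma FROM xs_vertical WHERE material = 'Fe-56' AND mt = 1 "
    "ORDER BY grp")]
```

Since one layout can be generated from the other by a single query, keeping both forms adds little maintenance effort.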
If the sets of argument values are not the same for different materials, the functional dependence can be stored only "vertically", with two columns allocated for it: the first stores the argument values, the second the function values. If such pairs of columns are placed side by side, there will be gaps in the table due to the different number of values in the sets. If they are placed consecutively, there will be no gaps, but a field for the dependency name will have to be provided.
In ENDF libraries, much effort goes into uniquely identifying materials by means of four-digit numbers. This is due to the "punch card" limitation of the record length to 80 characters, which is now artificial. It is advisable to use the common chemical notation to identify, for example, hafnium isomers: these are the identifiers 178Hfg, 178Hfm2 and 178Hfn, or a form of the Hf-178-m2 type that is more convenient for visual perception and computer processing. Adding the element symbol eliminates the need to memorize mass numbers and charges to identify materials, especially since conventional numbering is still used for compound materials (water, zirconium hydride, etc.). All names in use are stored in a reference table associated with the material identifier field. This field is filled by selecting the required value from the reference table, which excludes the possibility of accidentally typing an incorrect name.
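A reference table of this kind is commonly enforced with a foreign key. The sketch below assumes SQLite through Python's sqlite3 module; the table names and the list of names are hypothetical. A mistyped identifier is rejected by the DBMS itself rather than by application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Reference table with the admissible material names (illustrative entries).
conn.execute("CREATE TABLE material_names (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO material_names VALUES (?)",
    [("Hf-178",), ("Hf-178-m1",), ("Hf-178-m2",)])

# The material identifier field may only hold values from the reference table.
conn.execute(
    "CREATE TABLE properties ("
    "material TEXT NOT NULL REFERENCES material_names(name), "
    "mass_number INTEGER)")

conn.execute("INSERT INTO properties VALUES ('Hf-178-m2', 178)")

try:
    # Transposed digits: this name is not in the reference table.
    conn.execute("INSERT INTO properties VALUES ('Hf-187-m2', 178)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

n_rows = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
```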
The total number of tables is determined by the actual number of "material-internal_file-section" combinations and the presence of single and (or) multiple properties in the section. Thus, the ROSFOND-A library contains 301 material files with internal files of seven types: MF=1 (general information), MF=2 (resonance parameters), MF=3 (reaction cross-sections), etc. In addition to physical data, each material file contains historical, navigational and statistical data. Practice in operating DBMS shows that such data should be stored in separate tables. In total, the ROSFOND-A library has 4664 "material-file-section" combinations, each of which can be designed as a "material-property" table. Such a maximally fine-grained presentation can be convenient for evaluators analyzing a particular reaction.
To support calculation tasks, larger file-section groupings are preferred: their number in the ROSFOND-A library is 70. The table of existing "file-section" groupings is included in the database as navigation data. The main sections of these groupings combine data that are homogeneous in physical meaning and can be represented by separate tables. Thus, all 25 sections of File 3 contain energy dependences of cross-sections and derived quantities. Therefore, all this information, even when each cross-section has its own individual dependence on energy, can be placed in one table. On the other hand, the "2-151" grouping (File 2 containing a single Section 151) includes parameters of resolved and unresolved resonances. The structure of this information is rather complicated and, therefore, it is advisable to use several tables.
In general, the information of each grouping is divided into two types of tables, with data on single and on multiple material properties, respectively. For convenience of analyzing and selecting operational data, the tables are stored in two formats: "material-all_properties" and "property-all_materials". For correlation with ENDF files, the name of a "property-all_materials" table is MF_MT, i.e., it is formed from the MF identifier of the internal file and the MT identifier of the section. For example, the "03_001" table contains the total neutron cross-sections of all materials. Similarly, the name of a "material-all_properties" table is formed as ZZAA_MF from the ZZAA material code and the MF identifier of the internal file. Over time, this conventional identification of tables should be replaced with meaningful naming, which will allow non-specialist users to navigate the database schema and provide some self-documentation of processing programs.
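The naming convention can be written down as two small helper functions. These helpers are hypothetical and only reproduce the patterns given in the text ("03_001", "02_151", ZZAA_MF); the zero-padding widths are inferred from those examples, and the material code is passed through as a string because its exact form is not specified here.

```python
def property_table_name(mf, mt):
    """Name of a "property-all_materials" table: MF_MT, e.g. "03_001"."""
    return f"{mf:02d}_{mt:03d}"


def material_table_name(material_code, mf):
    """Name of a "material-all_properties" table: ZZAA_MF.

    material_code is taken as an already formatted string.
    """
    return f"{material_code}_{mf:02d}"


def split_property_table_name(name):
    """Recover (MF, MT) from a "property-all_materials" table name."""
    mf, mt = name.split("_")
    return int(mf), int(mt)


total_xs_table = property_table_name(3, 1)
resonance_table = property_table_name(2, 151)
```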

Procedure for converting ENDF libraries to relational databases
Although the ENDF format uses a large number of different flags and indicators, specific information units within sections do not carry any identifying marks of their own. The required unit can be found only by reading all the records in the section and calculating its initial line from the indicators of the number of lines allocated to the information units. If the units had individual identifying marks, converting ENDF libraries to relational format would be a trivial task.
ENDFs can be converted to relational tables by various means. This conversion is actually a one-time operation; therefore, there is no need to design a special conversion program stored as an .EXE module. It is more appropriate to use the built-in scripting language of a given DBMS and perform the conversion step by step. It is advisable to import the source text files of a nuclear data library line-by-line into the database "as is", and then to select the necessary fragments of lines step by step, creating new tables or columns for their placement. The relational tables make it possible to evaluate the results of each stage visually and to decide on the path of further conversions.
The scenario for converting ENDFs to a relational database includes the following steps:
1. ENDFs are imported line-by-line into relational tables (summary or individual for each file) as text lines.
2. To visually control the correctness of conversion, six information columns are accompanied by identification columns, in the fields of which ENDF flags are written at the data semantic analysis stage.
3. From the tables obtained in Step 2, tables are selected that combine the data of sections across all materials; for example, the "02_151" table contains the data of the MT=151 section from the MF=2 file of resonance parameters.
4. Navigation information tables for sections are created and filled: for example, the "List_ER" energy table, in which all the scalar parameters (flags and physical quantities) characterizing the energy interval are stored (for example, in different representations of resonance parameters).
5. Information tables are created and filled for a section. Thus, the "SLBW_MLBW" table stores the scalar parameters of resolved resonances in the single-level and multilevel Breit-Wigner approximations. If necessary, all the scalar parameters of different representations of resonances can be placed in a single table. This possibility is limited only by the permissible number of table columns in a particular DBMS.
5.1. Information tables are created and filled for energy-dependent parameters. Tables of such functional dependencies have a simple structure of one or more tuples of the form {Isotope}, {Energy Interval No.}, {Energy}, {Parameter}, {Interpolation Scheme}.
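The first three steps of the scenario can be sketched as follows. The paper recommends the built-in scripting language of the chosen DBMS; here Python with SQLite is used purely for illustration, and the table names raw and fields, the column names, and the two sample lines are assumptions. The character positions follow the 80-column ENDF-6 record layout.

```python
import sqlite3


def make_line(fields, mat, mf, mt, ns):
    """Build an 80-character line: six 11-character fields plus MAT, MF, MT, line no."""
    return "".join(f.rjust(11) for f in fields) + f"{mat:4d}{mf:2d}{mt:3d}{ns:5d}"


# Two illustrative lines standing in for an imported material file.
raw_lines = [
    make_line(["1.001000+3", "9.991673-1", "0", "0", "1", "0"], 125, 2, 151, 1),
    make_line(["1.001000+3", "9.991673-1", "0", "0", "0", "0"], 125, 3, 1, 1),
]

conn = sqlite3.connect(":memory:")

# Step 1: import the text "as is", one table row per line.
conn.execute("CREATE TABLE raw (line_no INTEGER PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO raw (text) VALUES (?)", [(s,) for s in raw_lines])

# Step 2: six information columns (still text) plus identification columns.
p_cols = ", ".join(
    f"substr(text, {1 + 11 * i}, 11) AS p{i + 1}" for i in range(6))
conn.execute(f"""
    CREATE TABLE fields AS
    SELECT line_no, {p_cols},
           CAST(trim(substr(text, 67, 4)) AS INTEGER) AS mat,
           CAST(trim(substr(text, 71, 2)) AS INTEGER) AS mf,
           CAST(trim(substr(text, 73, 3)) AS INTEGER) AS mt,
           CAST(trim(substr(text, 76, 5)) AS INTEGER) AS ns
    FROM raw
""")

# Step 3: collect one section across all materials into its own table.
conn.execute(
    'CREATE TABLE "02_151" AS SELECT * FROM fields WHERE mf = 2 AND mt = 151')
section_rows = conn.execute('SELECT mat, p1, p5 FROM "02_151"').fetchall()
```

After each step the intermediate table can be inspected directly, which is the visual control the scenario relies on. Steps 4 and 5 require the semantic analysis of flags and counters and are not covered by this sketch.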
It is noteworthy that even the second of the listed steps would solve the conversion problem, understood as providing direct logical access to data by their names, if all information units had semantic indicators or flags marking the beginning and end of the unit. However, the ENDF format provides only availability flags and data quantity indicators, offering only sequential access. That is why the conversion task turns out to be non-trivial: an algorithm has to be developed that sequentially analyzes all the ENDF lines to determine the data availability flags and quantity indicators.
To ensure verification and control of data, the created relational tables initially retain all the ENDF navigation parameters. Conversion scripts use these parameters to search for information units in the summary table and to generate detail tables. At the stages of further restructuring, the database schema is optimized: in particular, the flag fields are excluded. For example, it is not difficult to see that the LRU flag, which indicates which resonances are described in a given energy interval, can be eliminated, since all situations can be indicated by the LRF flag if the set of its values is supplemented with the following values: LRF=0, only the scattering radius is given; LRF=8, unresolved resonances are described in the current energy interval, with only the mean fission widths depending on energy; LRF=9, unresolved resonances are described in the current energy interval, and the mean level spacing, the widths for competing reactions, the average reduced neutron widths, the radiation widths, and the mean fission widths all depend on energy.
Complete normalization that meets all the theoretical requirements for eliminating redundancy is not carried out for a working database, so as to avoid complex queries that slow down data sampling.
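The sequential character of access can be shown on a simplified example. The sketch assumes units laid out like an ENDF-6 LIST record: a header line whose fifth field holds the number of values that follow, packed six per line. The function name and the sample rows are hypothetical, and real sections mix several record types, so this covers only the simplest case.

```python
import math


def walk_list_records(rows):
    """Locate consecutive units using only the count stored in each header.

    rows is a list of already split lines (six values per line). The start
    of every unit is known only after the previous unit has been measured.
    """
    units = []
    i = 0
    while i < len(rows):
        count = int(rows[i][4])           # quantity indicator in the fifth field
        n_lines = math.ceil(count / 6)    # six values fit on one line
        flat = [v for row in rows[i + 1:i + 1 + n_lines] for v in row]
        units.append({"header_line": i, "values": flat[:count]})
        i += 1 + n_lines                  # jump to the next header
    return units


rows = [
    [0.0, 0.0, 0, 0, 8, 0],               # header: 8 values follow
    [1, 2, 3, 4, 5, 6],
    [7, 8, None, None, None, None],
    [0.0, 0.0, 0, 0, 3, 0],               # header: 3 values follow
    [10, 20, 30, None, None, None],
]
units = walk_list_records(rows)
```

The second unit can be found only because the first header was read and its count interpreted; in a relational table the same data would be addressed directly by a key.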
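The proposed elimination of the LRU flag amounts to a small mapping from the pair (LRU, LRF) to a single extended LRF value. The sketch below is one reading of that proposal: it assumes the usual ENDF-6 meaning of the flags (LRU=0 for scattering radius only, LRU=1 for resolved and LRU=2 for unresolved resonances, with unresolved LRF=1 and LRF=2 corresponding to the two cases described above). The function name is hypothetical.

```python
def merged_lrf(lru, lrf):
    """Fold the LRU flag into an extended LRF value (assumed mapping)."""
    if lru == 0:
        return 0                  # only the scattering radius is given
    if lru == 1:
        return lrf                # resolved resonances keep their LRF value
    if lru == 2:
        # Unresolved resonances: 8 if only fission widths depend on energy,
        # 9 if all average parameters depend on energy.
        return {1: 8, 2: 9}[lrf]
    raise ValueError(f"unexpected LRU value: {lru}")


examples = {
    (0, 0): merged_lrf(0, 0),
    (1, 2): merged_lrf(1, 2),
    (2, 1): merged_lrf(2, 1),
    (2, 2): merged_lrf(2, 2),
}
```

The mapping loses no information as long as the values 0, 8 and 9 are not used as LRF values for resolved resonances.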
Materials illustrating the stages and results of implementing the scenario presented above (for the ROSFOND-A and ENDF/B-VII.1 neutron libraries) are given in (Yuferov et al. 2013b, Yuferov 2011, Yuferov et al. 2013, Yuferov et al. 2013a) and on the site: http://178.215.91.20/nd.

Conclusions
1. There are no fundamental problems with nuclear data migration to the environment of relational databases.
2. Placing nuclear data in a relational database can be considered a form of presentation of the original format and can be applied along with the latter. This makes it possible to preserve the accumulated software tools and to apply DBMS technologies for organizing calculations in both existing and new neutronic calculation complexes. This provides additional means to verify the libraries and codes.
3. It is not strictly necessary to develop standard schemas for databases, the need for which was noted in the report (Pronyaev et al. 2001). On the one hand, the "standard" is defined by the logic of nuclear data interconnection and subordination but, on the other hand, the variety of possible tasks does not allow a single optimal database schema to be determined. With a relational model, the available database can easily be adapted to any new problem by selecting and arranging the necessary data or by restructuring the tables. Some standardization should be envisaged only to facilitate replication and synchronization of the databases of world nuclear data centers.
4. It is also not necessary to use a common programming language for nuclear data libraries (the report (Pronyaev et al. 2001) suggested Java). In the current situation, the language can and should be selected "for the task", and any language can use standard technologies of local or remote access to databases.
5. The structure of tables should be dynamically optimized for a specific calculation task. The variability of the tabular layout requires that possible database schemas be compared in computational experiments evaluating the efficiency of working with relational presentations of ENDFs by the criteria of clarity, convenience of analysis, and data sampling rate for various tasks and software packages.