Corresponding author: Viktor I. Belousov ( mirror08@yandex.ru ) Academic editor: Georgy Tikhomirov
© 2021 Anastasiya V. Shoshina, Viktor I. Belousov.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Shoshina AV, Belousov VI (2021) Speeding up the ODETTA code for solving particle transport problems. Nuclear Energy and Technology 7(1): 15-20. https://doi.org/10.3897/nucet.7.64365
Mathematical simulation of fast neutron reactors requires high-precision calculations of shielding problems based on unstructured meshes. The paper considers and analyzes a parallel version of the ODETTA code (
Parallel programming, MPI, ODETTA code, finite element method, radiation safety, HPC cluster
The purpose of the study is to apply parallel computations to neutron and gamma quanta transport problems solved in a multi-group SnPm approximation by the finite element method on unstructured tetrahedral meshes, including mesh data handling. The study is part of an investigation conducted by the Nuclear Safety Institute of the Russian Academy of Sciences at MEPhI’s computation center. Mathematical simulation of fast neutron reactors involves extremely complex problems, and solving them requires large numbers of high-precision shielding calculations on unstructured meshes. An analog is the ATTILA code (
Mathematical simulation of fast neutron reactors requires highly precise calculations of shielding problems on unstructured meshes. The neutronic calculation module of the ODETTA code solves the transport equation by the finite element method (FEM) on unstructured tetrahedral meshes in the discrete ordinates approximation (
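With the use of discrete ordinates, the transport equation is written as

$$\boldsymbol{\Omega}_m \cdot \nabla \Psi_m(\mathbf{r}) + \Sigma_T(\mathbf{r})\,\Psi_m(\mathbf{r}) = Q_m(\mathbf{r}),$$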
where the group index g is omitted, and the index m (m = 1, 2, …, M) corresponds to the discrete direction Ωm = (μm, ηm, ξm) from a quadrature set (equal-weight ESn or Chebyshev-Legendre CLn). The unit sphere surface is measured in units of 4π, so the quadrature weights satisfy Σωm = 1. The FEM formulas are derived by a Galerkin approximation of the transport equation using the weighted residuals method. The equation includes the total macroscopic interaction cross-section ΣT and the function Qm, the right-hand side of the transport equation, which depends on the solution Ψm. The right-hand side comprises the source of intergroup and intragroup transitions, the fission source, and the given internal source. On the boundary Γ of the considered 3D region, either zero angular flux is specified for the incoming directions or the reflection condition is applied. This results in a boundary value problem for the particle transport equation in a convex 3D region.
The anisotropic scattering is represented by a series expansion in associated Legendre functions up to the fifth order. The spatial rebalance method is used to accelerate the convergence of inner iterations. The code is written in Fortran (Fortran 90 and later standards) and parallelized with OpenMP. The ODETTA code operates in the following steps:
The SALOME code is used to handle CAD models and unstructured meshes (
The solution obtained after the ODETTA simulation can be visualized by a 3D graphic plotter, e.g., the VisIt program (
The spatial region under consideration is decomposed into a finite number of elements, each with a fixed number of vertices referred to as nodes. The accuracy of the results depends strongly on the decomposition quality. Tetrahedrons are used as the elements: they share nodal points and together approximate the region shape. Decomposition normally starts from the region boundaries so as to approximate the region shape as accurately as possible (Fig.
The considered region and the tetrahedrons can be decomposed by directions. The unit sphere of directions (an angular variable mesh) is decomposed into eight octants (Fig.
Decomposition by directions (Fig.
The growing interest in parallel programming these days is explained by the transition to mass production of multicore architectures (
Like most codes currently used in the nuclear industry, ODETTA is written in Fortran. Many of these codes cannot use more than one process, so they have to be converted to parallel computations to run faster. There are two ways to do this.
The first is to rely on the compiler, in which case the developer marks in the program text the code regions to be parallelized. This approach is used in OpenMP but is possible only on shared memory systems. This parallelization option was introduced into the ODETTA code earlier.
In the second approach, the developer explicitly specifies in the program code how the work is distributed among the processes and how they communicate. This approach was used in the new version of the ODETTA code. This section describes the changes made to the ODETTA source code for its new MPI-based implementation.
Decomposing the unit sphere of directions into eight octants, which can be calculated independently, makes it possible to use several computation nodes to reduce the code runtime. The ODETTA calculation algorithm is executed in parallel, and, theoretically, the more cluster nodes are used, the shorter the problem simulation time. The original program used one computation node and calculated the octants in series.
The resultant program was tested on two and four nodes of MEPhI’s cluster, though eight nodes were assumed to be the best option. The use of the MPI is meant to cut the calculation time since the octants are computed in parallel. Different numbers of nodes and their respective distributions of processes are shown in Fig.
Computations were supposed to be done using eight octants, so a maximum of eight processes, distributed by the nodes requested on the cluster, were used for the code implementation. The parallel computation capability is expected to provide a distinct advantage in terms of the calculation speed.
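The limit of eight processes can be illustrated with a minimal sketch. It assumes a simple round-robin assignment of octants to process ranks; the function name `distribute_octants` and the assignment scheme are hypothetical and are not taken from the ODETTA source, whose actual distribution is the one shown in the figure referred to above.

```python
def distribute_octants(n_procs, n_octants=8):
    """Assign octant indices 0..n_octants-1 to process ranks round-robin.

    Assumption for illustration only: ODETTA's real mapping may differ.
    """
    if not 1 <= n_procs <= n_octants:
        raise ValueError("number of processes must be between 1 and n_octants")
    return {rank: list(range(rank, n_octants, n_procs)) for rank in range(n_procs)}


if __name__ == "__main__":
    for p in (1, 2, 3, 4, 8):
        plan = distribute_octants(p)
        # The busiest rank determines how many octants are still swept in series.
        busiest = max(len(octants) for octants in plan.values())
        print(f"{p} process(es): {plan}, octants on busiest rank: {busiest}")
```

Under this assumed scheme, three processes leave one rank with two octants and two ranks with three, so the load is uneven, whereas one, two, four, and eight processes divide the eight octants evenly.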
Compiler directives were used in the MPI implementation of the ODETTA code, which made it possible to extend the code capabilities.
The final ODETTA algorithm using the MPI is represented by a block diagram with standard symbols commonly used in structured programming, including oval (the beginning or end of the considered program unit), rectangle (operations block), hexagon (the algorithm cycle containing the “body” of repetitive operations, one input, and one output (Fig.
The blocks that use the MPI in the implemented algorithm are highlighted with a grey background and white italic text. The changes mostly concern data exchange between the parallel processes.
A radiation safety problem, which modeled a fuel assembly of the MOX-1000 fast neutron reactor benchmark model with mixed oxide uranium-plutonium fuel and sodium coolant (
Stage | Simulated item | Problem solved | Input data | Output data |
---|---|---|---|---|
Calculation of shielding | Fuel assembly in a steel container (shell) filled with lead | Non-homogeneous problem with given source | 299-group source of neutrons, 127-group photon source, geometrical and physical parameters | Full neutron flux, full gamma quanta flux, calculation time |
Comparison of results | ODETTA with MPI and without MPI |
To simplify the model calculation, the cylindrical container shell was assumed to be made of HT-9 steel. The SnPm approximation with n = 12 and m = 3 was used for the problem simulation, which gives a total of 288 directions for the unit sphere, or 36 directions per octant (CL12 Chebyshev-Legendre quadrature). There are 299 energy groups for neutrons and 127 for gamma quanta. The decomposition mesh contains 132883 tetrahedrons.
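The direction counts quoted above can be checked with a short sketch. It assumes that a CLn set has n polar levels with 2n azimuthal angles on each, i.e. 2n² directions in total; this formula is an assumption that happens to reproduce the figures in the text, and the function name is hypothetical.

```python
def cl_direction_count(n):
    """Total number of directions in a CL_n set, assuming n polar levels
    with 2n azimuthal angles each (an assumption, not taken from the paper)."""
    return 2 * n * n


n = 12
total_directions = cl_direction_count(n)
directions_per_octant = total_directions // 8  # eight octants of the unit sphere

print(total_directions, directions_per_octant)
```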
The software for the ODETTA code was implemented at MEPhI’s high-performance computation center the clusters of which are Linux-based. The remote computer is controlled through the command line and using the SSH protocol through the Putty program (
Number of processes | 1 | 2 | 3 | 4 |
---|---|---|---|---|
Time, h | 10.036 | 8.922 | 10.824 | 9.372 |
Speed up | – | 1.125 | 0.927 | 1.071 |
Efficiency | – | 0.562 | 0.309 | 0.268 |
Number of processes | 1 | 2 | 3 | 4 |
---|---|---|---|---|
Time, h | 13.435 | 8.809 | 12.103 | 9.579 |
Speed up | – | 1.525 | 1.110 | 1.403 |
Efficiency | – | 0.763 | 0.370 | 0.351 |
The speed up Sp = t1/tp and the efficiency Ep = Sp/p for the parallel algorithms (t1 is the algorithm execution time for one process, tp is the algorithm execution time for a system of p processes) are determined depending on the number of processes.
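As a worked illustration of these definitions, the sketch below recomputes the speed up and efficiency from the runtimes of the second timing table above (13.435 h for one process). The function name is hypothetical; only the formulas Sp = t1/tp and Ep = Sp/p come from the text.

```python
def speedup_and_efficiency(t1, tp, p):
    """Return (S_p, E_p) with S_p = t1 / tp and E_p = S_p / p."""
    s = t1 / tp
    return s, s / p


# Runtimes in hours, copied from the second timing table.
times_h = {1: 13.435, 2: 8.809, 3: 12.103, 4: 9.579}

results = {}
for p in (2, 3, 4):
    results[p] = speedup_and_efficiency(times_h[1], times_h[p], p)
    print(f"p = {p}: speed up = {results[p][0]:.3f}, efficiency = {results[p][1]:.3f}")
```

The values agree with the tabulated ones to three decimal places, and the falling efficiency makes visible how little of the added hardware is converted into runtime reduction.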
The radiation safety test problem was calculated on one to four computational nodes. The use of one node is equivalent to simulating the problem without the MPI. Each MPI process used 32 OpenMP threads, the maximum for one node. Because the cluster was overloaded, no eight-node computation was performed, although it was expected to give the shortest solution time. The clusters use different processors, so the problem runtimes differ between them, with the best one obtained on the Basov cluster. On both clusters, the runtime with four nodes is longer than with two because of the data transmission between the cluster nodes: the ODETTA algorithm with the MPI exchanges large data arrays between the nodes, and this transmission adds a delay to the calculation. An additional speed up was nevertheless expected with eight cluster nodes, since fast processors would offset the cost of the data exchange between nodes.
Comparison based on calculation results for full particle fluxes with the Basov cluster
Particles | Quantity | 2 processes | 3 processes | 4 processes |
---|---|---|---|---|
Neutrons | δmax, % | 1.26∙10⁻³ | 9.31∙10⁻³ | 1.55∙10⁻³ |
Neutrons | σ | 4.52∙10⁵ | 1.48∙10⁵ | 8.21∙10⁵ |
Gamma quanta | δmax, % | 1.49∙10⁻³ | 1.07∙10⁻³ | 1.45∙10⁻³ |
Gamma quanta | σ | 2.39∙10⁴ | 7.57∙10³ | 3.92∙10⁴ |
A comparative analysis for the accuracy of computing full fluxes of particles is presented in Tables
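$$\delta_{\max} = \max(\delta_1, \delta_2, \ldots, \delta_n), \qquad \delta_i = \left| 1 - \frac{x_i}{x_i^*} \right| = \left| \frac{\Delta x_i}{x_i^*} \right|,$$

where n is the number of solution points; xi is the calculated solution; x*i is the reference solution; and Δxi = xi – x*i. The root-mean-square deviation from the reference solution is

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\Delta x_i)^2}.$$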
The relative error in the calculations with two and four nodes does not exceed 0.00155% on the Basov cluster and 0.00204% on the Cherenkov cluster, which indicates a minor deviation from the reference result.
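The two accuracy measures can be sketched as follows. The sketch assumes that the root-mean-square deviation is the square root of the mean of the squared differences Δxi; the function names and the three flux values are made up for illustration and are not results from the paper.

```python
import math


def max_relative_error_percent(x, x_ref):
    """delta_max in percent: the largest |1 - x_i / x_i*| over all points."""
    return 100.0 * max(abs(1.0 - xi / ri) for xi, ri in zip(x, x_ref))


def rms_deviation(x, x_ref):
    """sigma, assumed here to be sqrt(mean((x_i - x_i*)**2))."""
    n = len(x)
    return math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, x_ref)) / n)


# Hypothetical flux values (not from the paper): reference vs. MPI run.
x_ref = [1.0e10, 2.0e10, 4.0e10]
x_mpi = [1.00001e10, 1.99998e10, 4.0e10]

delta_max = max_relative_error_percent(x_mpi, x_ref)
sigma = rms_deviation(x_mpi, x_ref)
print(f"delta_max = {delta_max:.2e} %, sigma = {sigma:.3e}")
```

Because σ is an absolute quantity, its magnitude scales with the fluxes themselves, so it should be read alongside the relative error rather than on its own.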
Comparison based on calculation results for full particle fluxes with the Cherenkov cluster
Particles | Quantity | 2 processes | 3 processes | 4 processes |
---|---|---|---|---|
Neutrons | δmax, % | 2.03∙10⁻³ | 1.48∙10⁻³ | 1.49∙10⁻³ |
Neutrons | σ | 1.82∙10⁶ | 1.56∙10⁶ | 1.99∙10⁶ |
Gamma quanta | δmax, % | 2.04∙10⁻³ | 1.47∙10⁻³ | 1.50∙10⁻³ |
Gamma quanta | σ | 1.10∙10⁵ | 7.84∙10⁴ | 1.09∙10⁵ |
The best option for ODETTA cluster-based computations is to use one node with two MPI processes and 16 OpenMP threads per process. This is possibly explained by the node architecture, which allows the parallel algorithms of the program to achieve the maximum speed up. The advantages of this computation option are shown in Tables
Cluster | Basov | Cherenkov |
---|---|---|
Time, h | 6.951 | 7.029 |
Speed up | 1.444 | 1.911 |
Efficiency | 0.722 | 0.956 |
ODETTA computation characteristics using one node with two MPI processes
Particles | Quantity | Basov | Cherenkov |
---|---|---|---|
Neutrons | δmax, % | 1.34∙10⁻³ | 1.80∙10⁻³ |
Neutrons | σ | 3.16∙10⁵ | 1.75∙10⁶ |
Gamma quanta | δmax, % | 1.25∙10⁻³ | 1.96∙10⁻³ |
Gamma quanta | σ | 1.69∙10⁴ | 9.65∙10⁴ |
The relative error in the calculations on one node with two MPI processes does not exceed 0.00134% on the Basov cluster and 0.00196% on the Cherenkov cluster, which indicates high precision of the obtained results.
The MPI technology was used to speed up the ODETTA code for solving the neutron and gamma quanta transport problem in a multigroup SnPm approximation by the finite element method based on unstructured tetrahedron meshes. The results of the source code modification were tested using a radiation safety problem as an example.
The program modifications were tested using a radiation safety problem (
When analyzing the computation runs with different numbers of MPI nodes and the maximum possible 32 OpenMP threads, it can be concluded that the most efficient distribution was over two Cherenkov cluster nodes, where the speed up was about 1.52. The calculation was slower with four computation nodes than with two, which is explained by the peculiarities of the algorithm and the cluster architecture, that is, the use of the Hyper-threading technology (
As a result, the use of one node with two MPI processes and 16 OpenMP threads per process was the best option for running ODETTA on the cluster. The maximum speed up, achieved on the Cherenkov cluster, was 1.911 (nearly twofold), which corresponds to a maximum time gain of about 48%. The minimum time gain was about 30%, even as compared with the best one-process calculation.
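The time gains quoted above follow from the runtimes as 1 − tp/t1. The sketch below recomputes them; the function name is hypothetical, and pairing each one-process runtime with a cluster is an inference from the reported speed ups (10.036/6.951 ≈ 1.444 and 13.435/7.029 ≈ 1.911) rather than something the tables state explicitly.

```python
def time_gain_percent(t_serial, t_parallel):
    """Relative reduction in runtime, in percent: 100 * (1 - t_parallel / t_serial)."""
    return 100.0 * (1.0 - t_parallel / t_serial)


# (one-process time, one node with two MPI processes), in hours.
# The cluster assignment of the one-process times is inferred, see the lead-in.
runs_h = {"Basov": (10.036, 6.951), "Cherenkov": (13.435, 7.029)}

gains = {name: time_gain_percent(t1, tp) for name, (t1, tp) in runs_h.items()}
for name, gain in gains.items():
    print(f"{name}: time gain = {gain:.1f} %")
```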
It should also be noted that the ODETTA algorithm with the MPI requires large data arrays to be exchanged between the cluster nodes, so data transmission led to a considerable calculation delay. The extra interprocess communication reduces the gain from using several processes, but the calculation time remains noticeably shorter than that of the serial version.