Corresponding author: Iurii D. Katser (Iurii.Katser@skoltech.ru)
Academic editor: Georgy Tikhomirov
The main tasks of diagnostics at nuclear power plants are detection, localization, diagnosis, and prognosis of the development of malfunctions. Analytical algorithms of varying degrees of complexity are used to solve these tasks. Many of these algorithms require pre-processed input data for high-quality and efficient operation. The pre-processing stage can help to reduce the volume of the analyzed data, generate additional informative diagnostic features, find complex dependencies and hidden patterns, discard uninformative source signals and remove noise. Finally, it can improve detection, localization and prognosis quality. This overview briefly describes the data collected at nuclear power plants and provides methods for their preliminary processing. The pre-processing techniques are systematized according to the tasks performed, their advantages and disadvantages are presented, and the requirements for the initial raw data are considered. The references include both fundamental scientific works and applied industrial research on the methods discussed. The paper also indicates the mechanisms for applying signal pre-processing methods in real time. An overview of data pre-processing methods as applied to nuclear power plants is given, their classification and characteristics are provided, and a comparative analysis of the methods is presented.
Modern nuclear power plants (
Such data contain valuable information about incipient faults, but it can be extremely difficult to use the so-called raw or unprocessed data in analytical algorithms. The algorithms of fault detection, pattern recognition, fault localization, prognosis of fault development, etc. require signal pre-processing for high-quality output. The pre-processing techniques include both machine learning methods (
The pre-processing stage is very important in detection algorithms. Its relevance seems rather evident since it is an integral part of the overwhelming majority of the methods mentioned in this overview and other reviews of data processing methods (
Taxonomy of data pre-processing methods.
Fig.
Equipment diagnostics loop.
The main path of equipment diagnostics is the sequential execution of all stages, starting with data acquisition, followed by pre-processing, fault detection, localization, diagnosis or root cause identification, and prognosis of how the detected faults may develop. The dashed line indicates an auxiliary path of equipment diagnostics, in which the stages do not follow from one another. The auxiliary path can be taken in deferred analysis, when any stage is considered separately from the others; when the original data are used in unprocessed form or new data are added at any stage; or when other pre-processing methods are applied to prepare the original data and thus ensure algorithm operation.
It is necessary here to clarify some of the terms used in this article. The
This article focuses on the Data and Pre-Processing stages, traced with a heavy line in Fig.
An
• geometric quantities (measurements of length, position, angle of inclination, etc.);
• thermotechnical quantities (temperature, pressure, flow rate, volume of working fluid);
• electrical quantities (current, voltage, power, frequency, induction, etc.);
• mechanical quantities (deformation, forces, torques, vibration, noise level, etc.);
• chemical composition (concentration, chemical properties, etc.);
• physical properties (humidity, electrical conductivity, viscosity, radioactivity);
• parameters of ionizing radiation (radiation fields inside and outside of controlled zones, fluxes of neutrons and gamma radiation);
• other parameters.
Most of the generated and aggregated signals are raw data and represent time series. Asynchronous generation and acquisition of data present a problem in data analysis. Malfunctions of measurement channels result in data omissions, inaccurate readings and noise contamination. Moreover, self-monitoring or self-diagnostic systems of measuring equipment can either detect invalid values or skip them. However, various pre-processing methods make it possible to minimize the impact of such factors on the quality of technical diagnostics.
In general, the Pre-Processing stage consists of the four main steps shown in Fig.
Data Cleansing helps eliminate invalid values and outliers by removing or correcting them. At this stage, either the missing data are filled in, or the data objects containing such gaps are deleted if their share is small. Features with a large number of data gaps or invalid values can also be excluded from further analysis.
All measurements affecting
Data gaps appear due to the imperfection of modern measuring systems, communication channels and other infrastructure. This poses a problem when working with anomaly detection methods and other techniques. The simplest approaches here are to ignore features with gaps or replace the gaps with specially assigned values, for example, 0 or −1. Also, missing values can be filled in by standard methods, such as the moving average or median over the selected window; the average (quantitative characteristic), mode (categorical characteristic) or median value over the entire time series; and the last value obtained before the gap. Alternatively, there are advanced methods to fill in missing data, for example, the machine-learning methods (for regression, see
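As a minimal illustration of the simpler filling techniques listed above (moving median over a window, mean over the whole series, last value before the gap), a sketch assuming pandas is available; the signal values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings; NaN marks values lost due to measurement-channel faults
signal = pd.Series([410.2, 410.5, np.nan, 411.0, np.nan, np.nan, 412.3])

# Moving median over a 3-point window (uses whatever non-missing values fall inside it)
filled_median = signal.fillna(signal.rolling(window=3, min_periods=1).median())

# Mean over the entire series
filled_mean = signal.fillna(signal.mean())

# Last value obtained before the gap (forward fill)
filled_last = signal.ffill()
```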
To tackle the problem of outliers, one can either apply conventional methods, for example, removing values that contradict the laws of physics or deviate from a feature's mean by more than a set number of standard deviations (a minimal sketch of this deviation-based rule is given after the list below), or resort to modern methods of data mining and machine learning. However, in most cases, the problem of finding anomalies in data is an unsupervised learning task, and hence it is suggested to use the class of unsupervised learning methods. In his textbook on models for detecting outliers and anomalies,
1. extreme value analysis;
2. clustering;
3. distance-based models;
4. density-based models;
5. probabilistic models;
6. information-theoretical models.
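Returning to the conventional deviation rule mentioned above, a minimal sketch of the simplest form of extreme value analysis, assuming NumPy; the signal and the three-sigma threshold are hypothetical:

```python
import numpy as np

def extreme_value_outliers(x: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag values lying more than k standard deviations from the mean
    (a basic form of extreme value analysis)."""
    mu, sigma = x.mean(), x.std()
    return np.abs(x - mu) > k * sigma

# Hypothetical sensor signal: stable readings with one injected fault value
rng = np.random.default_rng(0)
readings = np.append(rng.normal(loc=2.0, scale=0.05, size=200), 7.5)

mask = extreme_value_outliers(readings)
print(readings[mask])  # the injected value 7.5 is flagged
```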
Another approach to solving the problem of outlier detection is the use of ensembles (
Turning now to support vector machines (
Isolation Forest, or iForest, identifies outliers by the low depth of outlying values in the constructed tree (
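A hedged sketch of how the two detectors just discussed might be applied with scikit-learn (an assumption); the training sample, kernel, and contamination level are hypothetical:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 3))          # hypothetical fault-free training sample
X_new = np.vstack([rng.normal(size=(5, 3)),  # normal observations
                   [[8.0, -7.0, 9.0]]])      # an obvious outlier

ocsvm = OneClassSVM(kernel="rbf", nu=0.01).fit(X_train)
iforest = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

print(ocsvm.predict(X_new))    # +1 = inlier, -1 = outlier
print(iforest.predict(X_new))  # same convention
```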
Cluster analysis is the process of categorizing a set of objects into groups (clusters) so that objects in one group are similar by some of the attributes. The study by
Let us now consider minimum covariance determinant (
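A minimal sketch of MCD-based outlier detection, assuming scikit-learn's EllipticEnvelope (which fits a minimum covariance determinant estimate internally); the data and contamination level are hypothetical:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1.0, 0.8], [0.8, 1.0]], size=300)
X[:3] = [[6.0, -6.0], [7.0, -5.0], [-6.0, 6.0]]   # injected multivariate outliers

# EllipticEnvelope fits a robust (MCD) covariance estimate and flags points
# with a large robust Mahalanobis distance as outliers (-1)
detector = EllipticEnvelope(contamination=0.01, random_state=0).fit(X)
labels = detector.predict(X)
print(np.where(labels == -1)[0])  # indices of the flagged observations
```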
At the Feature Transformation stage, the transformation affects the features values (scaling, change in the sampling rate), their type (categorization of discrete and continuous values), modality (videos are converted into a sequence of pictures, pictures into tables of numerical data), etc.
Most pre-processing algorithms require input data whose features are on the same scale, since the mean value and variance of features impact their significance for the algorithms (
• Linear:
◦ Z-Normalization/Standardization: normalizing the mean to 0, standardizing the variance at 1;
◦ Min-Max Normalization: rescaling the range of features to bring data to the 0 to 1 scale, with zero corresponding to the minimum value before normalization and one corresponding to the maximum;
◦ Normalization by Decimal Scaling: to move the decimal point of feature values, take the number of digits i in the maximum value of a time series and divide each value of the series by 10^i;
◦ MaxAbs scaling: normalizing each time series value to the maximum absolute value of the entire series;
• Non-linear:
◦ Hyperbolic tangent: scaling the values to [−1…1];
◦ Logistic (sigmoid) function: scaling the values to [0…1].
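A minimal sketch of the linear scalings listed above and one non-linear option, assuming scikit-learn and NumPy are available; the feature matrix is hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler

X = np.array([[250.0, 0.1], [260.0, 0.4], [300.0, 0.9]])  # hypothetical feature matrix

X_z = StandardScaler().fit_transform(X)        # Z-normalization: mean 0, variance 1
X_minmax = MinMaxScaler().fit_transform(X)     # Min-Max: rescaled to [0, 1]
X_maxabs = MaxAbsScaler().fit_transform(X)     # MaxAbs: divided by the max absolute value

# Decimal scaling: divide by 10**i, where i is the number of digits of the column maximum
X_dec = X / 10 ** np.ceil(np.log10(np.abs(X).max(axis=0)))

X_tanh = np.tanh(X_z)                          # non-linear: hyperbolic tangent to (-1, 1)
```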
In addition to scaling, the Box-Cox transformation (a family of power transformations that includes taking the logarithm) is often applied to features (
Another important problem is to bring signals with different sampling rates to a single one. In their monograph,
• reducing the sampling rate of all processes to the minimum;
• increasing the sampling rate of all processes to the maximum;
• converting to an intermediate or any other sampling rate.
The choice of a specific rate, which all signals must be converted to, should be based on the characteristic rate of the analyzed process and be consistent with the subsequent stages of diagnostics. A significant decrease in the rate can lead to the loss of information in the signals while an unreasonable increase in the rate can affect the computational complexity of subsequent data analysis processes.
Firstly, now that machine learning methods are gaining popularity, partly due to their ability to work with Big Data, it sometimes pays to bring signals to a lower frequency to reduce the total computational complexity of the problem. It may also be necessary to reduce the sampling rate when a large set of methods is applied sequentially, so that problems can be solved in real time.
Secondly, the monograph does not address the important case of applying the above approaches in real-time mode. Since interpolation is not applicable in real-time mode (in pointwise analysis) and extrapolation is complex and rarely used, simpler methods can deliver the reduction to a single sampling rate, namely:
• increasing the sampling rate by filling the current range with the last received value, with subsequent sampling;
• increasing the sampling rate by filling in the average or median value over the last range, with subsequent sampling;
• decreasing the sampling rate by selecting extrema, mean or median values in the range.
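A minimal sketch of bringing two signals with different sampling rates to a common grid, assuming pandas; the timestamps and values are hypothetical:

```python
import numpy as np
import pandas as pd

# Two hypothetical signals recorded at different sampling rates
fast = pd.Series(np.arange(10.0), index=pd.date_range("2021-01-01", periods=10, freq="1s"))
slow = pd.Series(np.arange(5.0), index=pd.date_range("2021-01-01", periods=5, freq="2s"))

# Increase the rate of the slow signal by repeating the last received value,
slow_up = slow.resample("1s").ffill()
# while decreasing a rate would use, e.g., the mean over each range:
fast_down = fast.resample("2s").mean()

# Both signals are now available on a common 1 s grid and can be aligned
aligned = pd.concat([fast, slow_up], axis=1, keys=["fast", "slow"]).dropna()
```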
Feature selection can generally be understood as reducing the number of features, for example, by searching for a lower-dimensional subspace using dimensionality reduction methods or by simply discarding a part of the uninformative features. Feature selection simplifies models, reduces the complexity of model training, and helps avoid the curse of dimensionality. The following selection strategies are commonly distinguished (a minimal sketch of sequential selection is given after the list below):
• exhaustive search over all feature subsets;
• sequential feature addition (Add);
• sequential feature elimination (Del);
• genetic algorithm;
• random search;
• clustering of features.
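A minimal sketch of the sequential feature addition (Add) strategy, assuming scikit-learn's SequentialFeatureSelector is available; the data set and regression model are hypothetical:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Hypothetical data set: 20 candidate features, only a few of them informative
X, y = make_regression(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Sequential feature addition: greedily add the feature that improves
# cross-validated model quality the most, until 5 features are selected
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="forward", cv=5
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the selected features
```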
Well-known extensions of some of these algorithms like SHAP (Lipovetsky et al. 2001) and LIME (
Regularization, which imposes a penalty on the complexity of the model, is often applied to machine learning problems (
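As an illustration of regularization-driven selection, a hedged sketch using L1 (lasso) regularization, assuming scikit-learn; the data set and penalty strength are hypothetical:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=20, n_informative=5, random_state=0)

# The L1 penalty on the absolute size of the coefficients drives the weights
# of uninformative features to exactly zero
model = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(selected)  # features that survived the penalty
```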
Feature generation is possible if based on the logic and physics of the process or on standard transformations, e.g., raising feature values to a polynomial power or multiplying them together. Engineering of new diagnostic features also includes obtaining signal auto-characteristics with a sliding buffer, computing all kinds of pairwise correlations, and other rather trivial transformations. In respect to NPPs, they are discussed in the monographs by
Most techniques of dimensionality reduction solve both the problem of reducing the number of features and the problem of engineering new diagnostic features. The techniques of dimensionality reduction project data into a lower-dimensional space and, unlike selection methods, consider all the original information, thus making it possible to simplify and improve the procedure for monitoring and searching for anomalies in signals. The dimensionality reduction problem has many applications (
Principal Component Analysis (
Independent Component Analysis (
Both
Compared to
The application of the
The wide applicability of these techniques is explained by the fact that they can tame multidimensional, noisy data with correlated parameters by translating the data into a lower-dimensional space that contains most of the Cumulative Percentage Variance of the original data (
• kernel methods: for PCA, see Lee et al. (2004a) and Choi and Lee (2004); for ICA, see Zhang and Qin (2007); for PLS, see Zhang et al. (2010), Zhang and Hu (2011), Jiao et al. (2017). Unlike the linear methods of dimensionality reduction, the non-linear ones achieve effective dimensionality reduction by creating a non-linear combination of features that forms a new lower-dimensional space (see the sketch after this list);
• dynamic methods: for PCA, see Ku et al. 1995, Russell et al. (2000); for ICA, see Lee et al. (2004b); for PLS, see Chen and Liu (2002). The dynamic methods, used for analysis of transient phenomena, supplement the studied sample with a certain number of previous observations and factor in autocorrelations and cross-correlations with displacements in time;
• probabilistic methods: for PCA, see Tipping and Bishop (1999), Kim and Lee (2003); for ICA, see Zhu et al. (2017); for PLS, see Li et al. (2011). The probabilistic methods model the data distribution as a multivariate Gaussian distribution. With PPCA, it is possible to construct a PPCA mixture model, which consists of several local PPCAs and detects faults in data with multimodal or complex non-Gaussian distributions (Ge and Song 2010, Raveendran and Huang 2016, Raveendran and Huang 2017);
• Sparse Principal Component Method (Sparse PCA), which has appeared only recently, takes only a part of the original features to construct a new lower-dimensional space. Gajjar et al. (2018) presented its application for fault detection;
• dynamic kernel PLS technique and a brief overview of works on PLS modifications were presented by Jia and Zhang (2016).
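As referenced in the first item above, a minimal sketch contrasting linear PCA with its kernel (non-linear) counterpart, assuming scikit-learn; the toy data set and kernel parameter are hypothetical:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Hypothetical non-linear data: two concentric rings that a linear projection cannot separate
X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_lin = PCA(n_components=2).fit_transform(X)                                   # linear projection
X_ker = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)  # non-linear projection

# In the kernel projection the two rings become much easier to separate along the first component
```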
Linear Discriminant Analysis (
Canonical Correlation Analysis (
Factor Analysis is a multivariate statistical analysis that serves to determine the relationship between variables and reduce their number (
Feature bagging, or bootstrap aggregation, is a learning method that searches through randomly selected feature subsamples from
Bagging in combination with basic algorithms turns the problem solution into an ensemble of algorithms, increasing the computational complexity of the basic algorithms but improving the accuracy and robustness of the results. If all features are independent and important, bagging often degrades the quality of responses, as each algorithm has an insufficiently informative subsample to learn from.
Neural networks are also used for data processing and dimensionality reduction. Today, one of the most effective methods for the latter purpose is an autoencoder – a type of artificial neural network applied to encode data, usually in unsupervised learning (
In addition to feed-forward networks, there are a large number of modernized architectures; some of them are as follows:
• convolutional autoencoders, whose architecture includes a convolutional layer creating a convolution kernel that is convolved with the input data over a single feature (dimension). They are used for data noise removal (Grais and Plumbley 2017), clustering (Chen 2015, Ghasedi et al. 2017), fault detection (Chen et al. 2018a) and other purposes;
• Recurrent Neural Network (RNN) based Autoencoders and their varieties (Elman 1990, Chung et al. 2016), such as Long Short-Term Memory (Hochreiter and Schmidhuber 1997) and Gated Recurrent Units (Chung et al. 2014);
• Variational Autoencoders (VAE), which learn a latent-variable model by studying the probability distributions that describe the input data (Everett 2013). For more details on VAE architecture and applications, refer to Kingma and Welling (2013), Doersch (2016).
Autoencoders can be used jointly with standard fault detection methods, for example, with statistical detection criteria (
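A minimal sketch of a feed-forward autoencoder whose reconstruction error can serve as such a detection statistic, assuming TensorFlow/Keras is available; the architecture, data, and threshold are hypothetical:

```python
import numpy as np
from tensorflow import keras

# Hypothetical fault-free training data: 1000 observations of 20 scaled signals
X_train = np.random.default_rng(0).normal(size=(1000, 20)).astype("float32")

# A small feed-forward autoencoder: 20 -> 8 -> 3 -> 8 -> 20
inputs = keras.Input(shape=(20,))
encoded = keras.layers.Dense(8, activation="relu")(inputs)
code = keras.layers.Dense(3, activation="linear")(encoded)   # compressed representation
decoded = keras.layers.Dense(8, activation="relu")(code)
outputs = keras.layers.Dense(20, activation="linear")(decoded)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=10, batch_size=32, verbose=0)

# Reconstruction error per observation; unusually large values hint at anomalies
errors = np.mean((X_train - autoencoder.predict(X_train, verbose=0)) ** 2, axis=1)
threshold = np.quantile(errors, 0.99)   # hypothetical detection threshold
```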
Spectral Analysis includes time series processing associated with obtaining a representation of signals in the frequency domain. The main application of Spectral Analysis is to assess the vibration of equipment. The most popular techniques of spectral processing are the Fourier transform, the Laplace transform, the Hilbert transform and the Hilbert-Huang transform. The results of Spectral Analysis are rather easy to interpret, and it is possible to detect faults, determine the nature of their occurrence and make a diagnosis on their basis.
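A minimal sketch of obtaining a one-sided amplitude spectrum with the Fourier transform, assuming NumPy; the sampling rate and signal components are hypothetical:

```python
import numpy as np

fs = 1000.0                          # hypothetical sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)

# Hypothetical vibration signal: a 50 Hz rotation component plus a weak 120 Hz defect tone
signal = np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 120 * t)
signal += 0.1 * np.random.default_rng(0).normal(size=t.size)

spectrum = np.abs(np.fft.rfft(signal)) / t.size * 2   # one-sided amplitude spectrum
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)

# Peaks near 50 Hz and 120 Hz point to the corresponding mechanical components
for f in (50, 120):
    print(f, spectrum[np.argmin(np.abs(freqs - f))].round(2))
```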
Another tool of fault detection can be to generate diagnostic features that serve as equipment health indicators. Such diagnostic features, which characterize the system condition, are identified by an expert based on their experience for a clear and effective understanding of the state of a technical system and, accordingly, for detecting anomalies in operation (
The advantages of the diagnostic features approach include the possibility of creating a rational solution that accumulates experts’ experience, and the ease of health indicator implementation. The disadvantages are the lack of physical or mathematical models that could form the foundation of the method, and its limited scope: as a rule, an indicator points only to malfunctions of the same kind in one unit of equipment.
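A minimal sketch of such an expert-defined health indicator, here a rolling root-mean-square of a vibration signal compared against a hypothetical alarm level (pandas is assumed):

```python
import numpy as np
import pandas as pd

# Hypothetical vibration signal whose amplitude grows as a defect develops
rng = np.random.default_rng(0)
amplitude = np.linspace(1.0, 3.0, 5000)
vibration = pd.Series(amplitude * rng.normal(size=5000))

# Health indicator: root-mean-square over a sliding window of 500 samples
rms = vibration.pow(2).rolling(window=500).mean().pow(0.5)

alarm_level = 2.0                       # hypothetical limit set by an expert
first_alarm = rms[rms > alarm_level].index.min()
print(first_alarm)                      # first sample at which the indicator exceeds the limit
```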
A lack of time series data makes deep learning algorithms inapplicable in some applications. In such cases, augmentation, or data generation, is used to add synthetic data so that machine learning algorithms can be trained and perform better. Quite a bit of attention is paid to this field of knowledge; the surveys by Ivana et al. (2020) and Wen et al. (2021) highlight its current state. The latter work provides the following taxonomy for time series data augmentation:
1. Basic approaches:
a. Time domain;
b. Frequency domain;
c. Time-frequency domain.
2. Advanced approaches:
a. Decomposition Methods;
b. Statistical Generative Models;
c. Learning Methods (including Embedding Space, Deep Generative Models, and Automated Data Augmentation).
Although data augmentation is quite a useful tool for improving the quality of various models, it mainly relates to the training stage and is almost never part of the equipment diagnostics pipeline itself. Moreover, time series data augmentation methods have not been adequately researched for real-world industrial data with noise and constantly occurring statistical changes.
Each pre-processing method has its own distinctive nature in relation to the original data: some are capable of working with one data object while others require the calculation of values based on a learning sample or a buffer. Moreover, real-time pre-processing must match the diagnostics model selected for learning; otherwise, the models may give incorrect results. For such cases, it is worth discussing the mechanisms for applying pre-processing methods:
• The pointwise transformation in learning and operation. This mechanism is used when the applied pre-processing methods require a state vector only at the current time. Examples of such transformations are deleting data exceeding a certain (for example, physically justified) threshold, raising a feature to the polynomial power, performing multiplication on feature values, etc.
• Complete or batch transformation during learning, pointwise transformation during operation. This mechanism is used when the transformation requires the calculation of values, for example, the mean or the variance of a learning sample. The values obtained at the learning stage are saved and applied in real-time operation for each new state vector. Examples of such transformations are One-Class SVM, iForest, MCD, PCA and all linear methods for reducing features to a single scale mentioned in this article.
• Batch transformation. It refers to the transformation of features based on the calculation of characteristics using a sliding window or a batch. An example here is calculating a moving average of a signal per a window or obtaining auto-characteristics of signals using a sliding buffer and all kinds of correlated pairs.
Let us demonstrate how methods are applied in real-time mode, assuming that our preprocessing pipeline consists of the following steps:
1. Moving average for gap filling;
2. Z-Normalization;
3. Applying PCA;
4. Selecting the first principal component for further comparison with the threshold for anomaly detection.
First of all, a new point of the multivariate time series is received. Then, if some values in the new vector are missing, the average over a window of previous points is calculated and the calculated values are inserted into the gaps. After that, Z-normalization is applied using the mean and standard deviation values defined previously (at the training stage, commonly in fault-free mode). Afterward, PCA is applied, and the first principal component is compared with the threshold to detect anomalies.
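A hedged sketch of this pipeline, assuming scikit-learn and NumPy; the training data, window length, and detection threshold are hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA

# --- training (fault-free) stage: statistics and the PCA model are fitted and stored ---
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 5))
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
pca = PCA(n_components=1).fit((X_train - mean) / std)
threshold = 3.0                                   # hypothetical detection threshold

# --- real-time stage: each new state vector is processed pointwise ---
window = list(X_train[-10:])                      # sliding buffer of recent observations

def process(x_new: np.ndarray) -> bool:
    """Fill gaps with the moving average, z-normalize, project onto PC1, compare."""
    x = x_new.copy()
    gaps = np.isnan(x)
    if gaps.any():                                # 1. moving average for gap filling
        x[gaps] = np.nanmean(np.vstack(window), axis=0)[gaps]
    window.append(x)                              # update the sliding buffer
    window.pop(0)
    z = (x - mean) / std                          # 2. Z-normalization with stored statistics
    pc1 = pca.transform(z.reshape(1, -1))[0, 0]   # 3. projection onto the first principal component
    return abs(pc1) > threshold                   # 4. comparison with the threshold

print(process(np.array([0.1, np.nan, -0.2, 0.05, 0.3])))
```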
This overview has described the peculiarities of the data collected at NPPs and its pre-processing in real time. Table
Characteristics of data pre-processing methods
Item | Method | Data input limitation | Problem type | Univariate/Multivariate | Online | References
---|---|---|---|---|---|---
Data Cleansing | | | | | |
1 | One-class SVM | Normalization | Unsupervised | +/+ | + |
2 | iForest | Normalization | Unsupervised | +/+ | –* |
3 | Cluster analysis | Equidistant* | Unsupervised | +/+ | + |
4 | MCD | Normalization, equidistant | Unsupervised | –/+ | + | Leys et al. 2008
Feature Selection and Generation | | | | | |
5 | | Normalization, equidistant | Supervised | –/+ | + |
6 | | Normalization, equidistant | Supervised | –/+ | + |
7 | | Normalization, equidistant | Unsupervised | –/+ | + |
8 | | Normalization, equidistant | Unsupervised | –/+ | + |
9 | | Normalization, equidistant | Unsupervised | –/+ | + |
10 | Factor analysis | Normalization, equidistant | Supervised | –/+ | + |
11 | Spectral analysis | Stationarity, equidistant* | Unsupervised | +/– | + |
12 | Bagging | – | Unsupervised | –/+ | – |
13 | Autoencoder | Normalization | Supervised | +/+ | +* |
14 | Health indicators | –* | Unsupervised* | +/+ | + |
The problems encountered in data are not unique to the nuclear industry, but the outstanding aspect of NPPs is the large amount of generated information, the variety of its sources and data types. Pre-processing is necessary to prepare the data for input to the diagnostic algorithms, since many of them either have requirements that rule out the input of data with gaps, outliers, signals with different sampling rates, or produce incorrect results when working with unscaled data. Another reason for using pre-processing methods is the possibility of improving the quality of the diagnostic algorithms and reducing the computational complexity of the problem, for example, by reducing the dimensionality of the initial data or lowering the sampling frequency of signals.
We find it necessary to give a summary providing our opinion on which methods are commonly used, which are not, and why:
• When filling in gaps, the most intuitive way is to use specially assigned values to avoid generating false information about the data. But not all machine learning methods can process such values properly. That is why the most common techniques fill the gaps with some data characteristics from moving windows or over the whole signal realization. Machine learning techniques are quite rare and situational for such problems.
• As for the detection of outliers and impossible values, the most straightforward approaches, which flag values contradicting the laws of physics, are the most popular due to the transparency of such rules for engineering personnel. Searching for deviations from statistical characteristics, even with machine learning techniques, is still fighting for attention; such methods are primarily used in retrospective analysis or in diagnostic systems that provide recommendations to operating personnel, but not in critical safety systems.
• When transforming the data, Z-Normalization and Min-Max scaling are the most common scaling techniques because in the overwhelming majority of cases they show better results; other methods are used only when further analysis specifically requires them. The Box-Cox transformation and techniques like differentiating the data are situational and applied when further research requires normally distributed data or stationary time series.
• A missing or inconsistent sampling rate is a frequent problem for industrial data. When selecting a unified sampling rate, achieving a trade-off between the loss of information and computational complexity is vital; the choice of a specific rate should be based on the characteristic rate of the analyzed process. When increasing the sampling rate in real-time mode, filling the current range with the last received value is the most common technique; when decreasing it, both extrema and mean/median values are commonly used.
• For feature selection, a thorough analysis combined with the various algorithms mentioned above works best. The analysis may also include finding dependencies of the target vector on the features when the problem is supervised. One of the most common approaches is to fit a simple model, calculate feature importances for it, and then select the most important features for fitting a more complex model. Regularization is also commonly used when applicable. Among dimensionality reduction techniques, PCA is the most popular since it is unsupervised and provides a linear transformation that is easy to understand and transparent for personnel. Although non-linear techniques, including neural networks, show state-of-the-art results, they lack interpretability of how the transformation is constructed, which makes them unpopular in industrial applications.
• Feature generation in real-world applications is primarily based on the logic and physics of the process, resulting in heuristic health indicators and various meaningful characteristics obtained from spectral analysis.
The methods described in this work have already successfully proven themselves in industrial application, including at NPPs. At the same time, these methods continue to develop, and there appear supplements that improve their operation or expand their field of application. This overview, together with
Further research can be focused on overviewing the methods used to solve such diagnostic problems at NPPs as arriving at the correct diagnosis, fault localization, and prognosis of the malfunction development.