Corresponding author: Iurii D. Katser ( iurii.katser@skoltech.ru ) Academic editor: Georgy Tikhomirov
© 2021 Iurii D. Katser, Vyacheslav O. Kozitsin, Ivan V. Maksimov, Denis A. Larionov, Konstantin I. Kotsoev.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Katser ID, Kozitsin VO, Maksimov IV, Larionov DA, Kotsoev KI (2021) Data pre-processing methods for NPP equipment diagnostics algorithms: an overview. Nuclear Energy and Technology 7(2): 111-125. https://doi.org/10.3897/nucet.7.63675
The main tasks of diagnostics at nuclear power plants are detection, localization, diagnosis, and prognosis of the development of malfunctions. Analytical algorithms of varying degrees of complexity are used to solve these tasks. Many of these algorithms require pre-processed input data for high-quality and efficient operation. The pre-processing stage can help to reduce the volume of the analyzed data, generate additional informative diagnostic features, find complex dependencies and hidden patterns, discard uninformative source signals and remove noise. Ultimately, it can improve the quality of detection, localization and prognosis. This overview briefly describes the data collected at nuclear power plants and reviews methods for their preliminary processing. The pre-processing techniques are systematized according to the tasks performed, their advantages and disadvantages are presented, and the requirements for the initial raw data are considered. The references include both fundamental scientific works and applied industrial research on the methods discussed. The paper also describes the mechanisms for applying signal pre-processing methods in real time. As a result, an overview of data pre-processing methods as applied to nuclear power plants is given, their classification and characteristics are provided, and a comparative analysis of the methods is presented.
advanced analytics, data analysis, data pre-processing, diagnostics, NPP, machine learning, raw data
Modern nuclear power plants (NPPs) generate large amounts of data. Intelligent data analysis methods make it possible to use the generated data for detecting malfunctions, determining the operating lifetime of equipment and solving other pressing problems in NPP operation.
Such data contain valuable information about incipient faults, but it can be extremely difficult to use the so-called raw, or unprocessed, data in analytical algorithms. The algorithms of fault detection, pattern recognition, fault localization, prognosis of fault development, etc. require signal pre-processing for high-quality output. The pre-processing techniques include, among others, machine learning methods.
The pre-processing stage is very important in detection algorithms. Its relevance seems rather evident, since it is an integral part of the overwhelming majority of the methods mentioned in this overview and in other reviews of data processing methods.
The main path of equipment diagnostics is the sequential execution of all stages, starting with data acquisition, followed by pre-processing, fault detection, localization, diagnosis or root cause identification, and prognosis of how the detected faults may develop. The dashed line indicates an auxiliary path of equipment diagnostics, in which the stages do not follow from one another. The auxiliary path can be taken in deferred analysis, when any stage is considered separately from the others; when the original data are used in unprocessed form or new data are added at any stage; or when other pre-processing methods are used to prepare the original data and thus ensure algorithm operation.
It is necessary here to clarify some of the terms used in this article. The offline mode will refer to working with the full data sample; in this case full realization of the signals is available for analysis. The online mode will mean working in real time; in this case, the full data sample is unavailable for analysis, data objects (vectors) can arrive one after another as streaming data – hence, the analysis is called the pointwise analysis – or there can be a buffer with batch data – hence the analysis is called the batch analysis.
Supervised learning refers to tasks in which all the operating modes of equipment are known and the data classes are labeled; in other words, data on both the normal mode of operation and the abnormal mode of operation (preferably on all types of abnormalities) are available. Semi-supervised learning refers to tasks in which only the data on the normal mode of operation are available; this means that only the part of the data describing normal operation of the equipment has a class label. Unsupervised learning refers to tasks in which there are no class labels for any data, neither for normal nor for abnormal operation.
This article focuses on the Data and Pre-Processing stages, traced with a heavy line in Fig.
An NPP may have tens of thousands of instrument channels.
Most of the generated and aggregated signals relate to the raw data and represent time-series type of data. Asynchronous generation and acquisition of data present a problem in data analysis. Malfunctions of measurement channels result in data omissions, inaccurate readings and noise contamination. Moreover, self-monitoring or self-diagnostic systems of measuring equipment can either detect invalid values or skip them. However, various pre-processing methods make it possible to minimize the impact of such factors on the quality of technical diagnostics.
In general, the Pre-Processing stage consists of the four main steps shown in Fig.
The Data Cleansing step helps eliminate invalid values and outliers by removing or correcting them. At this stage, either the missing data are filled in, or the data objects containing such gaps are deleted if their share is small. Features with a large number of data gaps or invalid values can also be excluded from further analysis.
All measurements affecting NPP safety should be promptly diagnosed and marked with a validity indicator.
Data gaps appear due to the imperfection of modern measuring systems, communication channels and other infrastructure. This poses a problem when working with anomaly detection methods and other techniques. The simplest approaches here are to ignore features with gaps or to replace the gaps with specially assigned values, for example, 0 or −1. Missing values can also be filled in by standard methods, such as the moving average or median over a selected window; the average (for a quantitative feature), mode (for a categorical feature) or median over the entire time series; or the last value obtained before the gap. Alternatively, there are more advanced methods to fill in missing data, for example, machine-learning methods such as regression models.
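For illustration, a minimal sketch of a few of these standard gap-filling options, assuming pandas; the signal name and values are hypothetical:

```python
# Simple gap-filling strategies for a sensor time series (illustrative data).
import pandas as pd
import numpy as np

idx = pd.date_range("2021-01-01", periods=10, freq="1min")
df = pd.DataFrame({"coolant_temp": [280.1, 280.3, np.nan, 280.6, np.nan,
                                    280.9, 281.0, np.nan, 281.3, 281.4]}, index=idx)

filled_last = df.ffill()                                          # last value before the gap
filled_window = df.rolling(window=3, min_periods=1, center=True).median()  # window median
filled_global = df.fillna(df.median())                            # median of the whole series
```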
To tackle the problem of outliers, one can either apply conventional methods, for example, removing values that contradict the laws of physics or fall outside a set number of standard deviations of a feature, or resort to modern methods of data mining and machine learning. In most cases, however, finding anomalies in data is an unsupervised learning task, and hence the class of unsupervised learning methods is suggested. Textbooks on models for detecting outliers and anomalies describe this class of methods in detail.
Another approach to solving the problem of outlier detection is the use of ensembles.
Turning now to support vector machines (SVM), there are two principal SVM-based methods for detecting anomalies in data.
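One of them, listed in the summary table at the end of this overview as the one-class SVM, is available in scikit-learn; a minimal sketch on synthetic data (data and parameters are purely illustrative):

```python
# One-class SVM sketch: fit on assumed fault-free data, flag new points as in/outliers.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))                      # assumed fault-free operation data
X_new = np.vstack([rng.normal(size=(5, 4)),              # normal-looking points
                   rng.normal(loc=6.0, size=(5, 4))])    # outlying points

scaler = StandardScaler().fit(X_train)                   # the method expects normalized inputs
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(scaler.transform(X_train))

labels = model.predict(scaler.transform(X_new))          # +1 = inlier, -1 = outlier
print(labels)
```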
Isolation Forest, or iForest, identifies outliers as values that are isolated at a shallow depth in the constructed random trees.
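A sketch of iForest-based outlier flagging with scikit-learn (synthetic data, illustrative parameters):

```python
# Isolation Forest sketch: lower decision scores indicate easier-to-isolate (anomalous) points.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(300, 3)),
               rng.normal(loc=8.0, size=(3, 3))])        # a few injected outliers

iforest = IsolationForest(n_estimators=100, contamination="auto", random_state=1).fit(X)
scores = iforest.decision_function(X)                    # lower score = more anomalous
labels = iforest.predict(X)                              # +1 = inlier, -1 = outlier
```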
Cluster analysis is the process of categorizing a set of objects into groups (clusters) so that objects in one group are similar with respect to some of their attributes; points that do not fit into any cluster can then be treated as outliers. Cluster analysis has been applied to outlier detection in a number of studies.
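Below is a sketch of this idea using DBSCAN, chosen here only as an example of a density-based clustering algorithm and not necessarily the one used in the studies mentioned above:

```python
# Clustering-based outlier flagging: DBSCAN labels unclustered points as -1 (noise).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(size=(200, 2)),
               rng.normal(loc=10.0, size=(4, 2))])       # a distant group of points

labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(X)
outliers = X[labels == -1]                               # points assigned to no cluster
```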
Let us now consider the minimum covariance determinant (MCD), another method for handling outliers in data.
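A sketch of MCD-based outlier flagging via robust Mahalanobis distances, assuming scikit-learn's MinCovDet estimator and an illustrative chi-squared cut-off:

```python
# MCD sketch: robust covariance estimate, then flag points with large Mahalanobis distance.
import numpy as np
from sklearn.covariance import MinCovDet
from scipy.stats import chi2

rng = np.random.default_rng(3)
X = np.vstack([rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=300),
               np.array([[6.0, -5.0], [7.0, 7.0]])])     # two gross outliers

mcd = MinCovDet(random_state=3).fit(X)
d2 = mcd.mahalanobis(X)                                  # squared robust Mahalanobis distances
threshold = chi2.ppf(0.999, df=X.shape[1])               # illustrative cut-off
outlier_mask = d2 > threshold
```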
At the Feature Transformation stage, the transformation affects the feature values (scaling, change in the sampling rate), their type (categorization of discrete and continuous values), modality (videos are converted into sequences of pictures, pictures into tables of numerical data), etc.
Most pre-processing algorithms require input data whose features are on the same scale, since the mean value and variance of a feature affect its significance for the algorithms.
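A minimal scaling sketch with scikit-learn, showing z-normalization and min-max scaling fitted on hypothetical training data:

```python
# Feature scaling sketch: zero-mean/unit-variance and [0, 1] scaling per feature.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X_train = np.array([[280.0, 15.8], [281.2, 16.1], [280.6, 15.9], [281.0, 16.0]])

z_scaler = StandardScaler().fit(X_train)      # zero mean, unit variance per feature
mm_scaler = MinMaxScaler().fit(X_train)       # rescales each feature to [0, 1]

X_z = z_scaler.transform(X_train)
X_mm = mm_scaler.transform(X_train)
```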
In addition to scaling, the Box-Cox transformation (a family of power transforms that includes taking the logarithm as a special case) is often applied to features.
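A sketch of the Box-Cox transformation with SciPy, applied to a synthetic right-skewed feature; note that the transformation requires strictly positive values:

```python
# Box-Cox sketch: the power parameter lambda is fitted by maximum likelihood (lambda=0 is log).
import numpy as np
from scipy.stats import boxcox

x = np.random.default_rng(4).lognormal(mean=0.0, sigma=0.8, size=1000)  # skewed feature
x_bc, lmbda = boxcox(x)
```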
Another important problem is to bring signals with different sampling rates to a single rate. In their monograph, the authors propose interpolation and extrapolation approaches to this problem.
The choice of a specific rate, which all signals must be converted to, should be based on the characteristic rate of the analyzed process and be consistent with the subsequent stages of diagnostics. A significant decrease in the rate can lead to the loss of information in the signals while an unreasonable increase in the rate can affect the computational complexity of subsequent data analysis processes.
Firstly, now that machine learning methods are gaining popularity, in part because of their ability to work with Big Data, it sometimes pays to bring signals to a lower frequency to reduce the total computational complexity of the problem. It may also be necessary to reduce the sampling rate when the set of sequentially applied methods is large, so that problems can still be solved in real time.
Secondly, the monograph does not address the application of the above approaches in real time. Since interpolation is not applicable in the real-time (pointwise) mode and extrapolation is complex and rarely used, simpler methods can be used to bring the signals to a single sampling rate.
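As one example of such a simple, real-time-friendly option (not necessarily the one the cited monograph or the authors have in mind), the last known value of each signal can be carried forward onto a common time grid; a pandas sketch with illustrative rates:

```python
# Holding the last received value of each signal on a common 1-second grid.
import pandas as pd

fast = pd.Series([1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
                 index=pd.date_range("2021-01-01", periods=6, freq="1s"))
slow = pd.Series([10.0, 10.5],
                 index=pd.date_range("2021-01-01", periods=2, freq="5s"))

grid = pd.date_range("2021-01-01", periods=6, freq="1s")   # chosen common rate
merged = pd.DataFrame({
    "fast": fast.reindex(grid, method="ffill"),
    "slow": slow.reindex(grid, method="ffill"),             # last known value carried forward
})
```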
Feature selection can generally be understood as a reduction in the number of features, for example, by searching for a lower-dimensional subspace using dimensionality reduction methods or by simply discarding some of the uninformative features. Feature selection simplifies models, reduces the complexity of model training, and helps avoid the curse of dimensionality.
Well-known model-explanation extensions of some of these algorithms, such as SHAP (Lipovetsky et al. 2001) and LIME, estimate the contribution of individual features to the model output and can also guide feature selection.
Regularization, which imposes a penalty on the complexity of the model, is often applied to machine learning problems.
Feature generation is possible if it is based on the logic and physics of the process or on standard transformations, e.g. raising feature values to a polynomial power or multiplying them together. New diagnostic features can also be engineered by computing signal auto-features over a sliding buffer, all kinds of pairwise correlations, and other rather simple transformations. With respect to NPPs, such features are discussed in several monographs.
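A sketch of such simple feature generation with pandas: sliding-window statistics, a product feature, and a rolling pairwise correlation (signal names and window sizes are hypothetical):

```python
# Simple engineered features from two synthetic signals.
import pandas as pd
import numpy as np

rng = np.random.default_rng(5)
df = pd.DataFrame({"sig_a": rng.normal(size=500).cumsum(),
                   "sig_b": rng.normal(size=500).cumsum()})

feats = pd.DataFrame({
    "sig_a_mean_30": df["sig_a"].rolling(30).mean(),          # sliding-buffer mean
    "sig_a_std_30": df["sig_a"].rolling(30).std(),            # sliding-buffer std
    "a_times_b": df["sig_a"] * df["sig_b"],                   # simple product feature
    "corr_ab_60": df["sig_a"].rolling(60).corr(df["sig_b"]),  # rolling pair correlation
})
```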
Most techniques of dimensionality reduction solve both the problem of reducing the number of features and the problem of engineering new diagnostic features. The techniques of dimensionality reduction project data into a lower-dimensional space and, unlike selection methods, consider all the original information, thus making it possible to simplify and improve the procedure for monitoring and searching for anomalies in signals. The dimensionality reduction problem has many applications.
Principal Component Analysis (PCA) is a widely used technique for reducing the dimensionality of datasets. The idea of the method is to search for a hyperplane of a given dimensionality in the original space with the subsequent projection of the data onto the found hyperplane. The axes of the new space are linear combinations of the original ones and are selected based on the variance of the original features. The transformation of the measurement space into a new orthogonal space is performed by bringing the covariance (correlation) matrix to a diagonal form; for this reason, the original features are uncorrelated in the new space.
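A minimal PCA sketch with scikit-learn on synthetic correlated data:

```python
# PCA sketch: project scaled data onto a few principal axes of maximum variance.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 10))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]        # make two features strongly correlated

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(X_scaled)

scores = pca.transform(X_scaled)               # data in the new, uncorrelated space
print(pca.explained_variance_ratio_)           # share of variance captured per axis
```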
Independent Component Analysis (ICA), unlike Principal Component Analysis, finds a space in which the original features are not only uncorrelated, but also independent in terms of statistical moments of higher order. In other words, Independent Component Analysis solves the problem of finding any space, including a non-orthogonal one, whose axes are linear combinations of the original ones. The goal is to transform the original signals so that in the new space they are as statistically independent from each other as possible.
Both PCA and ICA build transformations into a new space based only on the matrix of features, without taking into account the response vector. This solves the problem of the mutual dependence of features, but fails to address the presence of features that do not affect the target variable (response vector); such features therefore still end up being used in further analysis.
Compared to PCA, where the axes of the new space are selected based on the variance of the original features, the Partial Least Squares (PLS) method, or Projection to Latent Structures, selects the axes of the new space so as to maximize the covariance between the matrix of features and the matrix of responses. New spaces are found for both matrices, and the new axes for the feature space are calculated to explain as much as possible of the variance of the responses. Using the data on equipment faults as responses, one can obtain a lower-dimensional space for the matrix of features and hence determine various faults more accurately.
The application of the PLS method is limited due to the need to know the classes of events (faults) when training the model. For that reason, the method is often used at the pre-processing stage when solving the problem of making a diagnosis or determining the causes.
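A minimal PLS sketch with scikit-learn; the response here is synthetic and merely stands in for fault labels or other responses available at the training stage:

```python
# PLS sketch: latent axes maximize covariance between features X and responses y.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 8))
y = (X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=400)).reshape(-1, 1)  # synthetic response

pls = PLSRegression(n_components=2).fit(X, y)
X_latent = pls.transform(X)        # lower-dimensional feature space aligned with the response
```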
The wide applicability of these techniques is explained by the fact that they can handle multidimensional, noisy data with correlated parameters by translating the data into a lower-dimensional space that retains most of the Cumulative Percentage Variance of the original data.
Linear Discriminant Analysis (LDA), or Fisher Discriminant Analysis (FDA), is a statistical analysis method that searches for a linear combination of features able to separate events from different classes (corresponding to different faults) in the best way possible.
Canonical Correlation Analysis (CCA), or Canonical Variate Analysis (CVA), is a technique that searches for lower-dimensional spaces for two sets of variables (features and responses) such that, after the data are projected onto them, the cross-correlations between the two sets of variables are maximal among all possible spaces.
Factor Analysis is a multivariate statistical analysis technique that serves to determine the relationships between variables and reduce their number.
Feature bagging, or bootstrap aggregation, is a learning method that draws random feature subsets of size between n/2 and n − 1 out of the n original features, runs the basic algorithm on each subset, and then aggregates all results by summation or another combination rule.
Bagging in combination with basic algorithms turns the solution into an ensemble of algorithms, which increases the computational complexity of the basic algorithms but improves the accuracy and robustness of the results. If all features are independent and important, bagging often degrades the quality of the results, since each algorithm learns from an insufficiently informative subsample.
Neural networks are also used for data processing and dimensionality reduction. Today, one of the most effective methods for the latter purpose is the autoencoder, a type of artificial neural network that learns to encode data, usually in an unsupervised manner.
In addition to feed-forward networks, there are many modernized architectures; some of them are as follows:
Autoencoders can be used jointly with standard fault detection methods, for example, with statistical detection criteria.
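A minimal sketch of an undercomplete feed-forward autoencoder in Keras; the layer sizes, training settings and data are illustrative:

```python
# Autoencoder sketch: compress to a bottleneck, reconstruct, and use either the codes
# (low-dimensional features) or the reconstruction error in downstream detection.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

rng = np.random.default_rng(8)
X = rng.normal(size=(2000, 20)).astype("float32")            # assumed scaled, fault-free data

inputs = tf.keras.Input(shape=(20,))
encoded = layers.Dense(8, activation="relu")(inputs)
bottleneck = layers.Dense(3, activation="relu")(encoded)      # compressed representation
decoded = layers.Dense(8, activation="relu")(bottleneck)
outputs = layers.Dense(20, activation="linear")(decoded)

autoencoder = models.Model(inputs, outputs)
encoder = models.Model(inputs, bottleneck)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=64, verbose=0)

codes = encoder.predict(X)                                    # low-dimensional features
errors = np.mean((autoencoder.predict(X) - X) ** 2, axis=1)   # per-point reconstruction error
```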
Spectral Analysis includes time series processing associated with obtaining a representation of signals in the frequency domain. The main application of Spectral Analysis is to assess the vibration of equipment. The most popular techniques of spectral processing are the Fourier transform, the Laplace transform, the Hilbert transform and the Hilbert-Huang transform. The results of Spectral Analysis are rather easy to interpret, and it is possible to detect faults, determine the nature of their occurrence and make a diagnosis on their basis.
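As the simplest of these tools, the discrete Fourier transform of a vibration-like signal can be sketched with NumPy; the signal and its components below are synthetic:

```python
# Spectral analysis sketch: single-sided amplitude spectrum of a noisy two-tone signal.
import numpy as np

fs = 1000.0                                    # sampling rate, Hz (illustrative)
t = np.arange(0, 1.0, 1.0 / fs)
signal = (np.sin(2 * np.pi * 50 * t)           # e.g. a rotational component
          + 0.3 * np.sin(2 * np.pi * 120 * t)  # a weaker harmonic
          + 0.1 * np.random.default_rng(9).normal(size=t.size))

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
amplitude = 2.0 * np.abs(spectrum) / signal.size   # single-sided amplitude spectrum
```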
Another tool of fault detection can be the generation of diagnostic features that serve as equipment health indicators. Such diagnostic features, which characterize the system condition, are identified by an expert based on their experience, in order to provide a clear and effective understanding of the state of a technical system and, accordingly, to detect anomalies in operation.
The advantages of the diagnostic features approach include the possibility of creating a rational solution that accumulates experts' experience, and the ease of implementing a health indicator. The disadvantages are the lack of a physical or mathematical model that could form the foundation of the method, and its limited scope: as a rule, an indicator points only to malfunctions of one kind in one unit of equipment.
The lack of time series data makes deep learning algorithms inapplicable in some applications. In such cases, augmentation, or data generation, is used to add synthetic data for better training and operation of machine learning algorithms. Quite a bit of attention is paid to this field, and the surveys by Iwana and Uchida (2020) and Wen et al. (2021) summarize its current state. The latter work provides the following taxonomy for time series data augmentation:
Although data augmentation is quite a useful tool for improving the quality of various models, it mainly relates to the training stage and is almost never part of the equipment diagnostics pipeline itself. Moreover, time series augmentation methods have not been sufficiently studied for real-world industrial data, which are noisy and whose statistical properties change all the time.
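For illustration only, two of the simplest time-domain augmentation transforms, jittering and scaling, can be sketched as follows (this is a sketch, not the taxonomy of the cited surveys):

```python
# Two elementary time series augmentations applied to a synthetic series.
import numpy as np

rng = np.random.default_rng(10)
x = np.sin(np.linspace(0, 6 * np.pi, 300))                # original series

jittered = x + rng.normal(scale=0.05, size=x.shape)       # add small Gaussian noise
scaled = x * rng.normal(loc=1.0, scale=0.1)               # multiply by a random factor
```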
Each pre-processing method has its own distinctive nature in relation to the original data: some are capable of working with one data object while others require the calculation of values based on a learning sample or a buffer. Moreover, real-time pre-processing must match the diagnostics model selected for learning; otherwise, the models may give incorrect results. For such cases, it is worth discussing the mechanisms for applying pre-processing methods:
Let us demonstrate how methods are applied in real-time mode, assuming that our preprocessing pipeline consists of the following steps:
First, a new point of the multivariate time series is received. Then, if some values in the new vector are missing, the average over a window of previous points is calculated and inserted into the gaps. After that, Z-normalization is applied using the mean and standard deviation defined earlier (during the training stage, typically on fault-free data). Next, PCA is applied using the transformation matrix calculated on the training set. Finally, the value along the first principal axis is selected for further comparison.
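A sketch of this pipeline as an online routine; the training-stage statistics and the PCA matrix are assumed to have been computed beforehand, and the values below are placeholders:

```python
# Online pre-processing sketch: gap filling -> Z-normalization -> PCA projection.
import numpy as np
from collections import deque

class OnlinePreprocessor:
    def __init__(self, mean, std, pca_components, window=10):
        self.mean = mean                      # per-feature mean from the train set
        self.std = std                        # per-feature std from the train set
        self.components = pca_components      # PCA transformation matrix from the train set
        self.buffer = deque(maxlen=window)    # recent points for gap filling

    def process(self, x):
        x = np.asarray(x, dtype=float)
        if np.isnan(x).any() and self.buffer:
            window_mean = np.nanmean(np.vstack(self.buffer), axis=0)
            x = np.where(np.isnan(x), window_mean, x)     # fill gaps with window average
        self.buffer.append(x)
        z = (x - self.mean) / self.std                    # Z-normalization with train stats
        scores = self.components @ z                      # project onto principal axes
        return scores[0]                                  # value along the first axis

# Usage with made-up training statistics for a 3-signal system:
prep = OnlinePreprocessor(mean=np.zeros(3), std=np.ones(3), pca_components=np.eye(3))
print(prep.process([0.2, np.nan, -0.1]))
```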
This overview has described the peculiarities of the data collected at NPPs and their pre-processing in real time. The table below summarizes the comparative analysis of the methods considered.
| Item | Method | Data input limitation | Problem type | Univariate/Multivariate | Online | References |
|---|---|---|---|---|---|---|
| Data Cleansing | | | | | | |
| 1 | One-class SVM | Normalization | Unsupervised | +/+ | + | |
| 2 | iForest | Normalization | Unsupervised | +/+ | –* | |
| 3 | Cluster analysis | Equidistant* | Unsupervised | +/+ | + | |
| 4 | MCD | Normalization, equidistant | Unsupervised | –/+ | + | Leys et al. 2008 |
| Feature Selection and Generation | | | | | | |
| 5 | PCA | Normalization, equidistant | supervised | –/+ | + | |
| 6 | ICA | Normalization, equidistant | supervised | –/+ | + | |
| 7 | PLS | Normalization, equidistant | Unsupervised | –/+ | + | |
| 8 | LDA, FDA | Normalization, equidistant | Unsupervised | –/+ | + | |
| 9 | CCA, CVA | Normalization, equidistant | Unsupervised | –/+ | + | |
| 10 | Factor analysis | Normalization, equidistant | supervised | –/+ | + | |
| 11 | Spectral analysis | Stationarity, equidistant* | Unsupervised | +/– | + | |
| 12 | Bagging | – | Unsupervised | –/+ | – | |
| 13 | Autoencoder | Normalization | supervised | +/+ | +* | |
| 14 | Health indicators | –* | Unsupervised* | +/+ | + | |
The problems encountered in the data are not unique to the nuclear industry, but the distinguishing aspect of NPPs is the large amount of generated information and the variety of its sources and data types. Pre-processing is necessary to prepare the data for input to the diagnostic algorithms, since many of them either have requirements that rule out inputs with gaps, outliers or signals with different sampling rates, or produce incorrect results when working with unscaled data. Another reason for using pre-processing methods is the possibility of improving the quality of the diagnostic algorithms and reducing the computational complexity of the problem, for example, by reducing the dimensionality of the initial data or lowering the sampling frequency of the signals.
We find it necessary to summarize our opinion on which methods are commonly used, which are not, and why:
The methods described in this work have already proven themselves in industrial applications, including at NPPs. At the same time, these methods continue to develop, and extensions appear that improve their operation or expand their field of application. This overview, together with
Further research can be focused on overviewing the methods used to solve such diagnostic problems at NPPs as arriving at the correct diagnosis, fault localization, and prognosis of the malfunction development.