Impact: Using Machine Learning Methods to Identify Gross Errors in Measurement Data

Thursday 23rd November 2023

Based on a successful Knowledge Transfer Partnership (KTP) project between the National Subsea Centre and Accord ESL, the Net Zero Operations team developed and implemented an innovative approach to data reconciliation and the identification of non-random (gross) errors, using Machine Learning (ML), specifically for hydrocarbon allocation. The KTP began in 2019 with the aim of developing a software product that could help clients identify measurement errors and/or value leakage in hydrocarbon processing and prevent subsequent errors in allocation.

In this article, our Net Zero Operations Senior Research Fellow, Dr. Thanh Nguyen, introduces how Machine Learning technology can help identify gross errors in measurement data.

Gross Error Detection and Identification

In the chemical industry, highly accurate and reliable measurements play an important role in monitoring process conditions, implementing control measures, and optimizing operations. Efficiency analysis and improved measurement accuracy not only contribute to more lucrative operations, but also aid in the detection of operational faults. Unfortunately, the inherent nature of measurements introduces errors stemming from various sources, such as power supply fluctuations, network transmission and signal conversion noise, and analogue input filtering. This category of error is commonly referred to as random measurement error.

To mitigate random errors, data reconciliation is employed, aligning measurements with process constraints such as mass or energy balance. Established data reconciliation techniques use mathematical methods, such as least-squares, to adjust measurements utilising process model equations such as equilibrium equations and conservation laws. Data points that require adjustment by more than an expected amount are flagged as potential errors for further investigation. It is widely acknowledged that reconciliation techniques operate under the assumption that only random errors are present in the data. The existence of non-random errors, known as gross errors (GEs), caused by factors such as instrument failure, measurement bias or process leaks, can significantly compromise the accuracy and feasibility of the reconciled results. Therefore, detecting GEs is a crucial preliminary step before arriving at final reconciled estimates.
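
As an illustration of how this works, the short sketch below (Python/NumPy) reconciles three measured flows around a simple splitter, where one feed stream divides into two outlet streams, by weighted least squares, and then flags any measurement whose adjustment exceeds roughly two standard deviations. The stream layout, example values, variances and threshold are illustrative assumptions rather than details taken from the project.

    # Minimal sketch of weighted least-squares data reconciliation for a simple
    # splitter: stream 1 feeds streams 2 and 3, so the balance is x1 - x2 - x3 = 0.
    import numpy as np

    A = np.array([[1.0, -1.0, -1.0]])              # linear mass-balance constraint A @ x = 0

    y = np.array([100.0, 60.0, 30.0])              # raw measurements (illustrative values)
    sigma = np.diag([1.0**2, 1.0**2, 2.0**2])      # error covariance (stream 3 meter is less precise)

    # Adjust y as little as possible, in the variance-weighted sense, so the balance
    # holds exactly: x_hat = y - Sigma A^T (A Sigma A^T)^(-1) A y
    K = sigma @ A.T @ np.linalg.inv(A @ sigma @ A.T)
    x_hat = y - K @ (A @ y)

    # Flag measurements whose adjustment exceeds roughly two standard deviations
    # as candidate gross errors for further investigation.
    adjustments = y - x_hat
    suspect = np.abs(adjustments) > 2.0 * np.sqrt(np.diag(sigma))

    print("reconciled flows:", x_hat)              # ~[98.3, 61.7, 36.7]
    print("adjustments:", adjustments)             # ~[ 1.7, -1.7, -6.7]
    print("candidate gross errors:", suspect)      # [False False  True]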

Machine Learning Methods for Gross Error Identification

In recent years, numerous methods for gross error detection (GED) have been introduced, primarily relying on statistical tests. Despite their widespread use in industry, two shortcomings should be noted. Firstly, these tests operate on process data corrected with the assistance of steady-state material and energy balance models. Existing literature assumes that GED and data reconciliation (DR) models accurately capture the process without mathematical errors. In practice, however, models can be inaccurate, leading to uncertainties that violate this assumption. Secondly, statistical GED tests focus on a single snapshot in time without looking at patterns in the data that could provide insights into the performance of a system or a specific meter over time.

In collaboration with experts from Accord ESL, Thanh and the Net Zero Operations team explored the potential of a data-driven approach using Machine Learning (ML) to address the GED problem. ML refers to algorithms that learn from data to perform human-level tasks; it is now applied across many domains, outperforming humans on some tasks and driving advances in artificial intelligence (AI) applications. The research work undertaken focused on (i) raising awareness of the application of ML and deep learning (DL) to the GED problem in the chemical engineering community and (ii) showcasing how ML approaches can be utilised to identify measurement errors and/or value leakage in hydrocarbon processing.

The Proposed Approach

  • To apply ML methods to the problem of GED, training data is required from which the patterns of GEs can be learned. Knowledge gained throughout this project showed that the literature lacks training datasets for GED, which has prevented the application of ML methods to this problem. The team therefore generated a number of measurement datasets, including training and testing data, associated with 16 systems introduced in the literature (a data-generation sketch is given after this list).
  • A learning system to detect and identify GEs using ML methods was proposed. It describes a data pipeline of sequential steps, from pre-processing of the raw measurement data through to training and deployment of the detection model, showing how an ML-based framework for the GED problem can be implemented and deployed.
  • Both supervised and unsupervised ML methods for GE detection and identification, including classification, regression and anomaly detection, were investigated. The methods can therefore be applied to several types of systems, not limited to hydrocarbon processes.
  • The team proposed using ensemble learning techniques, a subfield of ML, to combine multiple GE identifiers, with each identifier associated with a combining weight. The combining weights were found by minimising the identification error using optimisation methods (see the ensemble sketch after this list).
  • The experimental results show that the proposed method accurately identifies gross errors in benchmark systems and performs better than traditional statistical test-based methods. The method can be integrated into existing systems and uses established data reconciliation techniques and ML to determine the measurement quality of complex systems.
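
As referenced in the first bullet above, the sketch below shows one way labelled training data for GED could be generated and used to fit a supervised identifier, reusing the three-stream balance from the earlier reconciliation example. The flow ranges, bias magnitude, feature choice and random-forest model are illustrative assumptions and do not reproduce the datasets or models developed in the project.

    # Minimal sketch of generating labelled training data for GED and fitting a
    # supervised identifier on it. Flow ranges, bias size, features and the
    # random-forest model are illustrative choices only.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000
    feed = rng.uniform(80.0, 120.0, size=n)                    # true feed flow
    split = rng.uniform(0.4, 0.6, size=n)                      # fraction routed to stream 2
    truth = np.column_stack([feed, split * feed, (1.0 - split) * feed])

    noise_sd = np.array([1.0, 0.8, 0.8])
    meas = truth + rng.normal(0.0, noise_sd, size=(n, 3))      # random measurement error

    # Inject a gross error (a fixed bias of random sign) into at most one meter per
    # sample; label 0 means a clean sample, labels 1-3 name the biased meter.
    labels = rng.integers(0, 4, size=n)
    for meter in (1, 2, 3):
        rows = labels == meter
        meas[rows, meter - 1] += 8.0 * rng.choice([-1.0, 1.0], size=rows.sum())

    # Features: noise-scaled measurements plus the mass-balance residual.
    residual = meas[:, 0] - meas[:, 1] - meas[:, 2]
    X = np.column_stack([meas / noise_sd, residual])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("held-out identification accuracy:", clf.score(X_te, y_te))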
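
The second sketch illustrates the weighted-ensemble idea from the final bullets: suspicion scores from several base identifiers are combined with weights found by minimising the identification error on validation data. The project papers use particle swarm optimisation for this search; SciPy's differential evolution is substituted here purely as a readily available population-based alternative, and the base detector scores are synthetic.

    # Minimal sketch of a weighted ensemble of gross-error identifiers. Each base
    # detector scores every meter in every sample (higher = more suspicious), and
    # the combining weights are tuned by minimising the identification error.
    import numpy as np
    from scipy.optimize import differential_evolution

    rng = np.random.default_rng(1)
    n_samples, n_meters, n_detectors = 400, 3, 3

    # Toy validation data: the meter that truly carries the GE, plus noisy
    # suspicion scores from three hypothetical base detectors of varying skill.
    true_meter = rng.integers(0, n_meters, size=n_samples)
    scores = rng.normal(0.0, 1.0, size=(n_detectors, n_samples, n_meters))
    for d in range(n_detectors):
        scores[d, np.arange(n_samples), true_meter] += rng.uniform(0.5, 2.0)

    def identification_error(weights):
        """Fraction of samples where the weighted ensemble picks the wrong meter."""
        w = np.asarray(weights) / (np.sum(weights) + 1e-12)
        combined = np.tensordot(w, scores, axes=1)     # shape (n_samples, n_meters)
        return np.mean(combined.argmax(axis=1) != true_meter)

    # Population-based search (here differential evolution, not PSO) for the
    # combining weights that minimise the identification error.
    result = differential_evolution(identification_error,
                                    bounds=[(0.0, 1.0)] * n_detectors, seed=1)
    print("best combining weights:", result.x / (result.x.sum() + 1e-12))
    print("ensemble identification error:", result.fun)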

This research project led to the journal paper ‘A comparative study of anomaly detection methods for gross error detection problems’ and several papers, listed below, which have been presented at the international conferences Global Flow Measurement Workshop and GECCO.

  1. Technical paper ‘Machine Learning based gross error estimation for allocation systems’
  2. Conference paper ‘Weighted ensemble of gross error detection methods based on particle swarm optimization’
  3. Conference paper ‘Evolved ensemble of detectors for gross error detection’

To discover more about how our Net Zero Operations team are solving real-world problems, view our dedicated Net Zero Operations webpage or Projects page.