Underwater Visual Perception: Bridging the Gap Between Machine Understanding and Real-World Autonomy

Tuesday 15th April 2025

Building on the development of the SeaSense project, the Cyber-Physical Systems team is now focused on advancing underwater visual perception to enable reliable autonomy in complex subsea environments. This ongoing research addresses the fundamental challenges faced by machine vision systems underwater, where distortion, turbidity and inconsistent lighting conditions severely impact perception.

The current work explores how core technologies such as depth estimation and Visual Simultaneous Localisation and Mapping (VSLAM) are influenced by underwater image degradation, and whether enhancements designed for human observers actually support or hinder machine interpretation. A key part of this effort involves the creation of structured datasets under controlled conditions to evaluate how machines “see” underwater and how that differs from human perception.

In this article, NSC Research Fellow Dr Ali Rohan provides an overview of the key technologies behind underwater perception, the necessity of ground truth data for depth and localisation, and the broader industrial impact of this research as it drives forward the future of subsea autonomy.

Depth Estimation and Visual Simultaneous Localisation and Mapping (VSLAM)

As the drive for underwater autonomy grows, two core technologies have become essential for enabling intelligent decision-making: depth estimation and VSLAM. These systems form the foundation of how an autonomous robot perceives and understands its environment in the absence of GPS and under challenging visual conditions.

Depth estimation allows machines to calculate the distance between the camera and objects in the scene, creating a 3D representation of the surroundings. This is critical for tasks such as obstacle avoidance, navigation, object interaction and 3D reconstruction. In underwater environments, where visibility is poor and sensor readings can be unreliable, depth estimation becomes a key enabler of safe operation.
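
As a rough illustration of the idea, the sketch below computes depth from a stereo image pair using OpenCV’s semi-global block matching. It is a generic example rather than the NSC pipeline; the image paths, disparity settings and camera parameters are assumed placeholders.

```python
# Minimal stereo depth-estimation sketch using OpenCV's semi-global block
# matching. Illustrative only: file names and camera parameters are
# hypothetical, not values from any real rig.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical stereo pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Disparity search range and block size must be tuned per camera rig;
# these are illustrative defaults.
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,         # smoothness penalty for small disparity changes
    P2=32 * 5 * 5,        # smoothness penalty for large disparity jumps
)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point to pixels

# Depth from disparity: Z = f * B / d, with focal length f (pixels) and
# baseline B (metres). Both values here are assumed, not measured.
f, B = 700.0, 0.12
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]
```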

VSLAM, on the other hand, enables a robot to build a map of its environment while estimating its position within it using visual data. By tracking key features across sequential images, VSLAM allows autonomous systems to move and localise themselves in unknown environments without external positioning systems.
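
The feature-tracking core of a VSLAM front end can be sketched in a few lines. The example below, a generic illustration rather than any particular VSLAM system, detects ORB features in two consecutive frames, matches them, and recovers the relative camera motion; the frame filenames and camera intrinsics are assumed.

```python
# Sketch of a VSLAM front end: detect features, match them across
# consecutive frames, and recover relative camera motion.
import cv2
import numpy as np

frame0 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
frame1 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp0, des0 = orb.detectAndCompute(frame0, None)
kp1, des1 = orb.detectAndCompute(frame1, None)

# Match binary descriptors; cross-checking removes many false matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des0, des1), key=lambda m: m.distance)

pts0 = np.float32([kp0[m.queryIdx].pt for m in matches])
pts1 = np.float32([kp1[m.trainIdx].pt for m in matches])

# Hypothetical pinhole intrinsics; a real system uses calibrated values.
K = np.array([[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]])
E, inliers = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=inliers)
# R and t give the rotation and (unit-scale) translation between the frames.
```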

Together, depth estimation and VSLAM form the backbone of underwater visual perception. However, both are highly sensitive to the quality of visual input, which is often degraded underwater due to turbidity, lighting variability and colour distortion.

At the National Subsea Centre (NSC), the research focuses on understanding and improving how these systems perform in real underwater settings. By investigating the relationship between visual degradation and algorithm performance, the team aims to strengthen machine perception and move closer to true underwater autonomy.

Background and Necessity

While significant progress has been made in autonomous navigation above ground, applying the same principles underwater remains a major challenge. The underwater environment presents unique and often unpredictable visual conditions—turbidity, inconsistent lighting, low contrast and colour distortion—that significantly degrade camera data. These conditions make it difficult for humans and machines to interpret visual scenes accurately, particularly when relying on vision-based perception systems like depth estimation and VSLAM.

A critical insight from the ongoing research at the National Subsea Centre (NSC) is that what looks enhanced or clear to a human observer may not be helpful, or even usable, for a machine. Many conventional enhancement techniques are designed with human vision in mind. However, machines rely on low-level visual features such as edges, textures and gradients, which may be altered or lost when images are enhanced for human readability. This disconnect can lead to poor feature extraction, inaccurate depth predictions and failed localisation.
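
This effect is straightforward to probe. The snippet below, a minimal hypothetical test rather than the team’s evaluation protocol, applies a contrast enhancement aimed at human viewers (CLAHE) and compares how many ORB features a machine can extract before and after.

```python
# Quick probe of the "enhanced for humans, worse for machines" effect:
# count ORB features before and after a human-oriented enhancement.
# The image path is a placeholder; results vary with scene content.
import cv2

raw = cv2.imread("underwater.png", cv2.IMREAD_GRAYSCALE)

# Contrast-limited adaptive histogram equalisation: improves perceived
# contrast for a human viewer, but can distort local gradients.
clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
enhanced = clahe.apply(raw)

orb = cv2.ORB_create(nfeatures=5000)
kp_raw = orb.detect(raw, None)
kp_enh = orb.detect(enhanced, None)

print(f"features in raw image:      {len(kp_raw)}")
print(f"features in enhanced image: {len(kp_enh)}")
# Feature counts are only a first check; for VSLAM what matters is
# whether the same features can be matched reliably across frames.
```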

Compounding this issue is the lack of ground truth data for training and validating AI-based underwater perception models. In most cases, it is extremely difficult to obtain accurate reference measurements for depth and localisation in real subsea conditions. As a result, models are either trained on simulated data or repurposed from terrestrial datasets, both of which fail to fully represent underwater complexity.

To address this, the team are building structured datasets under controlled lighting and variable object conditions, allowing them to systematically study degradation patterns across RGB channels and evaluate their impact on perception algorithms. This forms the foundation for more robust and reliable underwater autonomy.
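
A minimal sketch of that per-channel analysis might look like the following, assuming a hypothetical folder layout in which each turbidity level has its own directory of captures.

```python
# Summarise how each RGB channel behaves as turbidity increases.
# The file layout and level names are assumed for illustration.
import cv2
import numpy as np

turbidity_levels = ["clear", "low", "medium", "high"]  # assumed labels
for level in turbidity_levels:
    img = cv2.imread(f"dataset/{level}/scene_01.png")  # OpenCV loads BGR
    b, g, r = cv2.split(img.astype(np.float32))
    for name, ch in (("R", r), ("G", g), ("B", b)):
        # Mean brightness and standard deviation (a crude contrast proxy)
        # reveal how quickly each channel attenuates with turbidity.
        print(f"{level:>6} {name}: mean={ch.mean():6.1f} std={ch.std():6.1f}")
```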

Benefits to Industry

As industries across the marine sector continue to adopt automation, the reliability of underwater perception systems becomes a key factor in the success of autonomous operations. From offshore energy and subsea inspections to aquaculture, marine research and environmental monitoring, there is a growing demand for underwater systems that can function with minimal human intervention. However, this shift depends on whether machines can perceive and interpret their environment accurately in the challenging conditions beneath the surface.

The research undertaken at the National Subsea Centre (NSC) directly addresses this challenge by focusing on how underwater image degradation impacts machine perception, specifically depth estimation and VSLAM. By identifying the limits of current vision-based systems and understanding how image quality affects performance, we are helping to develop perception models that are more adaptive, robust and reliable in real-world subsea environments.

For industry, this has several clear benefits. Improved perception enables greater autonomy in underwater vehicles, which can reduce the need for costly support vessels and personnel, increase inspection frequency, and expand operational windows in difficult conditions. It also enhances data quality and mission reliability, lowering the risk of failed deployments due to perception failures.

Moreover, the team’s work supports customised enhancement strategies, allowing industry stakeholders to tailor visual processing pipelines depending on whether the end goal is human analysis or machine interpretation. This distinction can improve both real-time decision-making and post-mission data review, offering more value from each deployment and driving forward the next generation of intelligent subsea technologies.
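
In practice, such a strategy can be as simple as branching the processing pipeline on the intended consumer. The sketch below is illustrative only, with assumed function choices: an aggressive contrast boost for human review versus a gentle, gradient-preserving clean-up for machine input.

```python
# Illustrative branching pipeline: one path tuned for human review,
# another preserving the low-level features machines depend on.
# Function choices are assumptions, not the NSC pipeline.
import cv2
import numpy as np

def enhance_for_humans(bgr: np.ndarray) -> np.ndarray:
    """Boost contrast on the lightness channel for visual review."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8)).apply(l)
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

def prepare_for_machines(bgr: np.ndarray) -> np.ndarray:
    """Gentle edge-preserving denoising, leaving gradients largely intact."""
    return cv2.bilateralFilter(bgr, d=5, sigmaColor=25, sigmaSpace=25)

def process(bgr: np.ndarray, target: str) -> np.ndarray:
    return enhance_for_humans(bgr) if target == "human" else prepare_for_machines(bgr)
```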

Impact

The long-term impact of our research lies in creating reliable, intelligent underwater systems that can operate autonomously across diverse real-world marine environments. By focusing on the fundamentals of visual perception, we aim to remove one of the major barriers preventing the broader deployment of autonomous underwater vehicles (AUVs): the inability of current systems to consistently interpret degraded visual data.

One of the core outputs of this work is the development of a structured, multi-condition underwater dataset, captured under varying turbidity levels, lighting conditions and distances. What sets this dataset apart is its dual focus: it is not only designed to simulate realistic underwater scenarios but also to include fine-grained annotations and reference data for evaluation. This will enable researchers and industry partners to train, benchmark, and validate AI models in ways that were previously not possible.
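
To give a concrete flavour, a dataset of this kind might be indexed with per-sample condition metadata so that models can be benchmarked condition by condition; the field names and file layout below are purely illustrative.

```python
# Hypothetical manifest structure for a multi-condition dataset.
# All field names and paths are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Sample:
    image_path: str      # raw capture
    turbidity: str       # e.g. "clear", "low", "medium", "high"
    lighting: str        # e.g. "ambient", "artificial"
    distance_m: float    # camera-to-object distance
    depth_ref_path: str  # reference depth map for evaluation

manifest = [
    Sample("imgs/0001.png", "medium", "artificial", 1.5, "refs/0001.npy"),
    # ... one entry per capture
]

# Benchmarking per condition then becomes a simple filter:
medium_turbidity = [s for s in manifest if s.turbidity == "medium"]
```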

In parallel, the team are generating insights into how specific types of visual degradation—such as channel-level distortion, contrast loss or feature suppression—affect machine performance. These findings are helping shape machine-specific enhancement techniques that improve the performance of vision-based systems, as opposed to traditional enhancements tailored for human viewing.
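
One simple example of a machine-oriented correction motivated by channel-level findings is per-channel rebalancing under a grey-world assumption, which compensates attenuated channels (typically red, underwater) without the aggressive tone-mapping of viewer-oriented enhancement. The sketch below is a generic illustration, not the team’s published method.

```python
# Grey-world channel rebalancing: scale each colour channel so its mean
# matches the global mean, compensating attenuation while leaving local
# gradients largely intact. Generic sketch with a placeholder image path.
import cv2
import numpy as np

def grey_world_balance(bgr: np.ndarray) -> np.ndarray:
    img = bgr.astype(np.float32)
    means = img.reshape(-1, 3).mean(axis=0)         # per-channel means (B, G, R)
    gains = means.mean() / np.maximum(means, 1e-6)  # scale channels to the global mean
    return np.clip(img * gains, 0, 255).astype(np.uint8)

corrected = grey_world_balance(cv2.imread("underwater.png"))
```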

Looking ahead, the project opens opportunities for collaborative development with sensor manufacturers, AUV developers, and offshore service providers who are seeking to improve the perception and autonomy capabilities of their platforms. By bridging the gap between machine vision and real-world underwater environments, our work contributes to a more resilient, cost-effective and intelligent future for subsea robotics.