Improving sample efficiency in visual reinforcement learning under data augmentation
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China
  • Citation
    Wang Z, Jiang W, Kou Q, Peng R, Lan X. Improving sample efficiency in visual reinforcement learning under data augmentation. Robot Learn. 2026(2):0012, https://doi.org/10.55092/rl20260012. 
  • DOI
    10.55092/rl20260012
  • Copyright
    Copyright © 2026 by the authors. Published by ELSP.
Abstract

Visual reinforcement learning (RL) relies on high-dimensional visual inputs but often suffers from low sample efficiency. While data augmentation enhances input diversity, it can inadvertently hinder policy learning by increasing visual complexity and reducing consistency. In this paper, we systematically analyze the efficiency bottlenecks associated with data augmentation and identify two key challenges: increased representational complexity and uncertainty in value estimation. To tackle these issues, we propose a novel framework, Feature Consistency and Self-value Distillation (FCSD). FCSD enforces feature consistency across augmented views and stabilizes value learning by distilling from its own past value network. We evaluate FCSD on several challenging RL benchmarks, including DMControl, ProcGen and Atari. Experimental results suggest that FCSD improves sample efficiency by mitigating augmentation-induced instability. Compared to several representative augmentation-based methods, FCSD demonstrates competitive or superior performance. Moreover, FCSD integrates seamlessly with both on-policy and off-policy RL algorithms, and demonstrates consistently robust and generalizable performance across diverse tasks and domains.
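The two loss terms described above can be illustrated with a minimal sketch. This is not the paper's implementation: the toy networks, the noise-based "augmentations", and the specific loss forms (MSE for both terms) are illustrative assumptions based only on the abstract's description of feature consistency and self-value distillation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the image encoder and Q-value head (shapes are arbitrary).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32))
q_net = nn.Linear(32 + 4, 1)

# Frozen snapshot of an earlier value network, used as the distillation target.
q_past = nn.Linear(32 + 4, 1)
q_past.load_state_dict(q_net.state_dict())
for p in q_past.parameters():
    p.requires_grad_(False)

obs = torch.rand(16, 3, 8, 8)   # batch of visual observations
act = torch.rand(16, 4)         # batch of actions

# Two augmented views; additive noise stands in for real image augmentations.
view1 = obs + 0.05 * torch.randn_like(obs)
view2 = obs + 0.05 * torch.randn_like(obs)

z1, z2 = encoder(view1), encoder(view2)

# (1) Feature consistency: pull the representations of the two views together.
loss_consistency = F.mse_loss(z1, z2)

# (2) Self-value distillation: regress the current Q-value toward the past
#     network's estimate to damp augmentation-induced value fluctuation.
q_now = q_net(torch.cat([z1, act], dim=1))
with torch.no_grad():
    q_target = q_past(torch.cat([z1.detach(), act], dim=1))
loss_distill = F.mse_loss(q_now, q_target)

loss = loss_consistency + loss_distill  # combined auxiliary objective
```

In practice such a combined loss would be added to the base RL objective and back-propagated through the encoder and current Q network, while the past network stays frozen.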

Keywords

visual reinforcement learning; sample efficiency; data augmentation
