Visual reinforcement learning (RL) relies on high-dimensional visual inputs but often suffers from low sample efficiency. While data augmentation enhances input diversity, it can inadvertently hinder policy learning by increasing visual complexity and reducing consistency. In this paper, we systematically analyze the efficiency bottlenecks associated with data augmentation and identify two key challenges: increased representational complexity and uncertainty in value estimation. To tackle these issues, we propose a novel framework, Feature Consistency and Self-value Distillation (FCSD). FCSD enforces feature consistency across augmented views and stabilizes value learning by distilling from its own past value network. We evaluate FCSD on several challenging RL benchmarks, including DMControl, ProcGen, and Atari. Experimental results suggest that FCSD improves sample efficiency by mitigating augmentation-induced instability, and it achieves competitive or superior performance compared to several representative augmentation-based methods. Moreover, FCSD integrates seamlessly with both on-policy and off-policy RL algorithms, exhibiting consistently robust and generalizable performance across diverse tasks and domains.
visual reinforcement learning; sample efficiency; data augmentation
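The two components named in the abstract can be sketched as auxiliary losses. This is a minimal illustration, not the paper's actual formulation: the mean-squared-error form, the loss weights `lam` and `beta`, and the function names are all assumptions introduced here for clarity. The feature-consistency term pulls together encoder features of two augmented views of the same observation, and the self-value-distillation term regresses the current value estimates toward those of a frozen past copy of the value network.

```python
import numpy as np


def feature_consistency_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """MSE between encoder features of two augmented views (assumed form)."""
    return float(np.mean((feat_a - feat_b) ** 2))


def self_value_distillation_loss(q_current: np.ndarray, q_past: np.ndarray) -> float:
    """Distill current value estimates toward a past (frozen) value network's
    outputs; q_past would be produced with gradients stopped."""
    return float(np.mean((q_current - q_past) ** 2))


def fcsd_auxiliary_loss(feat_a, feat_b, q_current, q_past,
                        lam: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of the two auxiliary terms; lam and beta are
    hypothetical trade-off coefficients, not values from the paper."""
    return (lam * feature_consistency_loss(feat_a, feat_b)
            + beta * self_value_distillation_loss(q_current, q_past))
```

In a full agent, this auxiliary loss would be added to the base RL objective (e.g. an actor-critic loss), with `q_past` refreshed periodically from a snapshot of the value network.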