NoisyEQA: benchmarking Embodied Question Answering with imperfect queries from non-expert users

Tao Wu; Chuhao Zhou; Haozhi Cao; Yen Heng Wong; Lin Gu; Jianfei Yang

doi:10.55092/rl20260014

Abstract

Embodied Question Answering (EQA) enables robots to explore an environment and answer human questions. It is important for human-robot interaction and has been significantly advanced by recent progress in Vision-Language Models (VLMs). However, EQA in real-world scenarios remains challenging: human-posed questions often contain noise that can interfere with an embodied agent’s reasoning, particularly when they come from language beginners and non-expert users. To address this, we introduce NoisyEQA, a benchmark designed to evaluate a robot's ability to identify and correct noisy questions. NoisyEQA covers three common types of noise observed in real-world applications: Memory Noise, Perception Noise, and Semantic Noise. The noisy questions are generated through an automated dataset creation framework. Additionally, we propose a ‘Self-Correction’ prompting mechanism to improve the robustness of EQA to noise, and a novel evaluation metric that measures both noise detection capability and answer quality. Our comprehensive study reveals that current embodied agents often struggle to detect noise in questions, leading to responses that frequently contain incorrect information. The Self-Correction prompting mechanism effectively improves the accuracy of agent answers.

Keywords

Embodied Question Answering; navigation; embodied LLM; active agents
