Robot Learning

ISSN: 2960-1436 (Print)

ISSN: 2960-1444 (Online)

CODEN: RLABAV

Special Issues
Learning Based Robot Path and Task Planning
Special Issue Editors: Guangliang Li, Shiqi Zhang, Dachuan Li
Submission Deadline: 31 July 2026
Human-in-the-Loop Robot Learning in the Era of Foundation Models: Challenges and Opportunities
Special Issue Editors: Jianzhuang Zhao, Xing Liu, Marta Lagomarsino, Francesco Tassi, Shufei Li, Chao Zeng, Chenguang Yang, Wansoo Kim
Submission Deadline: 30 June 2026
Human-Robot Interaction and Human-Centered Robotics
Special Issue Editors: Anqing Duan, Shuo Ding, Junling Fu, Elisa Iovene, Sipu Ruan, Peng Zhou, Chenguang Yang
Submission Deadline: 30 June 2026
Intelligent Vision-Driven Robotics
Special Issue Editors: Peng Zhou, David Navarro-Alarcon
Submission Deadline: 31 July 2026
Latest Articles
Deep learning for underwater object detection: a comprehensive survey of models, datasets, and challenges
Hari Bhandari, Pengcheng Liu
Survey | 16 Jun 2026 | OPEN ACCESS

This survey provides a comprehensive synthesis of methods, datasets, metrics, and deployment strategies for underwater object detection, tracing the evolution from convolutional neural network (CNN)-based detectors to emerging transformer and hybrid architectures. It unifies fragmented literature into a structured taxonomy while integrating results from studies published between 2014 and 2025. The paper reviews benchmark datasets, discusses training and evaluation practices under challenging aquatic conditions together with reproducibility standards, and proposes a deployment playbook that considers latency, VRAM, energy, and hardware constraints. Beyond technical performance, it addresses responsible AI practices and ethical challenges in marine observation. By highlighting open problems in multimodal fusion, self-supervised learning, and on-device adaptation, this work aims to guide future research and practical deployment of underwater vision systems in real-world marine applications.

FotoBot: an embodied AI photography robot system, its design, prototyping and application
Dawei Wang, Chang Chen, Yipeng Pan, Xinzheng Tang, Hua Chen, Jia Pan
Article | 26 May 2026 | OPEN ACCESS

This paper introduces FotoBot, a vision-driven autonomous robot photographer designed to enhance human–robot interaction (HRI) and optimize camera parameter control through real-time visual perception. FotoBot integrates Generative Pre-trained Transformers (GPT) for natural language communication and Bipedal Toric Space (BTS) for vision-guided camera viewpoint control. Using GPT, FotoBot interprets and responds to user instructions and adjusts its behavior accordingly. BTS, introduced in this paper for camera position planning, compresses the camera position representation into three parameters related to photo composition. The BTS representation is analytically converted into Cartesian navigation goals for robot execution. The adoption of BTS ensures the robot’s feasibility around targets and adherence to cinematographic standards. Deployed on a biped robot platform, FotoBot demonstrates comprehensive navigation capabilities, effective human–robot interaction, and strong auto-photography performance. User trials conducted at the Hong Kong Science Park validated FotoBot’s proficiency in navigating complex terrains and capturing high-quality photographs while responding to user instructions. Videos and code are available on the project website: https://sites.google.com/view/fotobot/fotobot.

Robotic assembly via self-prompt Segment Anything Model and discrete prompt optimization
Qi Guo, Xing Liu, Haitao Chang, Zhengxiong Liu, Panfeng Huang
Article | 22 May 2026 | OPEN ACCESS

In complex assembly scenarios, Multimodal Large Language Models (MLLMs), despite their strong vision-language understanding capabilities, remain limited in their ability to produce structured and executable assembly plans directly from raw visual observations. This difficulty is particularly evident in black-box settings, where prompt design depends heavily on human experience and repeated trial-and-error, often leading to unstable results and high iteration costs. To address these issues, this paper presents a Perception-Recognition-Planning-Action (PRPA) framework for robotic assembly that enables the direct derivation of assembly instructions from scene images. The framework incorporates two key components. A self-prompt Segment Anything Model (SAM) is used to automatically generate structured and verifiable visual representations of assembly parts, ensuring consistent inputs for subsequent reasoning. In addition, a discrete prompt optimization mechanism is introduced to refine prompts for black-box MLLMs through iterative quality assessment and targeted symbolic modifications, improving the reliability of part recognition, semantic attribute extraction, and functional relationship modeling. Together, these components allow the system to generate temporally ordered and physically feasible assembly action sequences, which are represented as symbolic assembly plans suitable for both human interpretation and robotic execution. By combining MLLM-based reasoning with structured assembly planning, the proposed approach shifts the role of language models from interpreting predefined instructions to directly supporting instruction generation from visual input. Experimental results show that the proposed prompt optimization mechanism reduces the average number of reasoning attempts by 48% and achieves 95% stability in part recognition.

Top Downloaded
Review on path planning for obstacle avoidance oriented to micro-/nanorobots
Tongzhou Ye, Tianhao Peng, Lidong Yang
Review | 14 Nov 2024 | OPEN ACCESS
Path planning algorithms are indispensable for controlling micro-/nanorobots through complex and unknown environments in biomedical applications. As the tasks performed become more complex, higher-quality paths are required to avoid obstacles and ensure the safe and efficient movement of micro-/nanorobots. A comparative analysis of path planning algorithms is conducted to elucidate their application and optimization for different environments. According to the environment modeling approach, existing path planning algorithms are classified into three categories: searching, sampling, and dynamic. Searching path planning algorithms directly retrieve the minimum-cost global path from the modeled static waypoints. Sampling path planning algorithms employ randomly sampled waypoints within the target space, which eliminates the need for environmental modeling. Dynamic path planning algorithms use local paths to regulate the motion of micro-/nanorobots in real time. Deep learning networks based on big data are expected to become an important research direction for the control and navigation of micro-/nanorobots. The advantages and limitations of path planning algorithms in varied spatial contexts are illustrated through detailed examples and descriptions, providing a comprehensive understanding of their performance and applicability. This review highlights recent advances in this emerging domain and the continuing pursuit of better motion control solutions for micro-/nanorobots.
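As a rough illustration of the searching category described above, the sketch below runs Dijkstra's algorithm on a small occupancy grid. It is not taken from the review: the 4-connected grid, the unit step cost, and the function name dijkstra_grid are all assumptions chosen for brevity.

```python
import heapq


def dijkstra_grid(grid, start, goal):
    """Return a minimum-cost path on a 4-connected grid (hypothetical helper).

    grid[r][c] == 1 marks an obstacle; every move between free cells costs 1.
    Returns a list of (row, col) cells from start to goal, or None if the
    goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}          # best known cost to each visited cell
    parent = {}                # back-pointers used to rebuild the path
    frontier = [(0, start)]    # priority queue ordered by path cost
    while frontier:
        cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        if cost > dist.get(cell, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = new_cost
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (new_cost, (nr, nc)))
    return None
```

Calling dijkstra_grid(grid, (0, 0), (2, 0)) on a 3x3 grid whose middle row is blocked except for the last column returns the detour around the obstacle. A sampling planner would replace the fixed grid with randomly drawn waypoints.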
Survey on heterogeneous aquatic robot systems: communication, perception, navigation, control, decision-making and energy management
Ruonan Liu, Xiuzhong Hu, Zihan Jiang, Junzhi Wang, Weidong Zhang
Survey | 30 May 2025 | OPEN ACCESS
Heterogeneous aquatic robot systems, consisting of ROVs, AUVs, ASVs, and UAVs, are vital for environmental exploration, monitoring, and task execution. This paper presents advancements in critical technologies within these systems, focusing on communication (underwater acoustic, radio, and optical), multi-sensor fusion, and collaborative navigation techniques. It reviews control strategies like deep reinforcement learning, end-to-end control, and large model-based methods, addressing autonomous decision-making and adaptability in complex environments. The paper also discusses energy management strategies for efficient storage, utilization, and recovery. Furthermore, it explores the ethical and environmental impacts of deploying such systems, emphasizing sustainability and minimizing ecological disruptions. Finally, case studies and applications in ocean exploration and environmental monitoring are highlighted, showcasing the real-world utility and future potential of heterogeneous aquatic robot systems. This work provides valuable insights into the technological, ethical, and practical considerations for developing these systems.
Optimizing scene flow with neural rigidity prior
Zhiheng Feng, Jiuming Liu, Hesheng Wang
Article | 28 Nov 2024 | OPEN ACCESS
Scene flow estimation provides low-level 3D motion understanding in dynamic scenes. In this paper, we propose an optimization-based scene flow estimation method with a neural rigidity prior for autonomous driving environments. Specifically, we use the rigidity prior of dynamic scenes to partition the point clouds into pillars of different resolutions. The flow vector of a point is then represented as the average of the local rigid transformations associated with the different pillars to which it belongs. To model local rigidity, we employ a neural implicit representation to encode the intrinsic constraints of the pillars. Our method achieves state-of-the-art accuracy on three commonly used autonomous driving datasets (Argoverse, Waymo, and nuScenes) and even surpasses previous supervised learning-based methods. Experimental results demonstrate the effectiveness of our method, particularly on sparse points in autonomous driving scenes.
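One possible reading of the averaging step can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: each pillar is assumed to carry a rigid transform given as a 3x3 rotation matrix and a translation, the flow induced by a transform is taken as the displacement it applies to the point, and the names apply_rigid and averaged_flow are hypothetical.

```python
def apply_rigid(rotation, translation, point):
    """Apply a rigid transform (3x3 rotation, 3-vector translation) to a point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def averaged_flow(point, transforms):
    """Average the displacements induced by the transforms of all pillars
    that contain the point (assumed interpretation of the abstract).

    transforms: list of (rotation, translation) pairs, one per pillar.
    """
    if not transforms:
        raise ValueError("the point must belong to at least one pillar")
    flows = []
    for rotation, translation in transforms:
        moved = apply_rigid(rotation, translation, point)
        # Flow induced by this pillar: where the point lands minus where it was.
        flows.append(tuple(m - p for m, p in zip(moved, point)))
    count = len(flows)
    return tuple(sum(flow[i] for flow in flows) / count for i in range(3))
```

In the paper the pillar transforms are optimized under a neural implicit representation. Here they are simply passed in as fixed inputs so that the averaging itself is visible.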