Controlling dexterous hands in high-dimensional action spaces remains challenging, yet humans achieve such control naturally through internal models that predict and adapt body dynamics. Inspired by this concept, we present MoDex, a neural internal model framework that learns the intrinsic dynamics of dexterous hands via coupled forward and inverse networks. Integrated with a Cross-Entropy Method (CEM) optimizer, MoDex enables efficient bidirectional planning, achieving higher data efficiency and faster decision-making than both model-free and model-based baselines. Furthermore, the pretrained internal model serves as a transferable module: combined with an external dynamics model, it improves data efficiency in in-hand object manipulation, and coupled with a large language model (LLM), it enables few-shot gesture generation in both simulation and the real world. Extensive experiments across multiple robotic hands demonstrate MoDex’s versatility and effectiveness in high-dimensional control.
Keywords: model-based learning; dexterous manipulation; high-dimensional control