SII 2025 Workshop @ Munich, Germany
Learning Robotic Systems: Next Theory, Algorithm, and Integration
- Date: 21st January 2025
- Room: "Forum 1-2-3"
Thank you for joining us!
Abstract
With advancements in computing technologies, sensors, machine learning, and AI, "learning" has become indispensable in robotic systems. Traditionally, robotic systems have relied on modularization, where individual components are developed independently and flexibly reused. Since the advent of deep learning, however, a new approach has emerged that achieves functionality through end-to-end learning. Deep learning can be used to learn almost any component for various purposes, such as modeling dynamics and environments, building perception and recognition capabilities, or, when combined with reinforcement learning or imitation learning, learning controllers. Furthermore, foundation models such as Large Language Models (LLMs) and generative AI, regarded as general-purpose AI, have recently gained prominence and may reduce the cost of applying learning to robotics. While these approaches have rapidly permeated fields such as image, language, and speech processing, they have not yet achieved the same level of adoption in real-world robotics, primarily due to the lack of sufficient datasets and the complexity of real-world environments. Advancing mathematical theories, approaches, and algorithms is essential to overcoming these challenges. It is also crucial to integrate domain-specific knowledge in robotics and to fuse these approaches with traditional system integration techniques. This workshop, "Learning Robotic Systems: Next Theory, Algorithm, and Integration," will discuss the integration of learning systems within robotic systems and explore new possibilities in theory, algorithms, and system integration.
Research Questions to be Addressed
- How can end-to-end learning systems be effectively integrated into existing modular robotic architectures without sacrificing flexibility and reusability?
- What advancements in mathematical theory are needed to better model the complexities of real-world environments within robotic systems?
- How can the scarcity of high-quality, comprehensive datasets for robotics be mitigated, and what role can synthetic data generation play?
Topics of Interest
- Robotic Systems
- End-to-End Learning
- Modular Architecture
- Deep Learning
- Reinforcement Learning
- Imitation Learning
- Physical Simulator
- Dynamics and Environment Modeling
- Large Language Models (LLMs)
- Generative AI
- System Integration
- Mathematical Theory
- Synthetic Data
- Scalability
Invited Speakers
- Tetsuya Ogata, Waseda University
- Kazuhiro Nakadai, Institute of Science Tokyo
- Karinne Ramírez-Amaro, Chalmers University of Technology
- Eiichi Yoshida, Tokyo University of Science
- Kiyoshi Irie, Future Robotics Technology Center, Chiba Institute of Technology
Program
Time (CET) | Event | Speaker |
---|---|---|
10:25-10:30 | Opening Address | |
10:30-11:00 | **Predictive Inference for Efficient AI on Robots.** Generative AI based on foundation models has become a powerful tool enabling robots to perform diverse and complex tasks. However, these models are typically large-scale, posing challenges for deployment on standalone devices such as robots. This presentation will discuss the importance of predictive inference (active inference) during task execution, particularly for generative AI on edge devices for robots. We will explain how this approach can reduce the required training data and memory usage. Finally, we will provide an overview of our efforts in the Moonshot Project led by the Japanese Cabinet Office. (An illustrative sketch of predictive inference appears after the program.) | Tetsuya Ogata (Waseda U) |
11:00-11:30 | **Learning the How and Why from Experience: Combining Interpretable and Explainable Methods in Robot Decision-Making.** Autonomous robots should efficiently and reliably learn new skills while reusing experiences. What is limiting the advancement of robotic autonomy? Autonomy has rapidly increased with the development of Interpretable and Explainable methods. Interpretable methods focus on understanding how the learned model reaches decisions by examining its structure and relationships. Explainable methods reveal why a model made specific decisions without requiring an understanding of the model itself. Combining these methods in robotic systems enhances the transparency of decision-making processes. While challenging, Interpretable and Explainable capabilities are crucial for deploying robots in real and dynamic environments. In this talk, I will first introduce our interpretable AI methods that generate compact and general semantic models to infer human activities, enabling robots to gain a high-level understanding of human movements. Next, I will present our causal-based approach, which rapidly empowers robots to predict and prevent immediate and future failures. This method helps robots understand why failures occurred, allowing them to learn from their mistakes and improve their future performance. Finally, I will discuss strategies for combining these methods into a single framework by integrating symbolic planning with hierarchical Reinforcement Learning. This integration allows us to learn flexible and reusable robot policies for manipulation tasks, creating holistic sequences of actions that can be executed independently. Interpretable and Explainable AI are key to developing general-purpose robots. These approaches enable robots to make complex decisions in dynamic and unpredictable environments by learning “how” and “why”, ultimately improving robotic autonomy. | Karinne Ramírez-Amaro (Chalmers) |
11:30-12:00 | **Learning Contact Motions by Anthropomorphic Systems.** Motions involving contact remain challenging even for the most advanced robots, such as humanoids, despite their remarkable recent progress, whereas we humans perform them naturally in our daily lives. This talk presents ongoing projects unifying well-established model-based methodologies with versatile data-driven approaches based on machine learning from human contact motions. First, we present a method for detecting and estimating contact forces from human motions alone using machine learning techniques. We introduce a network that leverages a vector-quantized variational autoencoder (VQ-VAE) and self-attention to learn a small set of discrete feature values representing various contact states, which are converted into contact states through optimization with reduced manual annotation. Second, we address the limited availability of human motion data involving contacts through a framework that measures human motions with surface contacts by simultaneously collecting data from distributed tactile sensors and motion capture systems. The contact information measured by the tactile sensors is mapped onto the human body through position-orientation and force registration, and unified with synchronized body motion data. Finally, though not exactly categorized as learning, we introduce a model-based whole-body control method for a humanoid robot involving surface contacts, for future integration with learning-based motion synthesis. (A toy sketch of VQ-VAE quantization appears after the program.) | Eiichi Yoshida (TUS) |
12:00-12:30 | **Fusion of Model-based and Learning-based Robot Control.** Recent AI technologies have made a significant contribution to robot control. For example, legged robots have achieved outstanding traversal performance through model-free reinforcement learning. However, such learning-based control still has issues such as expensive learning cost and low reliability. This presentation will introduce case studies that refine learning techniques to take advantage of the strengths of conventional model-based control. Finally, I will introduce a new JST CRONOS project on robotic foundation models. (A toy sketch of residual-style fusion appears after the program.) | Taisuke Kobayashi (NII) |
12:30-14:00 | Lunch Time | |
14:00-14:30 | **Deep Learning-Based Robot Audition.** Robot audition is a research field that began in 2000 with the concept of creating "ears for robots," focusing on auditory processing in real-world environments. Based on acoustic signal processing using microphone arrays, various studies have been conducted on key functions such as sound source localization, sound source separation, and speech recognition. In 2013, the open-source robot audition software HARK was released, enabling robots to distinguish between up to 11 simultaneous speakers. Furthermore, robot audition has expanded into societal applications, including drone audition and outdoor bird song analysis. This talk introduces recent advancements in deep learning-based robot audition technologies, including sound source localization, sound source separation, and automatic speech recognition, aimed at further improving these capabilities. (A toy sketch of classical sound source localization appears after the program.) | Kazuhiro Nakadai (Science Tokyo) |
14:30-15:00 | **System Integration and Real-World Navigation Experiments of Autonomous Quadruped Robots.** This presentation focuses on the system integration and field testing of an autonomous quadruped robot, emphasizing deep reinforcement learning (DRL)-based control and SLAM-enabled navigation. We employed a Sim-to-Real transfer approach, addressing discrepancies between simulation and hardware through techniques such as Asymmetric Actor-Critic and domain randomization. The DRL-based system includes a robust walking controller capable of adaptive, energy-efficient locomotion and a path-following controller for trajectory tracking. These components are seamlessly integrated with a SLAM-based navigation system to enable autonomous operation in diverse outdoor environments. The robot’s performance was validated during the Tsukuba Challenge, where it successfully completed a 2.5 km autonomous traversal of an outdoor course. The environment featured stairs, dynamic obstacles such as numerous spectators and pedestrians, and low-feature areas like open parks. This presentation highlights the challenges we faced and the lessons learned in applying DRL to real-world environments, such as bridging the simulation-reality gap, ensuring robustness in unpredictable conditions, and overcoming hardware limitations. (A toy sketch of domain randomization appears after the program.) | Kiyoshi Irie (fuRo) |
15:00-15:30 | **Is Simulation Really Cheap? Exploring Scalable Domain Randomized Reinforcement Learning for Complex Robotic Tasks.** Domain Randomized Reinforcement Learning (DRRL) has become a leading approach for deriving control policies in real-world robotics, driven by advancements in GPU-supported simulation environments. However, DRRL faces challenges in complex tasks involving numerous physical interactions due to high computational costs, making large-scale simulations impractical. This talk introduces our recent efforts to develop scalable DRRL frameworks that reduce simulation costs while maintaining policy performance, addressing the challenges of complex robotic tasks. (See the domain randomization sketch after the program.) | Takamitsu Matsubara (NAIST) |
15:30-16:00 | Break | |
16:00-16:30 | **Learning-integrated Robotics Systems and their Benchmarking at International Robotics Competitions.** International robotics competitions bring together the research community to solve current, real-world problems such as drilling in aircraft manufacturing (Airbus Shopfloor Challenge), warehouse automation (Amazon Robotics Challenge), and convenience store automation (Future Convenience Store Challenge). In this talk, I will discuss our approach to these competitions and describe some of the technical difficulties, design philosophy, development, lessons learned, and remaining challenges. In particular, I will talk about the evolution of our 6D-pose estimation subsystem, the challenges of proposed methods that did not reach deployment due to accuracy or real-time limitations, and our latest projects involving LLMs, VLMs, and World Models. | Gustavo Alfonso Garcia Ricardez (Ritsumeikan U) |
16:30-17:00 | Panel Discussion | |
17:00 | Closing Remarks | |
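Illustrative Code Sketches
The sketches below are illustrative toys added for readers less familiar with the techniques named in the program; none of them is taken from the speakers' actual systems. First, a minimal predictive-inference loop in the spirit of the talk by Tetsuya Ogata: a small forward model predicts the next observation, and the prediction error corrects the internal state online rather than re-running a large model from scratch. The dynamics matrix, observation model, and correction rate are all assumed for illustration.

```python
# Minimal predictive-inference sketch (assumed toy model, not the talk's system).
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.95]])  # assumed latent dynamics
C = np.eye(2)                            # assumed observation model
LR = 0.2                                 # error-correction rate

def predict_and_correct(x_est, obs):
    x_pred = A @ x_est                 # predict the next latent state
    err = obs - C @ x_pred             # prediction error vs. the observation
    return x_pred + LR * (C.T @ err)   # correct the estimate online

x_true = np.array([1.0, -0.5])
x_est = np.zeros(2)
for _ in range(50):
    x_true = A @ x_true                             # world evolves
    x_est = predict_and_correct(x_est, C @ x_true)  # estimator tracks it
print("final estimation error:", np.linalg.norm(x_true - x_est))
```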
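Next, the vector-quantization step at the core of a VQ-VAE, referenced in the talk by Eiichi Yoshida: continuous encoder features are snapped to their nearest codebook entries, yielding a small set of discrete codes such as contact states. The codebook size and feature dimension are assumptions, not the talk's architecture.

```python
# Nearest-codebook quantization, the central operation of a VQ-VAE
# (toy sizes; the talk's actual network is not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))  # 16 discrete codes, 8-dim features

def quantize(z):
    """Map encoder outputs z of shape (N, 8) to code indices and quantized vectors."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, 16)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

z = rng.normal(size=(5, 8))  # stand-in for encoder features of 5 frames
idx, z_q = quantize(z)
print("discrete contact-state codes:", idx)
```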
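For the fusion of model-based and learning-based control discussed by Taisuke Kobayashi, one widely used pattern is residual policy learning: a learned network adds a bounded correction on top of a nominal model-based controller, so the model guarantees reasonable behavior while learning compensates for model errors. The gains, bound, and zero-residual stand-in network are illustrative assumptions, not the talk's method.

```python
# Residual fusion of a PD controller (model-based) and a learned policy
# (learning-based); all constants are illustrative.
import numpy as np

KP, KD = 40.0, 2.0     # nominal PD gains
RESIDUAL_LIMIT = 5.0   # cap on how much learning may override the model

def pd_controller(q, dq, q_target):
    """Model-based baseline: joint-space PD control."""
    return KP * (q_target - q) - KD * dq

def residual_policy(obs):
    """Stand-in for a trained network; returns a zero residual here."""
    return np.zeros_like(obs["q"])

def fused_action(obs, q_target):
    u_model = pd_controller(obs["q"], obs["dq"], q_target)
    u_learn = np.clip(residual_policy(obs), -RESIDUAL_LIMIT, RESIDUAL_LIMIT)
    return u_model + u_learn

obs = {"q": np.zeros(2), "dq": np.zeros(2)}
print(fused_action(obs, q_target=np.array([0.5, -0.3])))
```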
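For the robot audition talk by Kazuhiro Nakadai, a sketch of GCC-PHAT, the classical time-difference-of-arrival estimator on which microphone-array sound source localization (and its deep learning successors) builds. This is textbook signal processing, not HARK's implementation.

```python
# GCC-PHAT time-delay estimation between two microphone signals
# (textbook method; parameters are illustrative).
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref`, in seconds."""
    n = sig.shape[0] + ref.shape[0]
    r = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    # Phase transform: keep only the phase of the cross-spectrum.
    cc = np.fft.irfft(r / (np.abs(r) + 1e-12), n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Toy example: the second channel is a 25-sample-delayed copy of the first.
fs = 16000
x = np.random.default_rng(0).normal(size=fs)
y = np.concatenate((np.zeros(25), x[:-25]))
print(f"estimated delay: {gcc_phat(y, x, fs) * fs:.1f} samples")  # ~25.0
```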
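Finally, domain randomization, which appears both in Kiyoshi Irie's Sim-to-Real pipeline and in Takamitsu Matsubara's DRRL talk: simulator physics parameters are resampled every episode so that the trained policy covers a range that contains the real robot. The parameter names and ranges below are invented for illustration.

```python
# Per-episode domain randomization of simulator physics (illustrative
# parameters; not tied to any specific simulator or talk).
import random

RANDOMIZATION_RANGES = {
    "ground_friction": (0.4, 1.2),     # friction coefficient
    "payload_mass_kg": (0.0, 3.0),     # extra mass on the robot
    "motor_torque_scale": (0.8, 1.2),  # actuator strength multiplier
    "control_latency_s": (0.0, 0.04),  # observation-to-action delay
}

def sample_domain():
    """Draw one simulator configuration uniformly from the ranges."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

# In training, the environment would be rebuilt with fresh physics each
# episode, e.g. env = make_env(**sample_domain()).
for episode in range(3):
    print(episode, sample_domain())
```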
Organizers
- Takamitsu Matsubara, Nara Institute of Science and Technology
- Taisuke Kobayashi, National Institute of Informatics
- Gustavo Alfonso Garcia Ricardez, Ritsumeikan University
Contact: kobayashi[at]nii.ac.jp