Research using this robot

This study proposes a new way of handling the terminal value at the end of an episode in reinforcement learning. The most common approach assumes the terminal value to be zero, but this causes overestimation when the reward function is designed to be negative. Such overestimation tends to collapse learning, since the agent keeps repeating the same failure. The proposed exception handling therefore always underestimates the terminal value intentionally. As a result, a real-robot demonstration, a synchronized swinging task, was accomplished only with the proposed exception handling.
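The difference can be sketched with a one-step TD target. This is a minimal illustration, not the paper's implementation: the exact underestimation rule is not stated here, so `pessimistic_terminal_value` is a hypothetical placeholder (e.g., a fixed lower bound on the return).

```python
def td_target_standard(reward, gamma, next_value, done):
    # Conventional handling: the terminal value is assumed to be zero.
    # With a strictly negative reward function, zero is higher than the
    # true (negative) return, so termination is overestimated.
    return reward + gamma * (0.0 if done else next_value)

def td_target_underestimated(reward, gamma, next_value, done,
                             pessimistic_terminal_value=-10.0):
    # Proposed-style handling (sketch): intentionally underestimate the
    # terminal value instead of using zero, so failures are not over-valued.
    # The value -10.0 is an arbitrary placeholder for illustration.
    return reward + gamma * (pessimistic_terminal_value if done else next_value)

# At a terminal step with negative rewards, the standard target is more
# optimistic than the underestimated one.
t_std = td_target_standard(-1.0, 0.99, next_value=-5.0, done=True)
t_pes = td_target_underestimated(-1.0, 0.99, next_value=-5.0, done=True)
```

At non-terminal steps both targets coincide; only the bootstrap value at termination differs.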

arXiv link