A Learning Framework for High Precision Industrial Assembly


Yongxiang Fan, Jieliang Luo, Masayoshi Tomizuka

Welcome!

This page supplements our ICRA paper submission, in which we present a learning framework for high-precision industrial assembly. The framework combines a semi-supervisor based on trajectory optimization with reinforcement learning based on an actor-critic algorithm.

Automatic assembly has broad applications in industry. Traditional assembly methods rely on predefined trajectories or manually tuned force-control parameters, which makes automatic assembly time-consuming to set up, difficult to generalize, and not robust to uncertainties. In this paper, we propose a learning framework for high-precision industrial assembly. The framework combines supervised learning with reinforcement learning: the supervised learning uses trajectory optimization to provide initial guidance to the policy, while the reinforcement learning uses an actor-critic algorithm to establish an evaluation system for the cases where the supervisor is not accurate. The proposed framework is more efficient than pure reinforcement learning and achieves better stability than pure supervised learning. The effectiveness of the method is verified by both simulation and experiments.
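To make the idea concrete, below is a minimal sketch (our illustration, not the authors' released code) of how a guided actor update can blend the standard DDPG policy-gradient loss with a supervision loss toward actions produced by a trajectory-optimization supervisor. The dimensions, network sizes, and supervisor data are hypothetical placeholders, and PyTorch is used only for convenience.

    # A minimal sketch of a guided actor update (our illustration, not the
    # authors' code). A supervision loss toward the trajectory-optimization
    # action is blended with the standard DDPG policy-gradient loss.
    import torch
    import torch.nn as nn

    OBS_DIM, ACT_DIM = 12, 6   # hypothetical state/action sizes for peg-in-hole

    actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                          nn.Linear(64, ACT_DIM), nn.Tanh())
    critic = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                           nn.Linear(64, 1))
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

    def guided_actor_update(obs, supervisor_action, weight):
        # DDPG term: push the actor toward actions the critic rates highly.
        ddpg_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
        # Supervision term: imitate the trajectory-optimization supervisor.
        sup_loss = ((actor(obs) - supervisor_action) ** 2).mean()
        actor_opt.zero_grad()
        (ddpg_loss + weight * sup_loss).backward()
        actor_opt.step()   # only the actor is updated here; the critic has its own update

    # Example call with random placeholder data.
    obs = torch.randn(32, OBS_DIM)
    sup_act = torch.randn(32, ACT_DIM).clamp(-1.0, 1.0)
    guided_actor_update(obs, sup_act, weight=1.0)

One natural design choice, consistent with the role of the critic described above, is to decay the supervision weight over training so that the policy gradually trusts the learned evaluation system more than the possibly inaccurate semi-supervisor.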

Paper Download

The full paper can be downloaded through this link.

Simulation Results

Two simulation tasks are used to verify the algorithm: U-shape joint assembly and Lego brick insertion. We first use the U-shape joint assembly task to compare the performance of guided-DDPG with pure DDPG.

Comparison of Pure DDPG and Guided-DDPG (U-Shape Joint Assembly Task)

Video: Comparison of pure DDPG (left) and guided-DDPG (right)


Adaptability Test

Video: The adaptability of the learned policy. The policy successfully adapts to various uncertainties except for the incomplete hole.

Experimental Results

We use the Lego brick insertion task to evaluate the proposed learning framework on hardware. Due to calibration error, the hole position of the Lego piece is not precisely known, so a successful assembly policy must adapt to the environment.

Video: Test results for pure DDPG (left) and guided-DDPG (right). The pure DDPG is trained with the exploration space constrained to within 1 mm of the hole boundary, while the guided-DDPG is trained with the exploration space constrained to within 3 mm of the hole boundary; a sketch of this kind of constraint is given below. The training times are 2 hours and 1.5 hours, respectively.
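For illustration only (the exact mechanism used in the paper may differ), constraining the exploration space can be as simple as clipping commanded positions to a box around the nominal hole location. All geometry values below are hypothetical placeholders.

    # A minimal sketch (our illustration) of constraining exploration:
    # exploratory position commands are clipped to stay within a small
    # margin (e.g. 1 mm or 3 mm) around the nominal hole boundary.
    import numpy as np

    def clip_to_exploration_box(target_xy, hole_center_xy, hole_half_size, margin):
        """Clip a commanded XY target to the hole footprint expanded by `margin` (m)."""
        lo = hole_center_xy - (hole_half_size + margin)
        hi = hole_center_xy + (hole_half_size + margin)
        return np.clip(target_xy, lo, hi)

    # Hypothetical numbers: a 2x4 Lego footprint at a nominal hole position (meters).
    hole_center = np.array([0.40, 0.00])
    half_size = np.array([0.016, 0.008])
    noisy_target = np.array([0.45, 0.02])   # exploratory command
    print(clip_to_exploration_box(noisy_target, hole_center, half_size, margin=0.003))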

Overall Summary Video


Contacts

Please contact yongxiang_fanATberkeleyDOTedu with any questions.