Tianyi Huang, Guangfeng Chen, Multi-step actor-critic framework for reinforcement learning in continuous control
DOI: 10.23952/jano.5.2023.2.01
Volume 5, Issue 2, 1 August 2023, Pages 189-200


Abstract. Continuous control is an important issue in control theory. It requires controlling an agent to take actions in continuous spaces, transitioning from one state to another until the desired goal is achieved. A useful tool for this problem is reinforcement learning, in which an optimal policy is learned for the agent by maximizing the cumulative reward of the state transitions. However, most existing reinforcement learning methods consider only the one-step transition and one-step reward in each state. In this case, it is hard to recognize the information hidden in the sequence of previous states and to accurately estimate the cumulative reward. Consequently, these methods cannot learn the optimal policy both quickly and effectively for continuous control. To solve this problem, we propose a new framework for reinforcement learning, called the Multi-step Actor-critic Framework (MAF). In MAF, a convolutional deterministic policy learns the information hidden in the sequence of previous states via convolutional neural networks, and n-step temporal difference learning accurately estimates the cumulative reward by considering the rewards from the next n steps. Based on an effective reinforcement learning method, TD3, our MAF is implemented as nTD3. Theoretical analysis and experiments illustrate that nTD3 learns policies both better and faster than existing RL methods for continuous control.
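
The core quantity in n-step temporal difference learning is the n-step return, which sums the discounted rewards over the next n steps and then bootstraps with a critic estimate of the state reached after those steps. The sketch below illustrates this target computation under stated assumptions; the function name, the NumPy-free interface, and the way the bootstrap value is supplied are illustrative choices, not the authors' nTD3 implementation.

```python
# Minimal sketch of an n-step TD target, as used in multi-step actor-critic
# updates. This is an illustrative assumption of how the target could be
# computed, not the paper's actual nTD3 code.

def n_step_td_target(rewards, next_state_value, gamma=0.99, done=False):
    """Compute G_t = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1}
                     + gamma^n * V(s_{t+n}).

    rewards:          the n rewards observed after acting in state s_t.
    next_state_value: critic estimate of the state reached after n steps
                      (e.g. the minimum of twin critics in a TD3-style update).
    done:             True if the episode terminated within the n steps,
                      in which case the bootstrap term is dropped.
    """
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r                      # discounted sum of n rewards
    if not done:
        g += (gamma ** len(rewards)) * next_state_value  # bootstrap term
    return g

# Example: three observed rewards and a bootstrapped critic value of 5.0
print(n_step_td_target([1.0, 0.5, 0.2], next_state_value=5.0, gamma=0.99))
```

With n = 1 this reduces to the ordinary one-step TD target used by standard TD3; larger n lets the critic see more of the actual reward sequence before bootstrapping, which is the mechanism the abstract credits for more accurate cumulative-reward estimates.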


How to Cite this Article:
T. Huang, G. Chen, Multi-step actor-critic framework for reinforcement learning in continuous control, J. Appl. Numer. Optim. 5 (2023), 189-200.