Volume 3, Issue 3, 31 December 2021, Pages 513-519
Abstract. Deep Reinforcement Learning (DRL) has achieved great success in decision-making on complex tasks. Unfortunately, existing DRL algorithms are usually sample inefficient: they require a huge number of interactions with the environment to reach a desirable performance. Recently, Episodic Memory Deep Q-Networks (EMDQN) substantially improved sample efficiency through episodic memory. However, the rewards stored in episodic memory are delayed, since they are obtained only after the agent interacts with the environment in a multi-step, trial-and-error manner, which leaves EMDQN sample inefficient to some extent. In this paper, we propose a new algorithm, Episodic Memory Hit Ratio DQN (EMHR-DQN), to improve sample efficiency by reward shaping. Inspired by reward shaping methods, we design a new reward shaping function, the Episodic Memory Hit Ratio (EMHR), which provides additional rewards based on the retrieval results of episodic memory. In this way, our method modifies the rewards in episodic memory and provides useful supervision for training the agent. Experimental results verify the superiority of our method.
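The abstract's core idea, shaping the reward with a bonus derived from episodic-memory retrieval, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual EMHR-DQN implementation: the class names, the dictionary-based memory, and the fixed hit bonus are all assumptions made for illustration.

```python
# Illustrative sketch (NOT the paper's implementation): reward shaping
# with an episodic-memory "hit" bonus. The memory structure and the
# constant bonus below are simplifying assumptions.

class EpisodicMemory:
    """Maps discretised states to the best return seen so far."""
    def __init__(self):
        self.table = {}

    def lookup(self, state):
        # Returns (hit, stored_return); hit is True when this state
        # has been stored before (a memory "hit").
        if state in self.table:
            return True, self.table[state]
        return False, 0.0

    def update(self, state, ret):
        # Keep the highest return observed for this state.
        self.table[state] = max(ret, self.table.get(state, float("-inf")))


def shaped_reward(env_reward, memory, state, bonus=0.1):
    """Environment reward plus a small additional reward on a memory hit."""
    hit, _ = memory.lookup(state)
    return env_reward + (bonus if hit else 0.0)


memory = EpisodicMemory()
memory.update(state=(1, 2), ret=5.0)

r_hit = shaped_reward(1.0, memory, state=(1, 2))   # previously seen state
r_miss = shaped_reward(1.0, memory, state=(9, 9))  # unseen state
```

Under this toy scheme, a state found in episodic memory earns the environment reward plus the bonus, while an unseen state receives the environment reward alone; the shaped signal thus reaches the agent immediately rather than only after a multi-step trial.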
How to Cite this Article:
Ruiyuan Zhang, Xianchao Zhu, William Zhu, Improved sample efficiency by episodic memory hit ratio deep Q-networks, J. Appl. Numer. Optim. 3 (2021), 513-519.