Enhanced DACER Algorithm with Multimodal Q-value Distribution for Risk-Sensitive Stochastic Vehicle Environments

Abstract

Reinforcement learning demonstrates strong capabilities in handling complex control tasks, especially in autonomous driving, where vehicles must cope with uncertain environments. Existing reinforcement learning methods typically model the value distribution as unimodal, and a significant amount of information about the full distribution is lost in the process. To address this problem, we propose DACER++, an online multimodal distributional RL algorithm. By characterizing the value distribution as multimodal, DACER++ models the return distribution more accurately and improves algorithm performance. We construct a quantile value network and use quantile regression to approximate the full quantile function of the state-action return distribution. This method allows precise modeling of multimodal distributions and the formulation of risk-sensitive policies adaptable to different environments. We then integrate the quantile value network into the actor-critic algorithm DACER. Experiments on multi-goal tasks and MuJoCo benchmarks show that DACER++ not only has multimodal policy representation capability but also achieves state-of-the-art performance. In stochastic vehicle-meeting environments, DACER++ learns different multimodal value distributions and multimodal trajectories according to various risk preferences, including conservative and aggressive driving styles.
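
To make the critic's training objective concrete, the sketch below shows a standard quantile-regression (Huber) loss of the kind commonly used to fit a quantile value network in distributional RL. This is a minimal illustrative sketch: the function name, tensor shapes, and PyTorch implementation are assumptions for exposition, not the exact DACER++ loss.

    import torch

    def quantile_huber_loss(pred_quantiles, target, taus, kappa=1.0):
        # pred_quantiles: (batch, N) predicted quantile values of the return distribution
        # target:         (batch, M) target return samples (or target quantiles)
        # taus:           (N,) quantile fractions in (0, 1)
        # Pairwise TD errors between every target and every predicted quantile
        td = target.unsqueeze(1) - pred_quantiles.unsqueeze(2)        # (batch, N, M)
        # Huber loss on the TD errors for robustness to outliers
        huber = torch.where(td.abs() <= kappa,
                            0.5 * td.pow(2),
                            kappa * (td.abs() - 0.5 * kappa))
        # Asymmetric weighting |tau - 1{td < 0}| yields quantile regression
        weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
        return (weight * huber / kappa).mean()

Minimizing such a loss drives each output head toward a different quantile of the state-action return distribution, which is what allows the critic to represent multimodal returns and to derive risk-sensitive (e.g., conservative or aggressive) policies by weighting quantiles differently.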

Publication
IEEE Intelligent Vehicles Symposium (IV), 2025