Smoothing Neural Network with Adaptive Liquid Time-constant for Reinforcement Learning Tasks

Abstract

The smoothness of control actions is a significant challenge faced by deep reinforcement learning (RL) techniques in solving optimal control problems. Existing RL-trained policies tend to produce non-smooth actions due to high-frequency input noise and unbounded Lipschitz constants in neural networks. Current neural ordinary differential equation (ODE) networks partially address this issue through differential equations with low-pass filtering properties, but they neither directly mitigate control action fluctuations nor address the non-smoothness stemming from large Lipschitz constants. To remedy this, we propose a neural ODE network variant, called adaptive liquid time-constant (ALTC), to smooth out control actions. We design an adaptive ODE that estimates the rate of change in actions near the current state, along with a mechanism that adaptively regulates the bounds on each neuron's hidden state variation based on these estimates. This effectively minimizes the amplitude of hidden state fluctuations across adjacent time steps under noisy observational inputs, thereby addressing the non-smoothness associated with large Lipschitz constants. By using these adaptive ODEs as neurons, we further develop the ALTC network to serve as an RL policy approximator. The network is compatible with most existing RL algorithms, offering improved adaptability over prior approaches. Extensive experiments show that the ALTC network demonstrates superior anti-interference capabilities and smoother action outputs than multi-layer perceptrons and smoothing network architectures such as LipsNet.
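
To make the described mechanism concrete, below is a minimal, hypothetical PyTorch-style sketch of a liquid time-constant cell whose per-step hidden-state change is clamped by an adaptively estimated rate bound. The class name, the rate-estimator network, and the specific clamping rule are illustrative assumptions made here, not the paper's implementation.

import torch
import torch.nn as nn

class AdaptiveLTCCell(nn.Module):
    """Illustrative liquid time-constant cell with an adaptive bound on the
    per-step hidden-state change (a sketch, not the paper's exact ALTC)."""

    def __init__(self, input_size, hidden_size, dt=0.1):
        super().__init__()
        self.dt = dt
        # State- and input-dependent gate that modulates the time constant.
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)
        # Learnable base time constant (log-parameterized) and resting target.
        self.log_tau = nn.Parameter(torch.zeros(hidden_size))
        self.A = nn.Parameter(torch.zeros(hidden_size))
        # Small network estimating how fast the hidden state (and hence the
        # action) should be allowed to change near the current state.
        self.rate_estimator = nn.Sequential(
            nn.Linear(input_size + hidden_size, hidden_size), nn.Softplus()
        )

    def forward(self, u, x):
        xu = torch.cat([x, u], dim=-1)
        f = torch.sigmoid(self.gate(xu))       # liquid gating term
        tau = torch.exp(self.log_tau)           # positive base time constant
        # One explicit Euler step of dx/dt = -(1/tau + f) * x + f * A.
        dx = self.dt * (-(1.0 / tau + f) * x + f * self.A)
        # Adaptive regulation: clamp the hidden-state change to the estimated
        # per-step bound, limiting fluctuations under noisy inputs.
        bound = self.dt * self.rate_estimator(xu)
        dx = torch.clamp(dx, -bound, bound)
        return x + dx

In a policy network, several such cells would be stacked over time and the final hidden state mapped to actions by a linear head, which is what would make a network of this kind usable as a drop-in policy approximator for standard RL algorithms.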

Publication
Annual Conference on Neural Information Processing Systems (NeurIPS), 2024 (under review)