Reward Drops in Learning-based Control with an Experimental Validation on Microdrones


Tükenmez N., Fotiadis F., Junior J. M. M., Vamvoudakis K. G., Estrada O. S.

63rd IEEE Conference on Decision and Control (CDC 2024), Milan, Italy, December 16-19, 2024, pp. 3819-3824 (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI: 10.1109/cdc56724.2024.10885986
  • City: Milan
  • Country: Italy
  • Pages: pp. 3819-3824
  • Affiliated with Isparta Uygulamalı Bilimler Üniversitesi: No

Abstract

In this paper, we consider a computationally efficient learning-based control mechanism that handles dense reward processing for a zero-sum game. The problem is formulated as the online learning of the Nash equilibrium without requiring any information about the system dynamics. It is first posed as an infinite-horizon optimal control problem and then cast as an online model-free Q-learning framework composed of critic and actor networks (i.e., for the control and disturbance inputs, respectively). The closed-loop system is also proved to have a stable equilibrium point even in the presence of reward drops. The efficacy of the learning-based controller is validated through simulations and experiments on microdrones.
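To make the abstract's ingredients concrete, the following is a minimal sketch of model-free Q-learning for a linear-quadratic zero-sum game: a quadratic critic Q(x,u,w) = z'Hz with z = [x; u; w] is fit from data by least squares on the Bellman equation, and the control and disturbance policies (the actor roles) are read off from the critic's blocks. This is a generic value-iteration scheme in the spirit of the abstract, not the paper's algorithm; the system matrices, weights, and attenuation level gamma are illustrative assumptions, and the model is used only to generate transition data, standing in for the physical plant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear plant x+ = A x + B u + D w (unknown to the learner;
# used here only to simulate transitions).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.0]])
n, m, p = 2, 1, 1
Qx, Ru, gamma = np.eye(n), np.eye(m), 5.0   # assumed game weights

nz = n + m + p                  # z = [x; u; w]
idx = np.triu_indices(nz)       # unique entries of symmetric H

def feat(z):
    """Quadratic features: diag gives z_i^2, off-diag gives 2 z_i z_j."""
    M = np.outer(z, z)
    M = M + M.T - np.diag(np.diag(M))
    return M[idx]

def unpack(theta):
    """Rebuild symmetric H from the fitted feature weights."""
    H = np.zeros((nz, nz))
    H[idx] = theta
    return H + H.T - np.diag(np.diag(H))

def gains(H):
    """Saddle-point policies u = -K x, w = -L x from the critic blocks."""
    KL = np.linalg.solve(H[n:, n:], H[n:, :n])
    return KL[:m], KL[m:]

# One batch of exploratory transitions (persistently exciting inputs).
N = 200
X = rng.standard_normal((N, n))
U = rng.standard_normal((N, m))
W = rng.standard_normal((N, p))
Xn = X @ A.T + U @ B.T + W @ D.T
R = (np.einsum('ij,ij->i', X @ Qx, X)
     + np.einsum('ij,ij->i', U @ Ru, U)
     - gamma**2 * np.einsum('ij,ij->i', W, W))
Phi = np.array([feat(np.concatenate([x, u, w]))
                for x, u, w in zip(X, U, W)])

# Q-function value iteration: H solves the least-squares Bellman
# equation with target r + V(x+), entirely from recorded data.
P = np.zeros((n, n))
for _ in range(60):
    y = R + np.einsum('ij,ij->i', Xn @ P, Xn)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    H = unpack(theta)
    K, L = gains(H)
    F = np.vstack([np.eye(n), -K, -L])
    P = F.T @ H @ F             # value matrix: V(x) = x' P x

rho = max(abs(np.linalg.eigvals(A - B @ K - D @ L)))
print('closed-loop spectral radius under learned Nash policies:', rho)
```

The critic is linear in its parameters, so each Bellman fit is a single least-squares solve, which is what makes the scheme computationally light; the same structure is what allows the paper's setting to tolerate dropped rewards, since missing samples simply shrink the regression batch.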