# Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning

### ICML 2023

Overview of Quasimetric RL (QRL): Quasimetric Geometry + (Push apart start state and goal while maintaining local distances) = Optimal Value $V^*$ AND High-Performing Goal-Reaching Agents

## Abstract

In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces *Quasimetric Reinforcement Learning (QRL)*, a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized `MountainCar` environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.

## Optimal Value Functions are Quasimetrics

*Quasimetrics* are a generalization of metrics in that they do not require symmetry. They are well suited for characterizing the optimal cost-to-go (value function) in goal-reaching tasks, which generally have asymmetrical dynamics.
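
As a small illustration of this point, the sketch below computes optimal costs-to-go on a toy one-way cycle and checks that they satisfy the quasimetric properties (zero self-distance, non-negativity, triangle inequality) without being symmetric. The 3-state graph and the helper names `shortest_path_costs` and `is_quasimetric` are assumptions made for this example and are not taken from the paper.

```python
import itertools
import numpy as np

def shortest_path_costs(cost):
    """All-pairs optimal cost-to-go via Floyd-Warshall; np.inf marks 'no edge'."""
    d = cost.astype(float).copy()
    np.fill_diagonal(d, 0.0)
    for k in range(d.shape[0]):
        # Allow paths that pass through intermediate state k.
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

def is_quasimetric(d, tol=1e-9):
    """Check d(x, x) = 0, d >= 0, and the triangle inequality (no symmetry needed)."""
    n = d.shape[0]
    if np.any(np.diag(d) != 0) or np.any(d < 0):
        return False
    return all(d[x, z] <= d[x, y] + d[y, z] + tol
               for x, y, z in itertools.product(range(n), repeat=3))

# Toy asymmetric dynamics (hypothetical): a one-way cycle 0 -> 1 -> 2 -> 0, unit cost per step.
cost = np.full((3, 3), np.inf)
cost[0, 1] = cost[1, 2] = cost[2, 0] = 1.0

d_star = shortest_path_costs(cost)
print(d_star)                         # going 0 -> 1 costs 1, but 1 -> 0 costs 2
print(is_quasimetric(d_star))         # the optimal cost-to-go is a quasimetric
print(np.allclose(d_star, d_star.T))  # but it is not symmetric, so not a metric
```

Under the usual convention of a constant cost per step, the optimal goal-reaching value is the negative of this cost-to-go, which is why a model restricted to quasimetrics is a natural fit.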

## Quasimetric Models for Learning Value Functions

*Quasimetric models* are parametrized models (based on neural networks) $\{d_\theta\}_\theta$ that

## Quasimetric Reinforcement Learning (QRL)
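
The overview describes QRL as pushing apart start state and goal while maintaining local distances. The following is a minimal tabular sketch of that idea, not the paper's exact objective: it rewards large distances between state-goal pairs and penalizes any observed one-step transition whose distance exceeds the step cost. The function name `qrl_style_loss`, the fixed penalty weight `lam`, the unit `step_cost`, and the toy 3-state cycle are all assumptions made for illustration.

```python
import numpy as np

def qrl_style_loss(d, transitions, pairs, lam=10.0, step_cost=1.0):
    """Sketch of a 'push apart, keep local distances' loss for a tabular d.

    d[s, g]     : candidate distance from state s to goal g
    transitions : observed (s, s_next) index pairs, each assumed to cost `step_cost`
    pairs       : (s, g) index pairs whose distance is pushed up
    lam         : fixed penalty weight (hypothetical; it could instead be adapted)
    """
    s, s_next = np.asarray(transitions).T
    src, goal = np.asarray(pairs).T
    push_apart = -d[src, goal].mean()                       # larger distances lower the loss
    violation = np.maximum(d[s, s_next] - step_cost, 0.0)   # one-step distance above the step cost
    local_penalty = (violation ** 2).mean()
    return push_apart + lam * local_penalty, local_penalty

# Toy one-way cycle 0 -> 1 -> 2 -> 0 with its true optimal cost-to-go.
transitions = [(0, 1), (1, 2), (2, 0)]
pairs = [(s, g) for s in range(3) for g in range(3) if s != g]
d_star = np.array([[0.0, 1.0, 2.0],
                   [2.0, 0.0, 1.0],
                   [1.0, 2.0, 0.0]])

loss_star, penalty_star = qrl_style_loss(d_star, transitions, pairs)
loss_big, penalty_big = qrl_style_loss(2.0 * d_star, transitions, pairs)
loss_small, penalty_small = qrl_style_loss(0.5 * d_star, transitions, pairs)
print(loss_star, loss_big, loss_small)
```

Among these three candidates, the true cost-to-go gets the lowest loss: doubling it breaks the local constraint, and halving it leaves distance on the table. In this sketch the candidates are all quasimetrics already; in a learned model, it is the triangle inequality built into the quasimetric that lets the one-step constraints bound the distances between far-apart states.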

## QRL Accurately Recovers $V^*$

### Offline Learning on Discretized `MountainCar`

The `MountainCar` state is a $2$-dimensional vector containing the car's (horizontal) position and velocity.
Each plot shows the estimated values from every state towards a single goal (indicated in leftmost column) as a 2-dimensional image (velocity as $x$-axis, position as $y$-axis).

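
To make the plotting convention concrete, here is a rough sketch of how a continuous (position, velocity) state could be binned and how a per-state value vector could be laid out as an image with position along the rows ($y$-axis) and velocity along the columns ($x$-axis). The bounds are those of the standard Gym `MountainCar` task; the bin count `N_BINS`, the state indexing, and the helper names are assumptions and may differ from the discretization used in the paper.

```python
import numpy as np

POSITION_RANGE = (-1.2, 0.6)    # standard Gym MountainCar position bounds
VELOCITY_RANGE = (-0.07, 0.07)  # standard Gym MountainCar velocity bounds
N_BINS = 8                      # hypothetical resolution per dimension

def to_bin(x, lo, hi, n_bins=N_BINS):
    """Map a continuous value in [lo, hi] to an integer bin in [0, n_bins - 1]."""
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def discretize(position, velocity):
    """Return the (position_bin, velocity_bin) of a continuous state."""
    return to_bin(position, *POSITION_RANGE), to_bin(velocity, *VELOCITY_RANGE)

def state_id(pos_bin, vel_bin):
    """Flat index of a discretized state (assumed row-major: position, then velocity)."""
    return pos_bin * N_BINS + vel_bin

def values_to_image(values):
    """Reshape per-state values into an image: rows = position, columns = velocity."""
    return np.asarray(values).reshape(N_BINS, N_BINS)

# Placeholder values, one per discretized state, standing in for estimates towards one goal.
values = np.arange(N_BINS * N_BINS, dtype=float)
image = values_to_image(values)

pos_bin, vel_bin = discretize(0.0, 0.03)
print(pos_bin, vel_bin, image[pos_bin, vel_bin])
```
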
## QRL Quickly Finds High-Quality Policies in Both Offline and Online RL

### Offline Goal-Reaching `maze2d`

(Normalized Scores/Rewards)

### Online Goal-Conditional RL Benchmarks with State-based and Image-based Observations

## Paper

ICML 2023. arXiv 2304.01203.

## Citation

Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang. "Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning"

`bibtex` entry:

    @inproceedings{tongzhouw2023qrl,
      title={Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning},
      author={Wang, Tongzhou and Torralba, Antonio and Isola, Phillip and Zhang, Amy},
      booktitle={International Conference on Machine Learning},
      organization={PMLR},
      year={2023}
    }