Hellinger-Kantorovich distance

Author

Djordje Nikolic

A major disadvantage of the Wasserstein distance is that it is meaningful only for measures of equal total mass. For applications in signal processing, image classification, machine learning, medical imaging, etc., it is essential to have an optimal transport distance on the space of positive measures. In 2015, three different research groups introduced a new optimal transport distance on the space of positive Radon measures with various applications in mind: imaging applications, primal/dual and static formulations, and population dynamics and gradient flows (Chizat et al. 2015), (Chizat et al. 2018), (Kondratyev, Monsaingeon, and Vorotnikov 2016), (Liero, Mielke, and Savaré 2016), (Liero, Mielke, and Savaré 2018). Some authors call it the Hellinger-Kantorovich (\(HK\)) distance, since it interpolates between the Kantorovich/Wasserstein and Hellinger distances; others call it the Kantorovich-Fisher-Rao (\(KFR\)) distance, since it interpolates between the Kantorovich/Wasserstein and Fisher-Rao distances. For a detailed explanation of the terminology issues, see Remark 2.2 in (Mielke and Zhu 2025).

Dynamic and Static Formulation of the Hellinger-Kantorovich Distance

Dynamic Formulation

One way to define the Hellinger-Kantorovich distance between positive measures is dynamically, in the sense of Benamou-Brenier (see Dynamic Optimal Transport), via

\[ \mathbf{HK}^{2}(\mu,\nu) := \min \left\{ \int_{0}^{1} \int_\Omega (\alpha | \nabla u |^{2} + \beta |u |^{2}) \, d\rho_{t} \, dt : \partial_{t}\rho = \alpha \operatorname{div} (\rho\nabla u) - \beta \rho u, \ \rho(0) = \mu, \ \rho(1) = \nu \right\}, \] where \(\alpha, \beta > 0\) are fixed parameters and we minimize the action over curves \(\rho\) with endpoints \(\mu,\) \(\nu\) that solve the continuity equation with source \[ \partial_{t}\rho = \alpha \operatorname{div}(\rho\nabla u) - \beta \rho u \] in the sense of distributions.
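
To see how the source term enters the action, the following sketch evaluates it on a pure growth curve. The target \(\nu = \lambda\mu\) with \(\lambda > 0,\) the curve \(\rho_t = s(t)^{2}\mu\) with \(s(0) = 1,\) \(s(1) = \sqrt{\lambda},\) and the choice of \(u\) constant in space are assumptions made only for illustration. Since \(\nabla u = 0,\) the continuity equation with source reduces to \(2 s s' = -\beta s^{2} u,\) that is \(u = -\frac{2 s'}{\beta s},\) and the action becomes \[ \int_{0}^{1} \beta\, u(t)^{2}\, s(t)^{2}\, \mu(\Omega)\, dt = \frac{4\mu(\Omega)}{\beta} \int_{0}^{1} s'(t)^{2}\, dt \geq \frac{4\mu(\Omega)}{\beta} \left(\sqrt{\lambda}-1\right)^{2}, \] with equality for the linear profile \(s(t) = 1 + t(\sqrt{\lambda}-1).\) This particular curve therefore gives the upper bound \(\frac{4\mu(\Omega)}{\beta}(\sqrt{\lambda}-1)^{2}\) for the minimum above; no mass is moved, it is only created or destroyed in place.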

Static Formulation

On the other hand, to generalize the Kantorovich optimal transport problem to the setting of positive measures, we define the entropy-transport functional of a positive measure \(\Pi\) on \(\Omega \times \Omega\):

\[ \mathbf{ET}_{c,\Psi}(\Pi|\mu,\nu) := \int c(x_0,x_1) \, d\Pi(x_0, x_1) + \Psi(\pi_{0}|\mu) + \Psi(\pi_{1}|\nu), \] where \(\pi_0(dx_0):= \Pi(dx_0,\Omega)\) and \(\pi_1(dx_1):= \Pi(\Omega,dx_1)\) are the marginals of \(\Pi.\) In this approach, we compute the standard optimal transport cost between \(\pi_0\) and \(\pi_1,\) and penalize the deviation of \(\pi_0\) from \(\mu\) and of \(\pi_1\) from \(\nu\) using the divergence functional \(\Psi.\)
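
For measures supported on finitely many points, the functional above is a finite sum. The following Python sketch evaluates it under the assumption that \(\Psi\) is a constant multiple of the Kullback-Leibler divergence for positive measures, \(\sum_i (p_i \log(p_i/q_i) - p_i + q_i)\) (the choice made in the next subsection). The function names `generalized_kl` and `entropy_transport` and the parameter `weight` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def generalized_kl(p, q):
    """KL divergence between nonnegative vectors: sum p*log(p/q) - p + q (0*log 0 = 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.any((q == 0) & (p > 0)):
        return np.inf  # p is not absolutely continuous w.r.t. q
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) - p.sum() + q.sum())

def entropy_transport(plan, cost, mu, nu, weight=1.0):
    """Discrete entropy-transport functional, assuming Psi = weight * generalized KL."""
    plan = np.asarray(plan, dtype=float)
    cost = np.asarray(cost, dtype=float)
    pi0 = plan.sum(axis=1)  # first marginal of the plan
    pi1 = plan.sum(axis=0)  # second marginal of the plan
    support = plan > 0      # only sum where the plan has mass (avoids 0 * inf)
    transport = float(np.sum(plan[support] * cost[support]))
    return transport + weight * (generalized_kl(pi0, mu) + generalized_kl(pi1, nu))

# Usage: two-point measures; an infinite cost forbids transport between the points.
mu = np.array([1.0, 2.0])
nu = np.array([2.0, 1.0])
cost = np.array([[0.0, np.inf], [np.inf, 0.0]])
plan = np.diag([1.5, 1.5])  # keep mass in place, adjust it through the KL terms
print(entropy_transport(plan, cost, mu, nu))
```

Unlike a classical transport plan, `plan` is not required to have marginals `mu` and `nu`; the mismatch is paid for through the two divergence terms.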

Connection between the two descriptions

For the choice \(\Psi = \frac{4}{\beta}D_{KL},\) where \(D_{KL}\) is the Kullback-Leibler (KL) divergence, and the cost function

\[ c(x_0,x_1) := \begin{cases} -\frac{8}{\beta} \log \cos \left( \sqrt{\frac{\beta}{4\alpha}}|x_0-x_1| \right) &\text{ if } |x_0-x_1|<\pi \sqrt{\frac{\alpha}{\beta}} , \\ +\infty &\text{ otherwise,} \end{cases} \] the two definitions coincide:

\[ \mathbf{HK}^{2}(\mu,\nu) = \inf_{\Pi \in \mathcal{M}^{+}(\Omega \times \Omega)} \mathbf{ET}_{c,\Psi} (\Pi|\mu,\nu). \] See (Liero, Mielke, and Savaré 2016) for the proof.
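
As a worked example of the static formulation, consider two Dirac masses \(\mu = a\delta_{x}\) and \(\nu = b\delta_{y}\) with \(a, b > 0.\) The sketch below assumes that \(D_{KL}\) is the Kullback-Leibler divergence for positive measures, \(D_{KL}(\pi|\mu) = \int (\sigma \log\sigma - \sigma + 1)\, d\mu\) with \(\sigma = \frac{d\pi}{d\mu}\) (a convention not spelled out above), and it writes \(\Psi = \kappa D_{KL},\) where \(\kappa > 0\) denotes the constant in front of \(D_{KL},\) so that the cost above reads \(c = -2\kappa \log\cos\theta\) with \(\theta = \sqrt{\frac{\beta}{4\alpha}}|x-y|.\) The symbols \(a, b, m, \kappa, \theta\) are introduced only for this illustration. Finiteness of the divergence terms forces \(\Pi = m\,\delta_{(x,y)}\) with \(m \geq 0,\) and \[ \mathbf{ET}_{c,\Psi}(m\,\delta_{(x,y)}|\mu,\nu) = c\, m + \kappa \left( m \log\frac{m^{2}}{ab} - 2m + a + b \right). \] Setting the derivative in \(m\) to zero gives \(c + \kappa \log\frac{m^{2}}{ab} = 0,\) hence \(m = \sqrt{ab}\,\cos\theta\) when \(|x-y| < \pi\sqrt{\frac{\alpha}{\beta}},\) while \(m = 0\) is optimal otherwise. Substituting back, \[ \inf_{\Pi} \mathbf{ET}_{c,\Psi}(\Pi|\mu,\nu) = \begin{cases} \kappa \left( a + b - 2\sqrt{ab}\,\cos\theta \right) &\text{ if } |x-y| < \pi\sqrt{\frac{\alpha}{\beta}}, \\ \kappa\,(a+b) &\text{ otherwise.} \end{cases} \] Beyond the threshold distance it is cheaper to destroy the mass at \(x\) and create it at \(y\) than to transport it.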

Gradient Flows

In the works of Jordan, Kinderlehrer, and Otto (JKO) (Jordan, Kinderlehrer, and Otto 1998) and Otto (Otto 2001), motivated by the optimization problem \(\min_{\rho \in \mathcal{P}_{2}(\mathbb{R}^{d})} F(\rho)\) for an energy functional \(F\) on the Wasserstein space, the authors studied partial differential equations of the type \[ \partial_{t}\rho = \operatorname{div}\left(\rho \nabla \frac{\delta F}{\delta \rho} [\rho]\right) \] as gradient flows of \(F,\) where \(\frac{\delta F}{\delta \rho} [\rho]\) is the first variation of \(F.\)

For the energy functional \[ F(\rho) = \int_{\Omega} \left( U(\rho) + V(x)\rho + \frac{1}{2}\rho \, K \ast \rho \right)dx, \] where \(U\) is the internal energy density, \(V\) is a potential function, and \(K\) is an interaction kernel, weak solutions of the partial differential equation \[ \partial_{t}\rho = \operatorname{div}(\rho \nabla(U'(\rho)+V+K \ast \rho)) \] are associated with the gradient flow \[ \partial_{t} \rho = - \operatorname{grad}_{W_{2}} F(\rho). \]
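
Two standard special cases illustrate this correspondence; the specific choices of \(U\) below are examples and not the only admissible ones. For \(U(\rho) = \rho\log\rho\) and \(K = 0\) we have \(U'(\rho) = \log\rho + 1\) and \(\rho\nabla U'(\rho) = \nabla\rho,\) so the equation becomes the Fokker-Planck equation \[ \partial_{t}\rho = \Delta\rho + \operatorname{div}(\rho\nabla V), \] as in (Jordan, Kinderlehrer, and Otto 1998). For \(U(\rho) = \frac{\rho^{m}}{m-1}\) with \(m > 1\) and \(V = 0,\) \(K = 0\) we have \(\rho\nabla U'(\rho) = m\rho^{m-1}\nabla\rho = \nabla\rho^{m},\) which gives the porous medium equation \[ \partial_{t}\rho = \Delta\rho^{m}, \] as in (Otto 2001).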

Following the paper on the JKO scheme for the Hellinger-Kantorovich distance (Gallouët and Monsaingeon 2017), in a local, infinitesimal sense we have
\[ \mathbf{HK}^{2} \approx \mathbf{W}_{2}^{2} + \mathbf{FR}^{2}, \] using the formal Riemannian structure of the Wasserstein metric and the Riemannian structure of the Fisher-Rao metric. Hence, we obtain \[ \| \operatorname{grad}_{HK} F(\rho) \|^{2} = \| \operatorname{grad}_{W_{2}} F(\rho) \|^{2} + \| \operatorname{grad}_{FR} F(\rho) \|^{2}. \]

Consequently, weak solutions of the partial differential equation \[ \partial_{t}\rho = \operatorname{div}(\rho \nabla(U'(\rho)+V+K \ast \rho)) - \rho(U'(\rho)+V+K \ast \rho) \] are associated with \(HK\) gradient flows, \[ \partial_{t} \rho = - \operatorname{grad}_{HK} F(\rho). \]
Under reasonable conditions, metric gradient flows can be characterized in this case (Gallouët and Monsaingeon 2017). For a complete characterization of gradient flows of positive measures and of probability measures with respect to the Hellinger-Kantorovich geometry, and for applications to computational algorithms in machine learning, see (Mielke and Zhu 2025).
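
The effect of the additional reaction term can be seen in a small numerical experiment. The following Python sketch is not the JKO splitting scheme of (Gallouët and Monsaingeon 2017); it is a plain explicit finite-difference discretization under illustrative assumptions: a periodic one-dimensional domain, \(K = 0,\) \(U(\rho) = \rho\log\rho - \rho\) (so that \(U'(\rho) = \log\rho\)), and \(V(x) = \frac{1}{2}\cos x.\) The names `hk_flow_step` and `simulate` and all grid parameters are hypothetical. For this energy the stationary state is \(\rho = e^{-V},\) where \(U'(\rho) + V = 0.\)

```python
import numpy as np

def hk_flow_step(rho, V, dx, dt):
    """One explicit Euler step of d(rho)/dt = div(rho grad(phi)) - rho*phi, periodic in 1D."""
    phi = np.log(rho) + V                             # U'(rho) + V for U = rho*log(rho) - rho
    rho_face = 0.5 * (rho + np.roll(rho, -1))         # density at the cell face i+1/2
    flux = rho_face * (np.roll(phi, -1) - phi) / dx   # rho * d(phi)/dx at the face i+1/2
    transport = (flux - np.roll(flux, 1)) / dx        # discrete div(rho grad(phi))
    reaction = -rho * phi                             # mass creation / destruction term
    return rho + dt * (transport + reaction)

def simulate(n=64, dt=0.002, steps=4000):
    """Run the flow from a constant density; dt is below the diffusive limit dx**2 / 2."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    dx = 2.0 * np.pi / n
    V = 0.5 * np.cos(x)
    rho = np.full(n, 0.5)
    for _ in range(steps):
        rho = hk_flow_step(rho, V, dx, dt)
    return x, V, rho

x, V, rho = simulate()
dx = x[1] - x[0]
print("initial mass:", 0.5 * 2.0 * np.pi)
print("final mass:  ", rho.sum() * dx)
print("max deviation from exp(-V):", np.abs(rho - np.exp(-V)).max())
```

In contrast to the Wasserstein gradient flow, the total mass is not conserved along this evolution: the density relaxes to \(e^{-V}\) itself rather than to a normalized multiple of it.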

References

Chizat, Lénaïc, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. 2015. “Unbalanced Optimal Transport: Geometry and Kantorovich Formulation.” arXiv Preprint arXiv:1508.05216.
———. 2018. “An Interpolating Distance Between Optimal Transport and Fisher–Rao Metrics.” Foundations of Computational Mathematics 18: 1–44.
Gallouët, Thomas O., and Léonard Monsaingeon. 2017. “A JKO Splitting Scheme for Kantorovich–Fisher–Rao Gradient Flows.” SIAM Journal on Mathematical Analysis 49 (2): 1100–1130.
Jordan, Richard, David Kinderlehrer, and Felix Otto. 1998. “The Variational Formulation of the Fokker–Planck Equation.” SIAM Journal on Mathematical Analysis 29 (1): 1–17.
Kondratyev, Stanislav, Léonard Monsaingeon, and Dmitry Vorotnikov. 2016. “A New Optimal Transport Distance on the Space of Finite Radon Measures.”
Liero, Matthias, Alexander Mielke, and Giuseppe Savaré. 2016. “Optimal Transport in Competition with Reaction: The Hellinger–Kantorovich Distance and Geodesic Curves.” SIAM Journal on Mathematical Analysis 48 (4): 2869–2911.
———. 2018. “Optimal Entropy-Transport Problems and a New Hellinger–Kantorovich Distance Between Positive Measures.” Inventiones Mathematicae 211 (3): 969–1117.
Mielke, Alexander, and Jia-Jie Zhu. 2025. “Hellinger-Kantorovich Gradient Flows: Global Exponential Decay of Entropy Functionals.” arXiv Preprint arXiv:2501.17049.
Otto, Felix. 2001. “The Geometry of Dissipative Evolution Equations: The Porous Medium Equation.”