## Bregman Divergence Bounds and the Universality of the Logarithmic Loss. (arXiv:1810.07014v1 [cs.IT])

A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if the minimizer of the expected loss is the true underlying probability. In this work we show that for binary classification, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a normalization constant. It implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound to any choice of loss from this set. This property suggests that the log-loss is universal in the sense that it provides performance guarantees to a broad class of accuracy measures. Importantly, our notion of universality is not restricted to a specific problem. This allows us to apply our results to many applications, including predictive modeling, data clustering and sample complexity analysis. Further, we查看全文

## Solidot 文章翻译

 你的名字 留空匿名提交 你的Email或网站 用户可以联系你 标题 简单描述 内容 A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if the minimizer of the expected loss is the true underlying probability. In this work we show that for binary classification, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a normalization constant. It implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound to any choice of loss from this set. This property suggests that the log-loss is universal in the sense that it provides performance guarantees to a broad class of accuracy measures. Importantly, our notion of universality is not restricted to a specific problem. This allows us to apply our results to many applications, including predictive modeling, data clustering and sample complexity analysis. Further, we
﻿