Brier Score vs Log Loss | Model Evaluation Metrics

Since the log-likelihood function (combined with the prior if Bayesian modelling is being used) is the gold standard optimality criterion, it is best to use the log likelihood (a linear translation of the logarithmic accuracy scoring rule). This automatically extends to ordinal and multinomial (polytomous) Y.
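
As a rough illustration of the "linear translation" remark, the sketch below computes the mean negative log-likelihood (commonly called log loss) for binary outcomes and for the multinomial case. The function names and toy data are hypothetical, chosen only for this example; it assumes predicted probabilities strictly between 0 and 1.

```python
import math

def binary_log_likelihood(y, p):
    # y: observed 0/1 outcomes; p: predicted P(Y = 1) for each observation
    return sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))

def multiclass_log_likelihood(y, probs):
    # y: observed class indices; probs: one row of class probabilities per observation
    return sum(math.log(row[yi]) for yi, row in zip(y, probs))

def log_score(log_lik, n):
    # Linear rescaling of the log-likelihood: flip the sign and average over n
    return -log_lik / n

# Toy data (made up for illustration)
y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.6, 0.7]

binary_loss = log_score(binary_log_likelihood(y, p), len(y))

# The binary case is the two-class special case of the multinomial formula
probs = [[1.0 - pi, pi] for pi in p]
multi_loss = log_score(multiclass_log_likelihood(y, probs), len(y))

print(binary_loss, multi_loss)
```

Both calls should print the same value, which is one way to see why the score carries over to multinomial Y without modification.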

There are only three reasons I can think of for not using the log likelihood in summarizing the model’s predictive value:

  1. you seek to describe model performance using a measure the model was not optimizing (not a bad idea; often why we use the Brier score)
  2. you have a single predicted probability of one or zero that was “wrong”, rendering an infinite value for the logarithmic score
  3. it’s often hard to know “how good” a value of the index is (same for Brier score, not so much for c-index, i.e., concordance probability or AUROC)
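
The second reason can be seen in a minimal sketch: a single prediction of exactly 0 for an event that occurred makes the logarithmic score infinite, while the Brier score stays bounded. The helper names and numbers below are assumptions made for illustration, not taken from the quoted answer.

```python
import math

def brier_score(y, p):
    # Mean squared difference between predicted probability and 0/1 outcome
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def log_loss(y, p):
    # Mean negative log of the probability assigned to the observed outcome
    total = 0.0
    for yi, pi in zip(y, p):
        q = pi if yi == 1 else 1.0 - pi
        if q == 0.0:
            # The observed outcome was given probability zero
            return math.inf
        total -= math.log(q)
    return total / len(y)

# Toy data (made up for illustration)
y = [1, 0, 1, 1]
p_ok = [0.9, 0.2, 0.6, 0.7]
p_bad = [0.9, 0.2, 0.6, 0.0]  # last prediction rules out an event that happened

print(brier_score(y, p_ok), log_loss(y, p_ok))
print(brier_score(y, p_bad), log_loss(y, p_bad))
```

With these numbers the Brier score moves from about 0.075 to about 0.3025, while the log loss goes from a finite value to infinity because of that one prediction.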

by Frank Harrell on StackExchange

References:

  1. https://stats.stackexchange.com/questions/297528/multi-lable-classification-brier-score-or-log-loss/321003
  2. https://stats.stackexchange.com/questions/126965/justifying-and-choosing-a-proper-scoring-rule?noredirect=1&lq=1