Micro avg and Macro avg of Precision, Recall and F-Score | Model Evaluation Metrics

A macro-average will compute the metric independently for each class and then take the average (hence treating all classes equally),

whereas a micro-average will aggregate the contributions of all classes to compute the average metric.

Note: In a multi-class classification setup, micro-average is preferable if you suspect there might be class imbalance (i.e., you may have many more examples of one class than of the other classes).
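As a concrete illustration, here is a minimal sketch (assuming scikit-learn is available; the labels below are made up) that computes both averages on a deliberately imbalanced three-class problem:

```python
# Toy, deliberately imbalanced 3-class example; labels are made up for illustration.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]   # class 0 dominates
y_pred = [0, 0, 0, 0, 0, 1, 1, 2, 2, 0]

for avg in ("macro", "micro"):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    f = f1_score(y_true, y_pred, average=avg)
    print(f"{avg:5s}  P={p:.3f}  R={r:.3f}  F1={f:.3f}")

# macro averages the per-class scores with equal weight,
# micro pools TP/FP/FN over all classes before computing the score.
```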

In a multi-class setting, micro-averaged precision and recall are always the same:

$$P_{\text{micro}} = \frac{\sum_c TP_c}{\sum_c TP_c + \sum_c FP_c},
\qquad
R_{\text{micro}} = \frac{\sum_c TP_c}{\sum_c TP_c + \sum_c FN_c},$$

where c is the class label.

Since in a multi-class, single-label setting every misclassified instance is counted exactly once as a false positive (for the predicted class) and exactly once as a false negative (for the true class), it turns out that

$$\sum_c FP_c = \sum_c FN_c.$$

Hence $P_{\text{micro}} = R_{\text{micro}}$. If you treat a binary classification problem as a two-class classification and compute micro-averaged precision and recall over both classes, they will likewise be the same.
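To make the counting argument concrete, here is a small plain-Python sketch with made-up predictions that tallies per-class TP/FP/FN and verifies that the pooled false positives equal the pooled false negatives:

```python
# Toy multi-class predictions, made up for illustration.
y_true = ["a", "a", "a", "b", "b", "c", "c", "c", "c", "a"]
y_pred = ["a", "b", "a", "b", "c", "c", "c", "a", "c", "a"]

classes = set(y_true) | set(y_pred)
tp = {c: 0 for c in classes}
fp = {c: 0 for c in classes}
fn = {c: 0 for c in classes}

for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1
    else:
        fp[p] += 1   # wrongly predicted as p  -> false positive for class p
        fn[t] += 1   # missed the true class t -> false negative for class t

sum_tp, sum_fp, sum_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
print(sum_fp == sum_fn)              # True: every error is counted once in each sum
print(sum_tp / (sum_tp + sum_fp))    # micro-averaged precision
print(sum_tp / (sum_tp + sum_fn))    # micro-averaged recall (identical)
```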

Note: This does not hold when averaging binary precision and recall across multiple datasets, where only the positive class is counted; in that case the pooled false positives and false negatives need not be equal, so the micro-averaged precision and recall can differ (see the sketch under "Usage on Multiple Datasets" below).

Usage on Multiple Datasets:

Macro-averaged metrics are used when we want to evaluate a system's performance across different datasets, giving each dataset equal weight regardless of its size.

Micro-averaged metrics should be used when dataset sizes vary, since pooling the counts lets larger datasets contribute proportionally more to the overall score.
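As a rough sketch of the two strategies (the per-dataset counts below are hypothetical), macro-averaging takes the mean of per-dataset scores while micro-averaging pools the raw counts:

```python
# Hypothetical per-dataset binary counts (TP, FP, FN) for the positive class only.
datasets = {
    "small":  (8,   2,   4),
    "medium": (90,  30,  10),
    "large":  (700, 250, 300),
}

def f1(tp, fp, fn):
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

# Macro: average the per-dataset F1 scores; each dataset counts equally.
macro_f1 = sum(f1(*c) for c in datasets.values()) / len(datasets)

# Micro: pool the counts first; larger datasets dominate the result.
tp = sum(c[0] for c in datasets.values())
fp = sum(c[1] for c in datasets.values())
fn = sum(c[2] for c in datasets.values())
micro_f1 = f1(tp, fp, fn)

print(f"macro F1 = {macro_f1:.3f}")
print(f"micro F1 = {micro_f1:.3f}")
# Note: pooled precision (tp / (tp + fp)) and pooled recall (tp / (tp + fn))
# differ here, unlike the multi-class case above, because only the positive
# class is counted.
```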