Micro avg and Macro avg of Precision, Recall and F-Score | Model Evaluation Metrics
A macro-average will compute the metric independently for each class and then take the average (hence treating all classes equally),
whereas a micro-average will aggregate the contributions of all classes to compute the average metric.
Note: In a multi-class classification setup, micro-average is preferable if you suspect there might be a class imbalance (i.e., you may have many more examples of one class than of the other classes).
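To make the two averaging schemes concrete, here is a minimal Python sketch (the labels below are invented purely for illustration) that computes both averages from per-class counts; scikit-learn's `precision_score`/`recall_score` with `average="macro"` or `average="micro"` give the same numbers.

```python
# Minimal sketch: macro vs. micro averaging of precision and recall,
# computed from per-class counts. Labels below are invented for illustration.
from collections import Counter

y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]
classes = sorted(set(y_true) | set(y_pred))

tp, fp, fn = Counter(), Counter(), Counter()
for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1
    else:
        fp[p] += 1  # wrong prediction: false positive for the predicted class
        fn[t] += 1  #                   false negative for the true class

# Macro average: per-class precision/recall first, then an unweighted mean.
macro_p = sum(tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in classes) / len(classes)
macro_r = sum(tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0 for c in classes) / len(classes)

# Micro average: pool the counts over all classes, then compute once.
micro_p = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
micro_r = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))

print(f"macro: P={macro_p:.3f} R={macro_r:.3f}")
print(f"micro: P={micro_p:.3f} R={micro_r:.3f}")  # equal in the multi-class case (see below)
```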
In a multi-class setting, micro-averaged precision and recall are always the same:

$$P_{micro} = \frac{\sum_c TP_c}{\sum_c TP_c + \sum_c FP_c}, \qquad R_{micro} = \frac{\sum_c TP_c}{\sum_c TP_c + \sum_c FN_c}$$

where c is the class label.
Since in a multi-class setting every misclassified instance counts as exactly one false positive (for the predicted class) and exactly one false negative (for the true class), it turns out that

$$\sum_c FP_c = \sum_c FN_c$$

Hence P = R. In other words, every single false prediction is a False Positive for one class and, at the same time, a False Negative for another class. If you treat a binary classification problem as a two-class classification and compute the micro-averaged precision and recall over both classes, they will be the same as well.
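As a quick check, here is a minimal sketch with made-up binary labels: micro-averaging over both classes (e.g. scikit-learn's `average="micro"`, which pools the counts over all labels) makes precision and recall identical, and in this single-label setting both equal plain accuracy.

```python
# Sketch: a binary problem treated as a two-class problem (labels invented
# for illustration). Micro-averaging over BOTH classes makes precision and
# recall identical; in fact both reduce to plain accuracy here.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred, average="micro"))  # 0.75
print(recall_score(y_true, y_pred, average="micro"))     # 0.75
print(accuracy_score(y_true, y_pred))                    # 0.75
```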
Note: this does not hold when averaging binary precision and recall (which only count the positive class) across multiple datasets; in that case the micro-averaged precision and recall are generally different.
USAGE on Multiple Datasets:
Macro-averaged metrics are used when we want to evaluate overall system performance across different datasets, giving each dataset equal weight.
Micro-averaged metrics should be used when the datasets vary in size, since pooling the counts weights each dataset by its number of examples; the sketch below contrasts the two.
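To make the contrast concrete, here is a minimal sketch with invented per-dataset counts, comparing the macro average (mean of per-dataset precisions, each dataset weighted equally) with the micro average (pooled counts, so the larger dataset dominates).

```python
# Sketch: averaging binary precision across two datasets of different sizes.
# The per-dataset counts (positive-class TP/FP) are invented for illustration.
datasets = {
    "small_set": {"tp": 8,   "fp": 2},    # per-dataset precision: 0.80
    "large_set": {"tp": 300, "fp": 200},  # per-dataset precision: 0.60
}

# Macro average: compute precision per dataset, then take the unweighted mean.
per_set = [d["tp"] / (d["tp"] + d["fp"]) for d in datasets.values()]
macro_p = sum(per_set) / len(per_set)     # (0.80 + 0.60) / 2 = 0.70

# Micro average: pool the raw counts first, then compute precision once.
tp = sum(d["tp"] for d in datasets.values())
fp = sum(d["fp"] for d in datasets.values())
micro_p = tp / (tp + fp)                  # 308 / 510 ≈ 0.604

print(f"macro precision: {macro_p:.3f}")  # each dataset weighted equally
print(f"micro precision: {micro_p:.3f}")  # dominated by the larger dataset
```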