Ordering of Batch Normalization and Dropout?

Dropout is meant to block information from certain neurons completely so that neurons do not co-adapt. So batch normalization has to come after dropout; otherwise the dropped neurons still pass information through the normalization statistics.
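As a quick illustration (a NumPy sketch with made-up numbers, not from the original answer), inverted dropout inflates the batch standard deviation, so the statistics BatchNorm sees depend on whether it runs before or after dropout:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))            # a batch of pre-dropout activations

# Inverted dropout with p = 0.5: zero half the units, scale survivors by 1/(1-p)
keep = rng.random(size=x.shape) > 0.5
x_drop = np.where(keep, x / 0.5, 0.0)

# The batch statistics differ depending on whether dropout has been applied:
print(x.mean(), x.std())            # roughly 0 and 1
print(x_drop.mean(), x_drop.std())  # mean still ~0, but std inflated to ~1.41
```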

If you think about it, this is the same reason why, in typical ML problems, we don't compute the mean and standard deviation over the entire dataset and then split it into train, validation and test sets. We split first, compute the statistics over the train set, and use them to normalize and centre the validation and test sets.
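For comparison, here is a minimal scikit-learn sketch of that convention (the array shapes are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(1000, 10))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)    # statistics come from the train set only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)     # reuse train statistics; no leakage
```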

So I suggest Scheme 1:

-> CONV/FC -> ReLU (or other activation) -> Dropout -> BatchNorm -> CONV/FC

as opposed to Scheme 2:

-> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC
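For concreteness, here is a minimal PyTorch sketch of the two orderings for fully connected layers (the layer sizes 128 -> 64 -> 10 and p = 0.5 are hypothetical):

```python
import torch.nn as nn

# Scheme 1: ... -> activation -> Dropout -> BatchNorm -> ...
scheme1 = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.BatchNorm1d(64),
    nn.Linear(64, 10),
)

# Scheme 2: ... -> BatchNorm -> activation -> Dropout -> ...
scheme2 = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)
```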

Please note that this reasoning implies the network under Scheme 2 should overfit more than the network under Scheme 1. However, the OP ran some tests, as mentioned in the question, and they support Scheme 2.