binary_crossentropy vs categorical_crossentropy performance in Keras?

I was getting good results with categorical_crossentropy (with 2 classes) and poor results with binary_crossentropy. It seems the problem was a mismatched activation function on the output layer. The correct settings were:

  • For binary_crossentropy: sigmoid activation, scalar target
  • For categorical_crossentropy: softmax activation, one-hot encoded target
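To see why both pairings should perform the same on 2 classes when set up correctly, here is a minimal sketch in plain Python (no Keras involved). The helper functions and the values of z and label are made up for illustration; they mirror the standard loss formulas, not Keras internals. It also shows one common mismatch, softmax on a single output unit, which may or may not be the one I originally hit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binary_crossentropy(y_true, p):
    # y_true is a scalar 0 or 1, p is the sigmoid output.
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def categorical_crossentropy(y_onehot, probs):
    # y_onehot is a one-hot list, probs is the softmax output.
    return -sum(t * math.log(p) for t, p in zip(y_onehot, probs))

z = 0.7      # hypothetical raw score (logit) for class 1
label = 1    # hypothetical true class

# Pairing 1: sigmoid on one unit, scalar target.
bce = binary_crossentropy(label, sigmoid(z))

# Pairing 2: softmax on two units, one-hot target.
# Fixing the class-0 logit at 0 makes softmax([0, z])[1] equal sigmoid(z).
cce = categorical_crossentropy([0, 1], softmax([0.0, z]))

print(abs(bce - cce) < 1e-12)

# Mismatch: softmax over a single unit always outputs 1.0,
# so the prediction cannot depend on the input at all.
single_unit = softmax([z])
print(single_unit)
```

With matched settings the two losses agree up to floating-point error, so any large gap in results between them points to the setup rather than to the loss itself.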