Why does overlapped pooling help reduce overfitting in conv nets?
I am going to answer this using the pooling example given above, with some modifications. Suppose we have the three 1D
features given below.
[0 0 5 0 0 6 0 0 3 0 0 4 0 0]
[0 0 0 5 0 6 0 0 0 3 0 4 0 0]
[0 0 5 0 0 6 0 0 3 0 4 0 0 0]
When max-pooled using z=2 and s=2, all three features lead to the same result obtained above, that is
[0, 5, 6, 0, 3, 4, 0]
However, when we use z=3 and s=2, we get the following results, respectively:
[5, 5, 6, 3, 3, 4, 0]
[0, 5, 6, 0, 3, 4, 0]
[5, 5, 6, 3, 4, 4, 0]
Therefore, with overlapping pooling we get three different results, as opposed to a single result when we do not overlap the windows. This is due to the information loss that occurs when z=s: distinct inputs collapse to the same pooled output, which in this case reduces the amount of distinct data available to train the network, i.e., from 3 examples to 1 example. This shrinkage in the effective data size makes the model more likely to overfit.
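If it helps, here is a minimal Python sketch that reproduces the numbers above. The helper max_pool_1d is my own; I am assuming max pooling and that a trailing partial window is pooled as-is, since that is what matches the 7-element outputs shown.

```python
features = [
    [0, 0, 5, 0, 0, 6, 0, 0, 3, 0, 0, 4, 0, 0],
    [0, 0, 0, 5, 0, 6, 0, 0, 0, 3, 0, 4, 0, 0],
    [0, 0, 5, 0, 0, 6, 0, 0, 3, 0, 4, 0, 0, 0],
]

def max_pool_1d(x, z, s):
    """Max-pool the 1D list x with window size z and stride s.
    Assumption: a trailing partial window is pooled as-is, which is
    how the 14-element features above yield 7 outputs."""
    return [max(x[i:i + z]) for i in range(0, len(x), s)]

# Non-overlapping pooling (z = s = 2): all three features collapse
# to the same output, [0, 5, 6, 0, 3, 4, 0].
for f in features:
    print(max_pool_1d(f, z=2, s=2))

# Overlapping pooling (z = 3, s = 2): the three features remain
# distinguishable after pooling.
for f in features:
    print(max_pool_1d(f, z=3, s=2))
```

Running this prints the single collapsed output three times for z=s=2, and the three distinct outputs listed above for z=3, s=2.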