A one-layer object position estimator using the expected value
By Simon. Tags: Machine Learning; Neural Networks; Convolutional Neural Networks

The position estimation of objects in images is usually implemented by multi-layer convolutional neural networks whose final layer is a linear function. These networks often support scale invariance due to their convolutional layers.
For very basic applications, such an approach can be more than is needed and leads to long training times and high memory consumption.
If scale invariance is not needed, the following approach can be used to create a one-layer object position detection network. The input images are processed as follows (see the sketch after the list):
- Apply a Sobel filter.
- Apply a convolutional layer and a ReLU, then divide by the maximum value.
- Raise the output values to the fifth power or higher.
- Normalize the output.
- Calculate the expected value of the resulting distribution over pixel coordinates.
(Note that some steps have been omitted to keep the explanation simple.)
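The following PyTorch code is a minimal sketch of one way to implement these steps; the kernel size, the exponent, and the single-channel layout are assumptions, and only the convolutional layer carries trainable parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneLayerPositionEstimator(nn.Module):
    """Sketch of the pipeline, assuming grayscale inputs of shape (batch, 1, H, W)."""

    def __init__(self, kernel_size: int = 9, power: int = 5):
        super().__init__()
        self.power = power
        # Step 1: fixed Sobel kernels for horizontal and vertical gradients.
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("sobel", torch.stack([sobel_x, sobel_x.t()]).unsqueeze(1))
        # Step 2: the single trainable convolutional layer.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        edges = F.conv2d(x, self.sobel, padding=1)          # step 1: Sobel filter
        h = F.relu(self.conv(edges))                        # step 2: convolution + ReLU
        h = h / (h.amax(dim=(2, 3), keepdim=True) + 1e-8)   # step 2: divide by the maximum
        h = h ** self.power                                 # step 3: sharpen the peaks
        p = h / (h.sum(dim=(2, 3), keepdim=True) + 1e-8)    # step 4: normalize to a distribution
        # Step 5: expected value over the pixel coordinates.
        _, _, height, width = p.shape
        ys = torch.arange(height, dtype=p.dtype, device=p.device)
        xs = torch.arange(width, dtype=p.dtype, device=p.device)
        exp_y = (p.sum(dim=3).squeeze(1) * ys).sum(dim=1)   # E[y]
        exp_x = (p.sum(dim=2).squeeze(1) * xs).sum(dim=1)   # E[x]
        return torch.stack([exp_x, exp_y], dim=1)           # predicted (x, y) per image
```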
Steps 2 to 4 ensure that the output is a probability distribution. Due to the exponentiation in step 3, the global maximum is preserved while weaker local maxima are suppressed.
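A short numeric example with assumed toy values shows why: after dividing by the maximum, the strongest peak stays at 1.0, so exponentiation drives every weaker response toward zero before the final normalization.

```python
import numpy as np

# Toy response after step 2: a global peak (1.0) and a weaker local peak (0.6).
response = np.array([0.1, 0.6, 0.1, 1.0, 0.1])

sharpened = response ** 5                    # step 3: exponentiation
distribution = sharpened / sharpened.sum()   # step 4: normalization

print(distribution.round(3))  # [0.    0.072 0.    0.928 0.   ]
```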
The following picture shows a test image (the original is from Urlaubsguru.de). Ten smiley faces have been added to a background image. The model was trained to find the only happy face in this image.

The next picture shows the output of step 2.

The last picture shows the output of step 4.

When trained on 30 images with the Adam optimizer, this approach converges after about 50 epochs, which indicates that the method is suitable for this task.
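For reference, a training loop matching that description could look like the sketch below; the placeholder data, image size, learning rate, and the mean-squared-error loss on the predicted coordinates are assumptions, as they are not stated above.

```python
import torch

# Placeholder data standing in for the real training set of 30 images:
# grayscale inputs and the true (x, y) position of the happy face per image.
images = torch.rand(30, 1, 128, 128)
targets = torch.rand(30, 2) * 128

model = OneLayerPositionEstimator()           # the sketch from above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for epoch in range(50):
    optimizer.zero_grad()
    predictions = model(images)               # expected (x, y) positions
    loss = loss_fn(predictions, targets)
    loss.backward()
    optimizer.step()
```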