Deep learning has revolutionized computer vision and visual perception. Yet most significant results still rest on supervised learning, a truly artificial communication protocol that effectively sets up a battlefield for computers but is far from natural. In this talk we argue that, by relying on supervised learning, we have been working on a problem that is - from a computational point of view - remarkably different from, and likely more difficult than, the one offered by Nature, where motion is in charge of generating visual information. We claim that massive image supervision can be replaced with the natural communication protocol that arises from living in a visual environment, just as animals do. This leads us to formulate a theory of vision driven by the assumption that visual agents acquire information from their own visual environment, without relying on labelled visual databases. In particular, we show that feature learning arises mostly from motion invariance principles, which turn out to be fundamental for detecting object identity as well as for supporting object affordance. We introduce two fundamental principles of visual perception. The first concerns consistency, namely the preservation of the identity of material points during motion. The second concerns affordance as conveyed by coupled objects - typically humans - and states that affordance is invariant under the movement of the coupled object. Overall, these motion invariance principles are expressed within a vision field theory. The theory highlights the indissoluble pairing of visual features with their conjugated velocities, thus extending the classic brightness invariance principle used for optical flow estimation. Finally, we describe the ongoing experimental setting for the validation of the theory.
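For reference, a minimal statement of the classic brightness invariance (brightness constancy) constraint that the talk refers to, in standard notation:

\[
  \frac{dI}{dt} \;=\; \frac{\partial I}{\partial t} + \nabla I \cdot v \;=\; 0,
\]

where \(I(x,t)\) is the image brightness at point \(x\) and time \(t\), and \(v\) is the optical flow, i.e. the velocity conjugated with brightness. In the proposed field theory this invariance is generalized from brightness to learned visual features, each paired with its own conjugated velocity.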
Join at: imt.lu/seminar