Digital content is nowadays available from multiple, heterogeneous sources across a wide range of sensing modalities. Learning from multimodal sources offers the unprecedented possibility of capturing ...