Dr. Shah Nawaz

German Electron Synchrotron DESY, Germany

Multimodal Representation and Learning


Shah Nawaz is a postdoctoral researcher at the German Electron Synchrotron (DESY) with a focus on computer vision and deep learning. During his doctoral and postdoctoral work, he has developed various techniques for learning representations for multimodal applications ranging from classification to cross-modal retrieval.


Deep learning has remarkably improved the state of the art in speech recognition, visual object detection and text processing. Interestingly, the majority of these tasks focus on a single modality (images, text, speech, etc.); however, real-world scenarios present data in a multimodal fashion: we see objects, hear sounds, etc. Moreover, recent years have seen an explosion of multimodal data on the web. Typically, users combine text, images, audio or video to sell a product on an e-commerce platform or to express views on social media. In addition, it is well known that multimodal data provide richer information for capturing a particular “concept” than individual modalities do. Thus, the talk will focus on learning representations of multiple modalities for various computer vision tasks, including cross-modal verification, cross-modal matching and zero-shot learning.