ICVGIP 2024
IIIT Bangalore | 13th to 15th December, 2024
Plenary Speakers

Alan Bovik
Secrets of Video Quality Prediction
Alan C. Bovik
Professor, The University of Texas at Austin

Bio: Bovik (LF ’23) is the Cockrell Family Regents Endowed Chair Professor at The University of Texas at Austin. An elected member of the United States National Academy of Engineering, the Indian National Academy of Engineering, the National Academy of Inventors, and Academia Europaea, his research interests include image processing, computational vision, visual neuroscience, and streaming video and social media. For his work in these areas, he received the IEEE Edison Medal, IEEE Fourier Award, Primetime Emmy Award for Outstanding Achievement in Engineering Development from the Television Academy, Technology and Engineering Emmy Award from the National Academy of Television Arts and Sciences, Progress Medal from The Royal Photographic Society, Edwin H. Land Medal from Optica, and the Norbert Wiener Society Award and Carl Friedrich Gauss Education Award from the IEEE Signal Processing Society. He has also received about 10 ‘best journal paper’ awards, including the IEEE Signal Processing Society Sustained Impact Award. His books include The Essential Guides to Image and Video Processing. He co-founded and was the longest-serving Editor-in-Chief of the IEEE Transactions on Image Processing, and he created and chaired the IEEE International Conference on Image Processing, which was first held in Austin, Texas, in 1994.

Arsha Nagrani
Long Video Understanding in the age of large MLMs
Arsha Nagrani
Staff Research Scientist, Google DeepMind

Abstract: What makes understanding videos so challenging for large multimodal language models, such as Gemini and GPT-4? We will dive into some of the challenges, including fun new tasks, datasets, evaluations and models, covering recently accepted papers at CVPR and NeurIPS 2024.

Bio: Arsha Nagrani is a Staff Research Scientist at Google DeepMind. She obtained her PhD from the VGG group at the University of Oxford with Andrew Zisserman, where her thesis received the ELLIS PhD Award. Prior to that, she received her BA and MEng degrees from the University of Cambridge, UK. Her work has been recognised by a Best Student Paper Award at Interspeech, an Outstanding Paper Award at ICASSP, a Google PhD Fellowship and a Townsend Scholarship, and has been covered by news outlets such as New Scientist, MIT Technology Review and Verdict. Her research is focused on machine learning techniques for video understanding.

Deva Ramanan
Multimodal Spatial Intelligence for Interacting in a Dynamic World
Deva Ramanan
Professor, Robotics Institute, Carnegie Mellon University

Abstract: Artificial intelligence and machine learning are enjoying a period of tremendous progress, driven in large part by scale, compute, and learnable neural representations. However, such innovations have yet to translate to the physical world, as technologies such as self-driving vehicles are still restricted to limited deployments. In this talk, I will argue that autonomy requires spatial three-dimensional understanding integrated with intuitive physical models of a changing world. To do so, I will discuss a variety of models that revisit classic "analysis by synthesis" approaches to scene understanding, taking advantage of recent advances in differentiable rendering and simulation. But to enable data-driven autonomy for safety-critical applications, I will also argue that the community needs new perspectives on data curation and annotation. Toward this end, I will discuss approaches that leverage multimodal vision-language models to better characterize datasets and models.

Bio: Deva Ramanan is a Professor in the Robotics Institute at Carnegie Mellon University and the former director of the CMU Center for Autonomous Vehicle Research. His research interests span computer vision and machine learning, with a focus on visual recognition. He was awarded the David Marr Prize in 2009, the PASCAL VOC Lifetime Achievement Prize in 2010, and the IEEE PAMI Young Researcher Award in 2012. He was named one of Popular Science's Brilliant 10 researchers in 2012 and a National Academy of Sciences Kavli Fellow in 2013, won the Longuet-Higgins Prize for fundamental contributions in computer vision in both 2018 and 2024, and received best paper finalist / honorable mention awards at CVPR 2019, ECCV 2020, and ICCV 2021. His work is supported by NSF, ONR, and DARPA, as well as industrial collaborations with Intel, Google, and Microsoft. He served as program chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018. He is on the editorial board of the International Journal of Computer Vision (IJCV) and is an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). He regularly serves as a senior program committee member for CVPR, the International Conference on Computer Vision (ICCV), and the European Conference on Computer Vision (ECCV). He also regularly serves on NSF panels for computer vision and machine learning.

We will be adding details of a couple more plenary speakers soon.