Bio: Bovik (LF ‘23) is the Cockrell Family Regents Endowed Chair Professor at The University of Texas at Austin. An elected member of the United States National Academy of Engineering, the Indian National Academy of Engineering, the National Academy of Inventors, and Academia Europaea, his research interests include image processing, computational vision, visual neuroscience, and streaming video and social media. For his work in these areas, he received the IEEE Edison Medal, the IEEE Fourier Award, the Primetime Emmy Award for Outstanding Achievement in Engineering Development from the Television Academy, the Technology and Engineering Emmy Award from the National Academy of Television Arts and Sciences, the Progress Medal from The Royal Photographic Society, the Edwin H. Land Medal from Optica, and the Norbert Wiener Society Award and Karl Friedrich Gauss Education Award from the IEEE Signal Processing Society. He has also received about 10 ‘best journal paper’ awards, including the IEEE Signal Processing Society Sustained Impact Award. His books include The Essential Guides to Image and Video Processing. He co-founded and was the longest-serving Editor-in-Chief of the IEEE Transactions on Image Processing, and he created and chaired the IEEE International Conference on Image Processing, which was first held in Austin, Texas, in 1994.
Abstract: What makes understanding videos so challenging for large multimodal language models such as Gemini and GPT-4? We will dive into some of the challenges, including fun new tasks, datasets, evaluations, and models, covering papers recently accepted at CVPR and NeurIPS 2024.
Bio: Arsha Nagrani is a Staff Research Scientist at Google DeepMind. She obtained her PhD in the VGG group at the University of Oxford with Andrew Zisserman, where her thesis received the ELLIS PhD Award. Prior to that, she received her BA and MEng degrees from the University of Cambridge, UK. Her work has been recognised by a Best Student Paper Award at Interspeech, an Outstanding Paper Award at ICASSP, a Google PhD Fellowship, and a Townsend Scholarship, and has been covered by news outlets such as The New Scientist, MIT Technology Review, and Verdict. Her research focuses on machine learning techniques for video understanding.