Bio: Bovik (LF ‘23) is the Cockrell Family Regents Endowed Chair Professor at The University of Texas at Austin. An elected member of the United States National Academy of Engineering, the Indian National Academy of Engineering, the National Academy of Inventors, and Academia Europaea, his research interests include image processing, computational vision, visual neuroscience, and streaming video and social media. For his work in these areas, he received the IEEE Edison Medal, IEEE Fourier Award, Primetime Emmy Award for Outstanding Achievement in Engineering Development from the Television Academy, Technology and Engineering Emmy Award from the National Academy of Television Arts and Sciences, Progress Medal from The Royal Photographic Society, Edwin H. Land Medal from Optica, and the Norbert Wiener Society Award and Carl Friedrich Gauss Education Award from the IEEE Signal Processing Society. He has also received about 10 ‘best journal paper’ awards, including the IEEE Signal Processing Society Sustained Impact Award. His books include The Essential Guides to Image and Video Processing. He co-founded and was the longest-serving Editor-in-Chief of the IEEE Transactions on Image Processing, and created and chaired the IEEE International Conference on Image Processing, first held in Austin, Texas, in 1994.
Abstract: What makes understanding videos so challenging for large multimodal language models, such as Gemini and GPT-4? We will dive into some of the challenges, including fun new tasks, datasets, evaluations, and models, covering recently accepted papers at CVPR and NeurIPS 2024.
Bio: Arsha Nagrani is a Staff Research Scientist at Google DeepMind. She obtained her PhD from the VGG group at the University of Oxford with Andrew Zisserman, where her thesis received the ELLIS PhD Award. Prior to that, she received her BA and MEng degrees from the University of Cambridge, UK. Her work has been recognised by a Best Student Paper Award at Interspeech, an Outstanding Paper Award at ICASSP, a Google PhD Fellowship and a Townsend Scholarship, and has been covered by news outlets such as New Scientist, MIT Technology Review, and Verdict. Her research is focused on machine learning techniques for video understanding.
Abstract: Artificial intelligence and machine learning are enjoying a period of tremendous progress, driven in large part by scale, compute, and learnable neural representations. However, such innovations have yet to translate to the physical world, as technologies such as self-driving vehicles are still restricted to limited deployments. In this talk, I will argue that autonomy requires spatial three-dimensional understanding integrated with intuitive physical models of a changing world. To do so, I will discuss a variety of models that revisit classic "analysis by synthesis" approaches to scene understanding, taking advantage of recent advances in differentiable rendering and simulation. But to enable data-driven autonomy for safety-critical applications, I will also argue that the community needs new perspectives on data curation and annotation. Toward this end, I will discuss approaches that leverage multimodal vision-language models to better characterize datasets and models.
Bio: Deva Ramanan is a Professor in the Robotics Institute at Carnegie Mellon University and the former director of the CMU Center for Autonomous Vehicle Research. His research interests span computer vision and machine learning, with a focus on visual recognition. He was awarded the David Marr Prize in 2009, the PASCAL VOC Lifetime Achievement Prize in 2010, and the IEEE PAMI Young Researcher Award in 2012; was named one of Popular Science's Brilliant 10 researchers in 2012 and a National Academy of Sciences Kavli Fellow in 2013; won the Longuet-Higgins Prize for fundamental contributions in computer vision in both 2018 and 2024; and received best paper finalist / honorable mention awards at CVPR 2019, ECCV 2020, and ICCV 2021. His work is supported by NSF, ONR, and DARPA, as well as industrial collaborations with Intel, Google, and Microsoft. He served as a program chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018. He is on the editorial board of the International Journal of Computer Vision (IJCV) and is an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). He regularly serves as a senior program committee member for CVPR, the International Conference on Computer Vision (ICCV), and the European Conference on Computer Vision (ECCV). He also regularly serves on NSF panels for computer vision and machine learning.
Bio: SP Arun started out as an electrical engineer, read too much science fiction for his own good, and turned into a neuroscientist. He is fascinated by how the brain transforms sensation into perception, particularly for vision. His lab at the Centre for Neuroscience, Indian Institute of Science, studies how the brain solves vision by investigating perception and brain activity in humans, behavior and neural activity in monkeys, and comparisons of vision in brains and machine algorithms. For more details, visit the homepage of his research group, the Vision Lab @ IISc.
Bio: Professor Vasilis Ntziachristos studied electrical engineering at Aristotle University in Thessaloniki. Following his M.Sc. and Ph.D. in the Department of Bioengineering at the University of Pennsylvania, he was appointed Assistant Professor and Director of the Laboratory for Bio-Optics and Molecular Imaging at Harvard University and Massachusetts General Hospital. Since 2007, he has served as Professor of Medicine and Electrical Engineering and the Chair of Biological Imaging at the Technical University of Munich and Director of the Institute of Biological and Medical Imaging at Helmholtz Munich. Prof. Ntziachristos is also currently Director of Bioengineering at the Helmholtz Pioneer Campus, the Head of the Bioengineering Department at Helmholtz Munich, and Director of the IESL at FORTH. Prof. Ntziachristos is the founder of the journal Photoacoustics, regularly chairs international meetings and councils, and has received numerous awards and distinctions, including the Karl Heinz Beckurts Prize (2021), the Chaire Blaise Pascal (2019) from the Région Île-de-France, the Gold Medal from the Society for Molecular Imaging (2015), the Gottfried Wilhelm Leibniz Prize from the German Research Foundation (2013), and the Erwin Schrödinger Award (2012), and was named one of the world's top innovators by the Massachusetts Institute of Technology (MIT) Technology Review in 2004. In 2024, he was elected a member of the German Academy of Sciences Leopoldina.
Abstract: Neural rendering has advanced at an outstanding pace in recent years with the advent of Neural Radiance Fields (NeRFs), which are typically based on volumetric ray-marching. Last year, our group developed an alternative approach, 3D Gaussian Splatting, that offers faster training, faster display, and better visual quality, and has seen widespread adoption both academically and industrially. In this talk, we describe the 20+ year process leading to the development of this method and discuss some future directions. We will start with a short historical perspective of our work on image-based and neural rendering, outlining several developments that guided our thinking over the years. We then discuss a sequence of three point-based rasterization methods for novel view synthesis -- developed in the context of G. Kopanas' Ph.D. and the ERC Advanced Grant FUNGRAPH -- that culminated in 3D Gaussian Splatting. We will emphasize how we overcame the challenges as the research progressed. We first discuss differentiable point splatting and how we extended it in our first approach, which enhances points with neural features and optimizes geometry to correct reconstruction errors. We briefly review our second method, which handles highly reflective objects and uses multi-layer perceptrons (MLPs) to learn the motion of reflections and to perform the final rendering of captured scenes. We then discuss 3D Gaussian Splatting, which provides high-quality real-time rendering for novel view synthesis using a novel 3D scene representation based on 3D Gaussians and fast GPU rasterization. We will conclude with a discussion of future directions for 3D Gaussian Splatting, with examples from recent work.
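The link the abstract draws between volumetric ray-marching NeRFs and 3D Gaussian Splatting can be made concrete with the front-to-back alpha-compositing rule both share: a pixel color is the transmittance-weighted sum of sample (or splat) colors along the view direction. The sketch below is an illustrative simplification, not the authors' implementation; the function name and the early-termination threshold are assumptions.

```python
# Minimal sketch of front-to-back alpha compositing, the accumulation
# rule shared by volumetric ray-marching and 3D Gaussian Splatting.
# colors/alphas are per-sample (ray-marching) or per-splat (splatting),
# assumed already sorted front to back along the viewing ray.
def composite(colors, alphas):
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):
        weight = transmittance * a          # contribution of this sample
        color = [acc + weight * ci for acc, ci in zip(color, c)]
        transmittance *= (1.0 - a)          # attenuate what remains
        if transmittance < 1e-4:            # early termination, as in fast rasterizers
            break
    return color

# A mostly opaque red splat in front of a fully opaque green one:
composite([[1, 0, 0], [0, 1, 0]], [0.8, 1.0])  # → [0.8, 0.2, 0.0]
```

The ordering assumption is where the two families differ in practice: ray-marching gets it for free by stepping along the ray, while splatting must sort Gaussians by depth before rasterizing, which is part of what makes the GPU implementation fast.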
Bio: George Drettakis graduated in Computer Science from the University of Crete, Greece, and obtained an M.Sc. and a Ph.D. (1994) at the University of Toronto with E. Fiume. After an ERCIM postdoc in Grenoble, Barcelona, and Bonn, he obtained an Inria researcher position in Grenoble in 1995, and his "Habilitation" at the University of Grenoble (1999). He then founded the REVES research group at Inria Sophia-Antipolis, and now heads the follow-up group GRAPHDECO. He is an Inria Senior Researcher (full professor equivalent). He received the Eurographics (EG) Outstanding Technical Contributions Award in 2007 and the EG Distinguished Career Award in 2024, and is an EG fellow. He has received two prestigious ERC Advanced Grants, in 2018 and 2024. He was associate editor for ACM Transactions on Graphics, technical papers chair of SIGGRAPH Asia 2010, and co-chair of the Eurographics IPC in 2002 and 2008, and chairs the ACM SIGGRAPH Papers Advisory Group and the EG working group on Rendering (EGSR). He has worked on many different topics in computer graphics, with an emphasis on rendering. He initially concentrated on lighting and shadow computation and subsequently worked on 3D audio, perceptually-driven algorithms, virtual reality, and 3D interaction. He has worked on textures, weathering, and perception for graphics, and in recent years has focused on novel-view synthesis, relighting, and material acquisition, often using deep learning methodologies.