ICVGIP 2024 is pleased to host tutorials on 13th December 2024, offering attendees the opportunity to learn about advanced topics in computer vision, graphics, and image processing. With multiple tutorials held in parallel sessions throughout the day, participants can explore a variety of cutting-edge techniques and methodologies presented by leading experts.
Abstract: 3D garment modeling/digitization is a fascinating yet challenging research problem in the 3D computer vision/graphics domain, with applications in digital human modeling for AR/VR telepresence, gaming/animation, entertainment, virtual try-on, etc. Traditional graphics approaches to cloth simulation and reconstruction are computationally demanding, which limits their use in real-time applications. Modern methods for garment reconstruction and recovery from digital media, such as images and videos, employ learning-based approaches. However, they often struggle to model complex unseen garment topologies (owing to data bias) and the geometric components essential to imitating real-world garments accurately. Although breakthroughs like NeRF and Gaussian splatting, which use inverse differentiable rendering, have allowed us to extract 3D representations from multi-view images and video, the surface quality of the resulting geometry is often subpar and ill-suited to standard graphics and cloth simulation pipelines. Recent advances in generative deep learning have expanded the horizons, allowing zero-shot generation of diverse 3D objects from just a few lines of text prompts. However, these methods rely on NeRF-like geometric representations, which pose a challenge in modeling garments with open surfaces and leave out high-frequency details and complex structures (belts, straps, buttons, etc.). In this tutorial, we shall cover several learning-based methods proposed in the literature that aim to solve different parts of the meta-problem, including cloth simulation, 3D reconstruction, and garment retargeting.
Abstract: The tutorial will begin with an introduction to remote sensing data, covering its key characteristics and common applications. It will then delve into problems typically addressed by AI in this field, such as image recognition, semantic segmentation, object detection, multi-modal fusion, and change detection from multi-temporal data, under both fully supervised and limited-supervision settings. Open-set recognition, where models identify unseen classes, will also be discussed. Following this, the focus will shift to current trends in foundation models, exploring different types within the context of remote sensing and the emerging role of prompt learning for image analysis. The talk will also cover vision-language tasks in remote sensing, including Visual Question Answering (VQA), visual grounding, and captioning, highlighting how these tasks bridge visual and textual data to enhance the interpretation of remote sensing imagery.
Abstract: Cancer detection through medical image analysis is a critical challenge in healthcare, where early and accurate diagnosis can significantly impact patient outcomes. Traditional methods often rely on handcrafted features and domain-specific models, which may struggle to generalize across diverse datasets. In this tutorial, we will describe some of the work being done at the Computer Vision Lab at IIT Delhi, highlighting how clinical insights can be leveraged in the development of novel deep neural architectures for highly accurate cancer detection. To make the tutorial self-contained, the first lecture will cover the basics of image classification and object detection using deep neural networks, while the second lecture will delve into the cancer detection application.
Abstract: Multimodal Large Language Models (MLLMs) have stormed the world of vision-language understanding. Apart from a few remaining tasks related to dense prediction (segmentation, depth estimation), MLLMs have become the de facto model for all visual tasks, often accomplished through chat (QA, dialog, etc.). Knowing the key innovations that have led to such state-of-the-art models, and noting the open challenges, is important for future research in this direction. The tutorial will provide a broad perspective, starting with some historical context and ending with the latest models such as LLaVA-OneVision. We will also discuss some current challenges, efficient inference methods, and extensions to multi-image and video settings, where such models may fail.
Abstract: This tutorial provides an in-depth overview of explainability and causal machine learning within medical imaging. Key topics include deep learning applications for medical image analysis and reconstruction, emphasizing the need for explainable AI (XAI) when deploying these methods for clinical use. We explore saliency maps, uncertainty quantification, and physics-informed deep learning approaches to enhance model interpretability across diverse imaging modalities, such as MRI, XCT, PET, and ultrasound. Mechanistic interpretability and domain-agnostic feature identification are highlighted as ways to improve predictive accuracy across datasets and to distinguish normal from pathological aging. The session also addresses causal machine learning, focusing on generating counterfactual predictions to understand biases in imaging data. Practical examples will illustrate challenges and solutions in achieving explainable, robust medical image reconstruction, ultimately aimed at better clinical adoption and diagnostic precision.
Abstract: Generative AI shows great promise in revolutionizing the study and treatment of neurodegenerative diseases such as Alzheimer's, Parkinson's, and Huntington's. These diseases, characterized by progressive neuronal decline, require early detection and personalized treatment strategies, both of which are often hampered by limited datasets and a scarcity of high-quality imaging. Generative AI models, including GANs and VAEs, address this gap by creating synthetic brain scans and medical data that augment existing datasets, allowing AI systems to detect early, subtle signs of neurodegeneration. Additionally, these models can simulate disease progression, aiding the prediction of patient-specific outcomes and enhancing treatment personalization. In drug discovery, generative AI can simulate molecular structures and predict drug interactions, accelerating the development of individualized therapies. This tutorial offers two sessions, led by expert instructors, to equip learners with practical skills in applying generative AI within healthcare.
Abstract: This tutorial will provide an in-depth exploration of Foundation Models (FMs), which have become a cornerstone of modern AI research and applications. Participants will be introduced to the fundamental concepts of FMs, the processes involved in training these models, and how they can be quickly adapted to end-use applications. The tutorial is structured as two focused lectures, each building on the previous one, to ensure a comprehensive understanding for all participants.