ICVGIP 2024
IIIT-Bangalore | 13th to 15th December, 2024
Accepted Symposium Papers

We are excited to showcase the accepted Symposium papers for presentation at ICVGIP 2024. Explore the full list of accepted papers below, and join us at the conference to engage with the authors!
The list also doubles as a detailed technical program, so authors can find the time slot of their presentation in the Conference Program.
For the list of accepted Regular Papers, jump here.
For the list of accepted Tiny Papers, jump here.
For the list of Vision India Session papers, jump here.

Young Researcher's Symposium

171: Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving

Depanshu Sani, IIIT-Delhi
Saket Anand, IIIT-Delhi


Abstract: The growing demand for robust scene understanding in mobile robotics and autonomous driving has highlighted the importance of integrating multiple sensing modalities. By combining data from diverse sensors such as cameras and LiDARs, fusion techniques can overcome the limitations of individual sensors, enabling a more complete and accurate perception of the environment. We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation that supports critical decision-making processes in autonomous driving. We present the Sensor-Agnostic Graph-Aware Kalman Filter (SAGA-KF) [3], the first online state estimation technique designed to fuse multi-modal graphs derived from noisy multi-sensor data. The estimated graph-based state representations serve as a foundation for advanced applications such as Multi-Object Tracking (MOT), offering a comprehensive framework for enhancing the situational awareness and safety of autonomous systems. We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets (nuScenes). Our results show an improvement in MOTA and a reduction in estimated position errors (MOTP) and identity switches (IDS) for tracked objects using SAGA-KF. Furthermore, we highlight the capability of such a framework to develop methods that can leverage heterogeneous information (such as semantic objects and geometric structures) from various sensing modalities, enabling a more holistic approach to scene understanding and enhancing the safety and effectiveness of autonomous systems.
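
To make the state-estimation idea concrete, here is a minimal sketch of the classical linear Kalman filter that SAGA-KF generalizes, in spirit, to graph-valued states. The constant-velocity model, noise covariances, and sensor readings below are illustrative assumptions, not the paper's formulation:

```python
# Minimal predict/update Kalman filter for one tracked object's
# position and velocity. All models and numbers are assumptions.
import numpy as np

dt = 0.1                                   # assumed sensor period (s)
F = np.array([[1, 0, dt, 0],               # constant-velocity dynamics
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # we observe position only
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)                       # process noise (assumed)
R = 0.25 * np.eye(2)                       # measurement noise (assumed)

def kf_step(x, P, z):
    """One predict/update cycle for state x (4,) and covariance P (4,4)."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    y = z - H @ x_pred                     # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

# Fuse noisy position detections of one object from two hypothetical sensors
x, P = np.zeros(4), np.eye(4)
for z in [np.array([1.0, 0.9]), np.array([1.1, 1.1])]:  # camera, LiDAR
    x, P = kf_step(x, P, z)
print(x[:2])  # fused position estimate
```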

180: Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models

Sharat Agarwal, IIIT-Delhi


Abstract: Objects in the real world rarely occur in isolation; they exhibit typical arrangements governed by their independent utility and their expected interaction with humans and other objects in the context. For example, a chair is expected near a table, and a computer is expected on top. Humans use this spatial context and relative placement as an important cue for visual recognition in case of ambiguities. Similar to humans, DNNs exploit contextual information from data to learn representations. Our research focuses on harnessing the contextual aspects of visual data to optimize data annotation and enhance the training of deep networks. Our contributions can be summarized as follows: (1) we introduce the notion of contextual diversity for active learning (CDAL) and show its applicability in three different visual tasks: semantic segmentation, object detection, and image classification; (2) we propose a data-repair algorithm to curate contextually fair data and reduce model bias, enabling the model to detect objects out of their obvious context; (3) we propose class-based annotation, in which contextually relevant classes that are complementary for model training under domain shift are selected. Understanding the importance of well-curated data, we also emphasize the necessity of involving humans in the loop to achieve accurate annotations and to develop novel interaction strategies that allow humans to serve as fact-checkers. In line with this, we are working on an image retrieval system for wildlife camera-trap images and a reliable warning system for poor-quality rural roads. For large-scale annotation, we are employing a strategic combination of human expertise and zero-shot models, while also integrating human input at various stages for continuous feedback.
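
As a toy illustration of diversity-driven sample selection in active learning: CDAL scores contextual diversity from class-probability distributions over image regions, whereas the hypothetical sketch below substitutes a simple greedy k-center pick over per-image feature vectors:

```python
# Greedy k-center selection: pick a budget of images whose features
# are mutually far apart. A stand-in for CDAL's contextual-diversity
# criterion, not the paper's actual measure.
import numpy as np

def select_diverse(features, budget):
    """Greedily pick `budget` rows of `features` that are far apart."""
    chosen = [0]                                  # seed with the first image
    d = np.linalg.norm(features - features[0], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(d))                   # farthest from chosen set
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(features - features[nxt], axis=1))
    return chosen

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))                # stand-in context features
print(select_diverse(feats, budget=5))            # indices to annotate
```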

200: Poze: Sports Technique Feedback under Data Constraints

Agamdeep Singh, IISER Bhopal
Sujit PB, IISER Bhopal
Mayank Vatsa, IIT Jodhpur
Vaibhav Kumar, IISER Bhopal


Abstract: Access to expert coaching is essential for developing technique in sports, yet economic barriers often place it out of reach for many enthusiasts. To bridge this gap, we introduce Poze, an innovative video processing framework that provides feedback on human motion, emulating the insights of a professional coach. Poze combines pose estimation with sequence comparison and is optimized to function effectively with minimal data. Poze surpasses state-of-the-art vision-language models in video question-answering frameworks, achieving 70% and 196% increases in accuracy over GPT4V and LLaVAv1.6 7b, respectively.
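
The abstract does not specify Poze's sequence-comparison method, so the sketch below illustrates one common choice, dynamic time warping (DTW), applied to hypothetical flattened pose-keypoint sequences:

```python
# DTW cost between two pose sequences of different lengths. The 17
# keypoints x (x, y) layout is an assumed pose format, not Poze's.
import numpy as np

def dtw(seq_a, seq_b):
    """DTW cost between two (T, D) arrays of per-frame pose keypoints."""
    na, nb = len(seq_a), len(seq_b)
    D = np.full((na + 1, nb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[na, nb]

rng = np.random.default_rng(1)
learner = rng.normal(size=(30, 34))   # 30 frames x 17 keypoints x (x, y)
expert = rng.normal(size=(40, 34))    # reference technique clip
print(dtw(learner, expert))           # lower cost = closer to the expert
```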

179: AI-Based Integrated Framework For Vehicle Perception Applications Using Onboard Sensor Data

Dhiraj Choudhary Dommalapati, NITK-Surathkal
Hemanth Kumar M, NITK-Surathkal
Sowmya Kamath S, NITK-Surathkal


Abstract: To Be Added

181: Advanced 3D Object Classification Leveraging Dynamic Features in PointNet++

Siva Sankari K, Sri Sairam Institute of Technology
Sathya Bama B, Thiagarajar College of Engineering


Abstract: To Be Added

182: Enhanced Panoramic Dental Radiography Using Specialized Kernels And Advanced YOLO Models For Improved Tissue Visualization And Pathology Detection

Vaishali V, Thiagarajar College of Engineering
Md Mansoor Roomi, Thiagarajar College of Engineering


Abstract: To Be Added

183: AI-Driven Automated Milk Quality Inspection System with Hyperspectral Imaging

Padmasri P, Thiagarajar College of Engineering
Sathya Bama B, Thiagarajar College of Engineering


Abstract: To Be Added

184: Food Colorant Detection in Turmeric Powder using Hyperspectral Imaging System

John Shiny J, Velammal College of Engineering and Technology
Sathya Bama B, Thiagarajar College of Engineering


Abstract: To Be Added

177: Advanced Computational Framework for Non-Invasive Detection and Characterization of Amniotic Fluid Dynamics Using Hybrid Deep Learning and Fluid-Structure Interaction Models

Tamilselvi Rajendran, Sethu Institute of Technology


Abstract: To Be Added

205: Brain Tissue Segmentation and Analysis of Structural Connectivity using Deep Learning Techniques

Puranam Revanth Kumar, Malla Reddy University, Hyderabad
Rajesh Kumar Jha, ICFAI University


Abstract: To Be Added

222: PulmoWave: Pulmonary Disease Detection Using mmWave Radar and Sound Signals: A Multimodal Approach

Sagnik Ghosh, IIT Kharagpur


Abstract: To Be Added

Young Faculty Symposium

185: Human height estimation using AI-assisted computer vision for intelligent video surveillance system

Iyshwarya Ratthi K, Thiagarajar College of Engineering
Yogameena Balasubramanian, NITTTR
Saravana Perumal S, NITTTR


Abstract: In urban areas, technological advancements have led to an increased focus on height as a critical human characteristic for surveillance purposes. Face recognition often encounters challenges due to occlusion and masks, necessitating the use of height, build, and torso. Accurately estimating human height in surveillance scenarios is complex due to camera calibration, posture variations, and movement patterns. This research introduces a novel human height estimation method for surveillance systems, along with a dedicated dataset. The process begins with camera calibration to rectify lens distortions. A deep learning-based YOLOv7-Occlusion Aware (YOLOv7-OA) target detection technique is employed to precisely locate individuals within the frame. The study assesses the impact of camera height and deflection angle on height estimation across different areas of the field of vision (FOV). The proposed method yields a mean absolute error of 0.02 cm to 0.8 cm across various FOV zones, surpassing the previous benchmark of 1.39 cm.
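
For intuition about why calibration and camera geometry matter here, the sketch below shows the textbook pinhole-camera relation behind image-based height measurement. The focal length, pixel height, and distance are made-up numbers, and the paper's actual pipeline (distortion correction, YOLOv7-OA detection, FOV-zone analysis) is far more involved:

```python
# Pinhole-camera height estimate via similar triangles. Values are
# illustrative assumptions, not from the paper's dataset.
def estimate_height_m(pixel_height, focal_length_px, distance_m):
    """Similar triangles: real height = pixel height * distance / focal."""
    return pixel_height * distance_m / focal_length_px

# Person spanning 620 px, camera focal length 1400 px, standing 4 m away
print(estimate_height_m(620, 1400, 4.0))  # ~1.77 m
```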

217: In the Era of Prompt Learning with Vision-Language Models

Ankit Jha, INRIA, Grenoble, France


Abstract: Large-scale foundation models like CLIP have shown strong zero-shot generalization but struggle with domain shifts, limiting their adaptability. In our work, we introduce StyLIP, a novel domain-agnostic prompt learning strategy for Domain Generalization (DG). StyLIP disentangles visual style and content in CLIP's vision encoder by using style projectors to learn domain-specific prompt tokens and combining them with content features. Trained contrastively, this approach enables seamless adaptation across domains, outperforming state-of-the-art methods on multiple DG benchmarks. Additionally, we propose AD-CLIP for unsupervised domain adaptation (DA), leveraging CLIP's frozen vision backbone to learn domain-invariant prompts through image style and content features. By aligning domains in embedding space with entropy minimization, AD-CLIP effectively handles domain shifts, even when only target domain samples are available. Lastly, we outline future work on class discovery using prompt learning for semantic segmentation in remote sensing, focusing on identifying novel or rare classes in unstructured environments. This paves the way for more adaptive and generalizable models in complex, real-world scenarios.
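
As background for readers unfamiliar with prompt learning, the sketch below shows the CoOp-style mechanism of learnable context tokens prepended to frozen class-name embeddings, the general recipe that methods like StyLIP and AD-CLIP build on. The dimensions and stand-in embeddings are assumptions; neither method's style projectors nor alignment losses are reproduced here:

```python
# Learnable prompt-token module: a small trainable context is shared
# across classes while the class-name embeddings stay frozen.
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    def __init__(self, n_ctx=4, dim=512, n_classes=10):
        super().__init__()
        # learnable context tokens shared across classes
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # frozen per-class embeddings (stand-in for tokenized class names)
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, dim))

    def forward(self):
        n_classes = self.cls_emb.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        # prepend context to each class-name embedding
        return torch.cat([ctx, self.cls_emb], dim=1)  # (C, n_ctx+1, dim)

prompts = PromptLearner()()
print(prompts.shape)  # torch.Size([10, 5, 512])
```

In practice, these prompts would be fed through the frozen text encoder and trained against image features, while the backbone weights stay fixed.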

221: Post-disaster Building Assessments with Limited Data

Vedhus Hoskere, University of Houston
Subin Varghese, University of Houston
Deepank Singh, University of Houston


Abstract: To Be Added

Industry Research Symposium

169: Comparison of YOLOv5 and YOLOv8 on Pothole Detection

Ashwin Raj Lendalay, HITAM
Rupesh Pabba, HITAM
Karthik R U, HITAM
Mohd Afnan Ahmed, HITAM
Sudheer Reddy Katta, HITAM
Nihal Yalla, HITAM
Siva Prasad Kowdodi, HITAM


Abstract: To Be Added

212: Streamlining Video Analysis for Efficient Violence Detection

Gourang Pathak, Vehant Technologies
Sannidhya Rawat, Vehant Technologies
Abhay Kumar, Vehant Technologies
Shikha Gupta, Vehant Technologies


Abstract: To Be Added

220: Post-disaster Building Assessments with Limited Data

Vedhus Hoskere, University of Houston
Subin Varghese, University of Houston
Deepank Singh, University of Houston


Abstract: To Be Added