5th Workshop on Image/Video/Audio Quality Assessment in Computer Vision, VLM and Diffusion Model

Workshop Date: Mar 7, 2026

Location: TBD

Held in conjunction with WACV2026


Important Dates/Links:

Description:

Image, video, and audio quality significantly impacts machine learning and computer vision systems, yet remains underexplored by the broader research community. Real-world applications—from streaming services and autonomous vehicles to cashier-less stores and generative AI—critically depend on robust quality assessment and improvement techniques. Despite their importance, most visual learning systems assume high-quality inputs, while in reality, artifacts from capture, compression, transmission, and rendering processes can severely degrade performance and user experience.

This workshop is particularly timely given the explosive growth of generative AI, which introduces new challenges in quality assessment for both inputs and outputs. By bringing together researchers from industry and academia, we aim to systematically investigate how quality issues affect various visual learning tasks and to develop innovative assessment and mitigation techniques. Building on the success of our previous workshops at WACV (2022-2025), we expect to stimulate new research directions and attract more talent to this critical field, ultimately improving the robustness and reliability of computer vision applications across industries.

Topics:

This workshop addresses topics related to image/video/audio quality assessment in machine learning, computer vision, VLMs, Diffusion Models, and other types of generative AI. Topics include, but are not limited to:

Schedule (MST)

Mar 7th, 2026, 8:20 AM – 6:00 PM

Time Event Duration
8:20-8:30am Opening Remarks (Host: Joe Liu) 10 mins
8:30-9:30am Keynote: Gérard G. Medioni (Host: Joe Liu) 60 mins
9:30-10:15am Coffee Break 45 mins
10:15-11:45am Oral Long Session I (Host: Yarong Feng) 90 mins
10:15-10:30am Fast 2DGS: Efficient Image Representation with Deep Gaussian Prior (in person) 15 mins
10:30-10:45am REMinD: Balancing Robust Concept Unlearning and Image Quality in Diffusion Models (in person) 15 mins
10:45-11:00am Reason Then Ground: Multilingual Text/Logo Grounding on Movie Posters (in person) 15 mins
11:00-11:15am VideoForge: Efficient Domain Adaptation for Video Generation Through Quality-Driven Rewards and Enhanced LoRA (in person) 15 mins
11:15-11:30am HandSurge: Localized Neural Surgery for Diffusion-Generated Hand Deformity Restoration (in person) 15 mins
11:30-11:45am HiFi-Deblur: High-Frequency Intense Image Deblurring with Frequency-Decoupled U-Net and Discrete Wavelet Transform (in person) 15 mins
11:45am-1:00pm Lunch Break 75 mins
1:00-2:00pm Oral Long Session II (Host: Qipin Chen) 60 mins
1:00-1:15pm Motion Blur Detection and Segmentation from Static Image Artworks (in person) 15 mins
1:15-1:30pm Transforming Video Subjective Testing with Training, Engagement, and Real-Time Feedback (in person) 15 mins
1:30-1:45pm Can You Find the Difference? Visually Identical Image Detection (in person) 15 mins
1:45-2:00pm Efficient Deep Demosaicing with Spatially Downsampled Isotropic Networks (in person) 15 mins
2:00-3:00pm Keynote: Sarah Ostadabbas, "Toward Data-Efficient Dynamically-Aware Visual Intelligence" (Host: Joe Liu) 60 mins
3:00-3:45pm Coffee Break 45 mins
3:45-4:45pm Oral Short Session I (Host: Qipin Chen) 60 mins
3:45-3:52pm Diffuse4D: Completing NeRF-Stereo Depth via Diffusion-Driven Restoration in Dynamic Scenes (in person) 7 mins
3:52-3:59pm Seeing in the Dark: Synthesizing Underexposure for More Robust Underwater Image Augmentation (in person) 7 mins
3:59-4:06pm ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers (in person) 7 mins
4:06-4:13pm Cost Savings from Automatic Quality Assessment of Generated Images (in person) 7 mins
4:13-4:20pm VIBEFACE - Video and Image Biometric Dataset for Evaluation of Faces (in person) 7 mins
4:20-4:27pm We Still See Broken Limbs: Towards Anatomical Realism in GenAI via Human Preference Learning (in person) 7 mins
4:27-4:34pm JetBench: Quality-Aware Benchmarking of Vision Models for Jet Parameter Classification in Heavy-Ion Physics (in person) 7 mins
4:34-4:41pm STEC: A Spatio-Temporal Entropy Coverage Metric for Evaluating Sampled Video Frames (in person) 7 mins
4:41-4:48pm SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models (in person) 7 mins
4:45-5:00pm Closing Remarks (Host: Joe Liu) 15 mins
5:00-6:00pm Poster Session + Online Oral Presentations 60 mins
5:00-5:07pm From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance (online) 7 mins
5:07-5:14pm Device-Robust Spectral Grading and Origin Detection from UV-Vis-NIR Images: Towards Practical Gemstone Quality Assessment (online) 7 mins
5:14-5:21pm Vision Language Models Learn to Assess Images with Specialists (online) 7 mins
5:21-5:28pm When Probe and Gallery are Low Quality: Decreasing Accuracy and Increasing Demographic Disparities in 1:N Identification (online) 7 mins
5:28-5:35pm CARLA-Haze: A Synthetic Benchmark for Outdoor Image Dehazing (online) 7 mins
5:35-5:42pm Quality-Driven and Diversity-Aware Sample Expansion for Robust Marine Obstacle Segmentation (online) 7 mins
5:42-5:49pm Enhancement as Augmentation: Improving Detection in Highly Degraded Underwater Images Through Mixed-Domain Training (online) 7 mins
5:49-5:56pm Image-Specific Adaptation of Transformer Encoders for Compute-Efficient Segmentation (online) 7 mins
5:56-6:03pm YOLO-OSA: A ShuffleAttention-Enhanced YOLO Model for FOD Detection with Comprehensive Benchmarking on MS COCO (online) 7 mins
Zoom Information for virtual presentations: TBD

Keynotes

Keynote Speaker: Sarah Ostadabbas


Title: "Toward Data-Efficient Dynamically-Aware Visual Intelligence"

Abstract: [TBD]

Bio: Professor Ostadabbas is an associate professor in the Electrical and Computer Engineering Department at Northeastern University (NU) in Boston, Massachusetts, USA. She joined NU in 2016 after completing her postdoctoral research at Georgia Tech and earning her PhD from the University of Texas at Dallas in 2014. At NU, Professor Ostadabbas serves as Director of the Augmented Cognition Laboratory (ACLab), Director of Women in Engineering (WIE), and Co-Director of the Center for Signal Processing, Imaging, Reasoning, and Learning (SPIRAL). Her research focuses on the convergence of computer vision and machine learning, with particular emphasis on representation learning in visual perception problems. In her applied research, she has contributed significantly to the understanding, detection, and prediction of human and animal behaviors through the modeling of visual motion under various biomechanical factors. Professor Ostadabbas also works in the Small Data Domain, including applications in medical and military fields, where data collection and labeling are costly and protected by strict privacy laws. Her solutions involve deep learning frameworks that operate effectively with limited labeled training data, incorporate domain knowledge for prior learning and synthetic data augmentation, and improve generalization across domains by acquiring invariant representations.

Professor Ostadabbas has co-authored over 130 peer-reviewed journal and conference articles and received research awards from institutions such as the National Science Foundation (NSF), Department of Defense (DoD), Sony, MathWorks, Amazon AWS, Verizon, Oracle, Biogen, and NVIDIA. She has been honored with the NSF CAREER Award (2022) and the Sony Faculty Innovation Award (2023), was runner-up for the Oracle Excellence Award (2023), and was recognized by LDV Capital as one of the 120+ Women Spearheading Advances in Visual Tech and AI (2024). She has served on the organizing committees of many workshops at renowned conferences (such as CVPR, ECCV, ICCV, ICIP, ICASSP, BioCAS, CHASE, ICHI) in roles including Lead/Co-Lead Organizer, Program Chair, Board Member, Publicity Co-Chair, Session Chair, Technical Committee member, and Mentor.


Keynote Speaker: Gérard G. Medioni


Title: TBD

Abstract: [TBD]

Bio: Gérard G. Medioni is a computer scientist, author, academic, and inventor. He is a Vice President and Distinguished Scientist at Amazon and an emeritus professor of Computer Science at the University of Southern California. Medioni has made contributions to computer vision, in particular 3D sensing, surface reconstruction, and object modeling, and has translated his computer vision research into customer-facing inventions and products. He has authored four books, including Emerging Topics in Computer Vision; Multimedia Systems: Algorithms, Standards, and Industry Practices; and A Computational Framework for Segmentation and Grouping, and has published more than 80 journal papers and 200 conference papers, with over 34,000 citations and an h-index of 88. In addition, he holds 123 patents, including "Visual tracking in video images in unconstrained environments by exploiting on-the-fly context using supporters and distracters" and "Depth mapping based on pattern matching and stereoscopic information," along with patents on Just Walk Out technology and Amazon One. Medioni is a Fellow of the Association for the Advancement of Artificial Intelligence, the Institute of Electrical and Electronics Engineers, the International Association for Pattern Recognition, and the National Academy of Inventors, and a member of the National Academy of Engineering.

Submission Guidelines and Review Process:

Organizers

Organizer 1

Yarong Feng

Amazon

Organizer 2

Qipin Chen

Amazon

Organizer 3

Joe Liu

Amazon

Information

TBD

Contact Us

If you have any questions or inquiries, please contact us at wacv2026-image-quality-workshop@amazon.com.