CVPR 2024 Workshop on Prompting in Vision

17 June 2024, 9 AM - 5:40 PM

Seattle Convention Center, Seattle WA, USA



Building general-purpose computer vision models is a multifaceted challenge that requires a system capable of understanding and interpreting a wide array of visual problems. Inspired by NLP, researchers have identified “prompting” as a promising method for adapting large vision models to various downstream tasks. Supplying a prompt at inference time streamlines this adaptation.

Prompts can take several forms in the context of computer vision. They can be as straightforward as providing visual examples of the input and the desired output, thereby giving the model a clear reference for what it needs to accomplish. Alternatively, prompts can be more abstract, such as a series of dots, boxes, or scribbles that guide the model's attention or highlight features within an image. Beyond these visual cues, prompts can also include learned tokens or indicators that are associated with particular outputs through the model's training process. Moreover, prompts can be constructed using language-based task descriptions. In this scenario, textual information is used to direct the model's processing of visual data, bridging the gap between visual perception and language understanding.

This workshop aims to provide a platform for pioneers in prompting for vision to share recent advancements, showcase novel techniques and applications, and discuss open research questions about how the strategic use of prompts can unlock new levels of adaptability and performance in computer vision.

Call for Papers

We consider papers that use prompting for computer vision on the following topics:

Important dates and deadlines:

Submission instructions:



| Time | Event | Speaker | Content |
| --- | --- | --- | --- |
| 09:00 am - 09:10 am | Opening | | |
| 09:10 am - 09:40 am | Invited talk | Alane Suhr | LLMs as Agents |
| 09:40 am - 10:10 am | Invited talk | Bharath Hariharan | TBD |
| 10:10 am - 10:40 am | Invited talk | Ivana Balazevic | Towards Effortless Adaptation of Image and Video Models |
| 10:40 am - 11:00 am | Coffee break ☕ | | |
| 11:00 am - 11:30 am | Invited talk | Alexei A. Efros | TBD |
| 11:30 am - 11:45 am | Oral presentation | Folco Bertini Baldassini | What Makes Multimodal In-Context Learning Work? |
| 11:45 am - 12:00 pm | Oral presentation | Hao Chen | Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets |
| 12:00 pm - 02:00 pm | Lunch break 🥪 | | |
| 02:00 pm - 02:30 pm | Invited talk | Yong Jae Lee | Visual Prompting in Large Multimodal Models |
| 02:30 pm - 03:00 pm | Invited talk | Phillip Isola | A Brief Prehistory of Visual Prompting |
| 03:00 pm - 04:00 pm | Poster session 🪧 | | Poster boards: #36-45 |
| 04:00 pm - 04:30 pm | Invited talk | Xiaolong Wang | TBD |
| 04:30 pm - 05:00 pm | Invited talk | Mike Z. Shou | Prompting in Video Understanding and Generation |
| 05:00 pm - 05:30 pm | Panel discussion | Trevor Darrell | |
| 05:30 pm - 05:40 pm | Closing | | |

Accepted Papers

| Title | Authors |
| --- | --- |
| What Makes Multimodal In-Context Learning Work? | Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski |
| Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets | Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Xiang Li, Wei Ye, Jindong Wang, Guosheng Hu, Marios Savvides |
| Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Övgü Özdemir, Erdem Akagündüz |
| AAPL: Adding Attributes to Prompt Learning | Gahyeon Kim, Sohee Kim, Seokju Lee |
| Prompting Foundational Models for Omni-supervised Instance Segmentation | Arnav Mohanty Das, Ritwick Chaudhry, Kaustav Kundu, Davide Modolo |
| On the low-shot transferability of [V]-Mamba | Diganta Misra, Jay Gala, Antonio Orvieto |
| Low-Rank Few-Shot Adaptation of Vision-Language Models | Maxime Zanella, Ismail Ben Ayed |
| PointPrompt: A Multi-modal Prompting Dataset for Segment Anything Model | Jorge Quesada, Mohammad Alotaibi, Mohit Prabhushankar, Ghassan AlRegib |
| Uncovering the Hidden Cost of Model Compression | Diganta Misra, Muawiz Sajjad Chaudhary, Bharat Runwal, Agam Goyal, Pin-Yu Chen |



Please contact Kaiyang Zhou and Amir Bar for general inquiries.