Prompting in Vision

Overview

Building general-purpose computer vision models is a multifaceted challenge that requires a system capable of understanding and interpreting a wide array of visual problems. Drawing inspiration from the field of NLP, the concept of “prompting” has been identified as a promising method for adapting large vision models to perform various downstream tasks. This adaptation process is streamlined by integrating a prompt during the inference stage.

Prompts can take several forms in the context of computer vision. They can be as straightforward as providing visual examples of the input and the desired output, thereby giving the model a clear reference for what it needs to accomplish. Alternatively, prompts can be more abstract, such as a series of dots, boxes, or scribbles that guide the model's attention or highlight features within an image. Beyond these visual cues, prompts can also include learned tokens or indicators that are associated with particular outputs through the model's training process. Moreover, prompts can be constructed using language-based task descriptions. In this scenario, textual information is used to direct the model's processing of visual data, bridging the gap between visual perception and language understanding.

This workshop aims to provide a platform for pioneers in prompting for vision to share recent advancements, showcase novel techniques and applications, and discuss open research questions about how the strategic use of prompts can unlock new levels of adaptability and performance in computer vision.

Call for Papers

We consider papers that use prompting for computer vision in the following topics:

Prompt engineering/learning for computer vision models
Interpretability and explainability
Robustness and generalization
Ethics and bias in prompting
Few-shot learning/continual learning/transfer learning/domain adaptation/domain generalization
Vision applications (classification, detection, segmentation, etc.)
Multimodal applications (text2image, text2video, chatbots, etc.)

Important dates and deadlines:

Paper submission: ~~Mar 15 '24 09:00 PM UTC~~
Paper decision: ~~Apr 05 '24 08:00 PM UTC~~
Camera ready: ~~Apr 14 '24 11:59 PM PT~~

Submission instructions:

All submissions will be handled electronically via OpenReview.
Submissions should be formatted using the official CVPR 2024 template.
Submissions must adhere to the CVPR style, format, and length restrictions.
Double-blind reviewing: The reviewing process will be double blind so submissions must be anonymized.
Dual submission policy: By registering or submitting a manuscript, the authors acknowledge that it has not been previously published or accepted for publication in substantially similar form in any peer-reviewed venue including journal, conference or workshop, or archival forum.
Accepted papers will be published in conjunction with CVPR 2024 proceedings.
Top-rated papers will be showcased as oral presentations in the workshop.

Schedule

Time

Event

Speaker

Content

09:00 am - 09:10 am

Opening

09:10 am - 09:40 am

Invited talk

Alane Suhr

LLMs as Agents

09:40 am - 10:10 am

Invited talk

Bharath Hariharan

Easter eggs in pre-trained vision models

10:10 am - 10:40 am

Invited talk

Ivana Balazevic

Towards Effortless Adaptation of Image and Video Models

10:40 am - 11:00 am

Coffee break ☕

11:00 am - 11:30 am

Invited talk

Alexei A. Efros

Visual Prompting Then and Now

11:30 am - 11:45 am

Oral presentation

Folco Bertini Baldassini

What Makes Multimodal In-Context Learning Work?

11:45 am - 12:00 pm

Oral presentation

Hao Chen

Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets

12:00 pm - 02:00 pm

Lunch break 🥪

02:00 pm - 02:30 pm

Invited talk

Yong Jae Lee

Visual Prompting in Large Multimodal Models

02:30 pm - 03:00 pm

Invited talk

Phillip Isola

Old Lessons on Legible Prompts

03:00 pm - 04:00 pm

Poster session 🪧

Poster boards: #36-45

04:00 pm - 04:30 pm

Invited talk

Xiaolong Wang

Learning to (Learn at Test Time): Expressive State Representations for LLMs

04:30 pm - 05:00 pm

Invited talk

Mike Z. Shou

Prompting in Video Understanding and Generation

05:00 pm - 05:30 pm

Panel discussion

Trevor Darrell

05:30 pm - 05:40 pm

Closing

Accepted Papers

Title

Author

What Makes Multimodal In-Context Learning Work?

Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski

Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets

Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Xiang Li, Wei Ye, Jindong Wang, Guosheng Hu, Marios Savvides

Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts

Övgü Özdemir, Erdem Akagündüz

AAPL: Adding Attributes to Prompt Learning

Gahyeon Kim, Sohee Kim, Seokju Lee

Prompting Foundational Models for Omni-supervised Instance Segmentation

Arnav Mohanty Das, Ritwick Chaudhry, Kaustav Kundu, Davide Modolo

On the low-shot transferability of [V]-Mamba

Diganta Misra, Jay Gala, Antonio Orvieto

Low-Rank Few-Shot Adaptation of Vision-Language Models

Maxime Zanella, Ismail Ben Ayed

PointPrompt: A Multi-modal Prompting Dataset for Segment Anything Model

Jorge Quesada, Mohammad Alotaibi, Mohit Prabhushankar, Ghassan AlRegib

Uncovering the Hidden Cost of Model Compression

Diganta Misra, Muawiz Sajjad Chaudhary, Bharat Runwal, Agam Goyal, Pin-Yu Chen

CVPR 2024 Workshop on Prompting in Vision

17 June 2024, 9 AM - 5:40 PM

Summit 335-336, Seattle Convention Center, Seattle WA, USA

Overview

Call for Papers

Speakers

Panelists

Schedule

Accepted Papers

Organizers

Contact