Layer-Specific Prompt Fusion Discovery via Differentiable Search in Vision Foundation Models

A differentiable search framework discovers which prompt-token fusion operator each transformer layer should use, replacing one-size-fits-all visual prompt injection.

Xi Xiao¹, Xingjian Li², Yunbei Zhang³, Cheng Han⁴, Tianming Liu⁵, Tianyang Wang¹, Runmin Jiang², Jihun Hamm³, Xiao Wang⁶, Min Xu²

¹University of Alabama at Birmingham · ²Carnegie Mellon University · ³Tulane University · ⁴University of Missouri-Kansas City · ⁵Georgia State University · ⁶Oak Ridge National Laboratory

Overview of differentiable layer-specific prompt fusion discovery.
Layer-specific prompt fusion treats prompt injection as a searchable design problem, letting each transformer depth select the operator that best matches its representation role.

Abstract

The best prompt fusion rule changes with depth.

Visual prompt tuning often uses a fixed fusion strategy across all transformer layers, even though shallow, middle, and deep layers encode different types of visual information.

This work formulates prompt fusion as a differentiable architecture search problem over operators such as concatenation, addition, affine fusion, and cross-attention. A bilevel optimization procedure discovers a hybrid layer-wise fusion policy that improves transfer while keeping parameter cost low.

Three claims

What this paper changes

01

Fusion is layer dependent.

Early, middle, and late transformer blocks benefit from different prompt-token interactions.

02

Search beats manual fusion.

Differentiable selection finds hybrid fusion paths that outperform the best single fixed operator.

03

Prompt efficiency is preserved.

The searched design reaches a 91.6 FGVC average while tuning roughly 0.75% of model parameters.

Method

Differentiable search over prompt fusion operators

Candidate operators

Each layer can choose among concatenation, addition, affine fusion, and cross-attention-style prompt interactions.
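
As a rough illustration of how the four candidates differ, the sketch below implements each one on plain Python lists. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, tokens and prompts are lists of equal-width vectors, the affine scale and shift are passed in directly rather than derived from prompts, and the cross-attention variant omits learned query, key, and value projections.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fuse_concat(tokens, prompts):
    # Prepend prompt tokens to the sequence; length grows by len(prompts).
    return [list(p) for p in prompts] + [list(t) for t in tokens]

def fuse_add(tokens, prompts):
    # Add the mean prompt vector to every token; sequence length is unchanged.
    bias = mean_vector(prompts)
    return [[t_i + b_i for t_i, b_i in zip(t, bias)] for t in tokens]

def fuse_affine(tokens, scale, shift):
    # Channel-wise scale and shift (hypothetical prompt-derived parameters).
    return [[g * t_i + b for t_i, g, b in zip(t, scale, shift)] for t in tokens]

def fuse_cross_attention(tokens, prompts):
    # Each token queries the prompts; the attended prompt content is added
    # back residually. Learned projections are omitted (assumption).
    d = len(tokens[0])
    fused = []
    for t in tokens:
        scores = [sum(a * b for a, b in zip(t, p)) / math.sqrt(d) for p in prompts]
        weights = softmax(scores)
        context = [sum(w * p[i] for w, p in zip(weights, prompts)) for i in range(d)]
        fused.append([t_i + c_i for t_i, c_i in zip(t, context)])
    return fused

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompts = [[0.5, 0.5], [1.5, -0.5]]

concat_out = fuse_concat(tokens, prompts)
add_out = fuse_add(tokens, prompts)
affine_out = fuse_affine(tokens, scale=[2.0, 2.0], shift=[0.0, 1.0])
attn_out = fuse_cross_attention(tokens, prompts)
```

Only concatenation changes the sequence length; the other three keep the token count fixed, which matters when candidate outputs have to be mixed during search.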

Bilevel search

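Prompt parameters are trained on one data split while architecture weights are updated against validation loss on another, so the discovered fusion policy is selected for generalization rather than training fit.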
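
The alternating structure of such a search can be sketched on a scalar toy problem. This is an assumption-heavy illustration, not the paper's procedure: it uses a first-order alternating scheme (one prompt step on training data, then one architecture step on validation data), two hypothetical stand-in operators (`op_add`, `op_scale`), hand-derived gradients, and a target `t = 3x` that only the scaling operator can represent on its own.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy candidate operators on a scalar feature x with a scalar prompt w.
def op_add(x, w):
    return x + w

def op_scale(x, w):
    return x * w

OPS = [op_add, op_scale]

def mixed(x, w, alpha):
    # Softmax-weighted mixture of all candidate outputs.
    a = softmax(alpha)
    return sum(a_k * op(x, w_k) for a_k, op, w_k in zip(a, OPS, w))

def mse(xs, ts, w, alpha):
    return sum((mixed(x, w, alpha) - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

def grad_w(xs, ts, w, alpha):
    a = softmax(alpha)
    g = [0.0, 0.0]
    for x, t in zip(xs, ts):
        err = mixed(x, w, alpha) - t
        g[0] += 2 * err * a[0]       # d(x + w0)/dw0 = 1
        g[1] += 2 * err * a[1] * x   # d(x * w1)/dw1 = x
    return [v / len(xs) for v in g]

def grad_alpha(xs, ts, w, alpha):
    a = softmax(alpha)
    g = [0.0] * len(alpha)
    for x, t in zip(xs, ts):
        outs = [op(x, w_k) for op, w_k in zip(OPS, w)]
        pred = sum(a_k * o for a_k, o in zip(a, outs))
        err = pred - t
        for k in range(len(alpha)):
            g[k] += 2 * err * a[k] * (outs[k] - pred)  # softmax Jacobian
    return [v / len(xs) for v in g]

def search(train, val, steps=200, lr_w=0.05, lr_alpha=0.1):
    w, alpha = [0.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        # Lower level: update prompt parameters on the training split.
        gw = grad_w(*train, w, alpha)
        w = [wi - lr_w * gi for wi, gi in zip(w, gw)]
        # Upper level: update architecture logits on the validation split.
        ga = grad_alpha(*val, w, alpha)
        alpha = [ai - lr_alpha * gi for ai, gi in zip(alpha, ga)]
    return w, alpha

train_x = [-2.0, -1.0, 1.0, 2.0]
val_x = [-1.5, -0.5, 0.5, 1.5]
train = (train_x, [3.0 * x for x in train_x])
val = (val_x, [3.0 * x for x in val_x])
w, alpha = search(train, val)
```

In this toy setting the logit for `op_scale` ends up above the logit for `op_add`, which is the kind of signal a later discretization step would read off.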

Discrete deployment

The final layer-wise policy is discretized, then trained as an efficient prompt module on top of the frozen backbone.
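
Discretization can be sketched as a per-layer argmax over the learned architecture logits. The logits below are invented for illustration and do not reflect the policy reported in the paper; `OPERATORS` and `discretize` are hypothetical names.

```python
OPERATORS = ["concat", "add", "affine", "cross_attention"]

def discretize(alpha):
    """Pick the highest-scoring operator for each layer (argmax over logits)."""
    policy = []
    for layer_logits in alpha:
        best = max(range(len(layer_logits)), key=lambda k: layer_logits[k])
        policy.append(OPERATORS[best])
    return policy

# Hypothetical logits for a 4-layer model; real values would come from the search stage.
alpha = [
    [0.2, 1.3, 0.1, -0.4],
    [0.0, 0.4, 0.9, 0.3],
    [-0.5, 0.1, 0.2, 1.1],
    [1.0, 0.2, 0.3, 0.9],
]
policy = discretize(alpha)
```

After this step only one operator per layer remains, so the deployed prompt module carries none of the unused candidates from the search stage.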

The search space assigns each transformer layer a prompt fusion operator, producing a hybrid policy instead of a fixed VPT-style injection rule.
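
A quick count shows why the hybrid space is searched with gradients rather than enumerated. The sketch assumes a 12-layer backbone (the depth of a ViT-B-sized model, which may differ from the paper's setting) and an independent choice among the four operators at every layer.

```python
NUM_OPERATORS = 4   # concatenation, addition, affine fusion, cross-attention
NUM_LAYERS = 12     # assumption: a ViT-B-sized backbone

# A fixed rule applies the same operator everywhere.
fixed_policies = NUM_OPERATORS

# A hybrid policy picks one operator per layer independently.
hybrid_policies = NUM_OPERATORS ** NUM_LAYERS

# A continuous relaxation needs only one logit per (layer, operator) pair.
relaxed_parameters = NUM_OPERATORS * NUM_LAYERS

print(fixed_policies, hybrid_policies, relaxed_parameters)
```

Under these assumptions there are 16,777,216 discrete hybrid policies, while the relaxed search optimizes just 48 architecture logits.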

Fusion Search

argmin over layer-wise fusion choices: validation loss after prompt training
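
One common way to write this objective, in the style of DARTS-like differentiable architecture search, is the bilevel program below. The notation is an assumption introduced here for illustration: $\alpha^{(\ell)}_{k}$ is the architecture logit for operator $k$ at layer $\ell$, $P$ denotes the prompt parameters, $o_{k}$ is the $k$-th candidate fusion operator, and $K$ is the number of candidates.

```latex
% Upper level: choose architecture logits by validation loss.
% Lower level: train prompt parameters for the given logits.
\min_{\alpha}\; \mathcal{L}_{\mathrm{val}}\bigl(P^{*}(\alpha),\, \alpha\bigr)
\quad \text{s.t.} \quad
P^{*}(\alpha) = \arg\min_{P}\; \mathcal{L}_{\mathrm{train}}(P,\, \alpha)

% Continuous relaxation: each layer mixes all candidates during search.
\bar{o}^{(\ell)}(x, p) = \sum_{k=1}^{K}
\frac{\exp \alpha^{(\ell)}_{k}}{\sum_{j=1}^{K} \exp \alpha^{(\ell)}_{j}}\; o_{k}(x, p)

% Discretization: keep the strongest operator per layer for deployment.
k^{(\ell)} = \arg\max_{k}\; \alpha^{(\ell)}_{k}
```

The first line restates the objective above, the second makes the per-layer choice differentiable, and the third recovers a discrete layer-wise policy once search has finished.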

Results

Searched fusion improves transfer across 34 datasets

77.01: mean VTAB-1k score from the searched prompt fusion policy
91.6: FGVC average while tuning about 0.75% of parameters
+7.58: VTAB gain over VPT-Deep in the reported setting
1.18x: search cost relative to VPT-Deep training with four candidates

Performance comparison for layer-specific prompt fusion discovery.
The searched hybrid policy consistently improves over fixed prompt fusion operators across transfer groups.

Efficiency analysis for layer-specific prompt fusion.
The final discrete architecture keeps inference overhead modest while benefiting from a richer search stage.

Why it matters

Prompt design should be discovered at the layer level.

Frozen vision transformers are not homogeneous stacks; their layers specialize. A single prompt fusion rule therefore constrains adaptation in a way that is easy to overlook.

Layer-specific discovery gives parameter-efficient tuning a principled route to match fusion behavior with representation depth.

Citation

Cite Prompt Fusion Discovery

If our work is helpful to your research, please consider citing this paper. Thank you.

@misc{xiao2026promptfusiondiscovery,
  title        = {Layer-Specific Prompt Fusion Discovery via Differentiable Search in Vision Foundation Models},
  author       = {Xi Xiao and Xingjian Li and Yunbei Zhang and Cheng Han and Tianming Liu and Tianyang Wang and Runmin Jiang and Jihun Hamm and Xiao Wang and Min Xu},
  year         = {2026},
  note         = {Project page: https://xixiaouab.github.io/Prompt-Fusion-Discovery/}
}