ACL 2026 Main Conference

Not All Directions Matter: Toward Structured and Task-Aware Low-Rank Adaptation

Xi Xiao1, Chenrui Ma2, Yunbei Zhang3, Chen Liu4, Zhuxuanzi Wang1, Yanshu Li5, Lin Zhao6,
Guosheng Hu7, Tianyang Wang1, Hao Xu8
1University of Alabama at Birmingham   2University of Virginia   3Tulane University   4Yale University
5Brown University   6Northeastern University   7University of Bristol   8Harvard University
* Equal contribution    Corresponding authors    Contact: xxiao@uab.edu

Abstract

Low-Rank Adaptation (LoRA) has become a cornerstone of parameter-efficient fine-tuning (PEFT). Yet its efficacy is hampered by two fundamental limitations: semantic drift, caused by treating all update directions as equally important, and structural incoherence, caused by adapting each layer independently; together, these yield suboptimal, uncoordinated updates. To remedy this, we propose StructLoRA, a framework that addresses both limitations through a principled, dual-component design: (1) an Information Bottleneck-guided filter that prunes task-irrelevant directions to mitigate semantic drift, and (2) a lightweight, training-only graph-based coordinator that enforces inter-layer consistency to resolve structural incoherence. Extensive experiments across LLM, VLM, and vision models (including LLaMA, LLaVA, and ViT) demonstrate that StructLoRA consistently establishes a new state of the art, outperforming not only vanilla LoRA but also advanced dynamic rank allocation and sparsity-based methods. Notably, the benefits are particularly pronounced in challenging low-rank and low-data regimes. Crucially, since our proposed modules operate only during training, StructLoRA enhances performance with zero additional inference cost.

Key Results at a Glance

- +3.0% avg. gain over LoRA on BoolQ (LLaMA-7B)
- +6.2 CIDEr gain on COCO Caption (LLaVA-1.5-7B)
- 86.5 new SOTA average on GLUE (RoBERTa-base)
- 0% extra inference cost (training-only modules)

Method Overview

We identify two fundamental, unaddressed shortcomings in the LoRA paradigm. Semantic drift stems from allocating a limited parameter budget uniformly across all low-rank update directions, assuming each direction is equally important. Structural incoherence arises from adapting each layer independently, disregarding the compositional structure of deep Transformers.

Figure 1. Architectural comparison between LoRA and StructLoRA. Standard LoRA (left) applies uniform low-rank updates. StructLoRA (right) introduces an Information Bottleneck (IB) Filter (Stage 1) and a Graph-based Coordination mechanism (Stage 2). Both operate only during training and are removed at inference, preserving LoRA's zero-latency efficiency.

Stage 1: Information Bottleneck-Guided Directional Filtering

LoRA spreads a small budget over $r$ directions and treats them the same. In practice, only a few directions help predict the label; others carry nuisance variation. We gate the $r$ rank-one directions with a learnable mask $\mathbf{m} \in [0,1]^r$ and form the filtered update:

$\Delta \tilde{W} = A\,\text{diag}(\mathbf{m})\,B$

We learn $\mathbf{m}$ by an Information Bottleneck objective that rewards dependence on labels while penalizing spurious dependence on inputs, effectively raising the signal-to-noise ratio of the update.
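As a concrete illustration, the gated update $\Delta \tilde{W} = A\,\text{diag}(\mathbf{m})\,B$ can be sketched in NumPy. The dimensions, random seed, and mask values below are illustrative assumptions, not the paper's code; in training, $\mathbf{m}$ would be optimized under the IB objective rather than fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 16, 8
A = 0.01 * rng.normal(size=(d, r))  # LoRA down-projection factor (d x r)
B = 0.01 * rng.normal(size=(r, k))  # LoRA up-projection factor (r x k)

# Learnable gate m in [0,1]^r, shown here with fixed illustrative values.
m = np.array([1.0, 1.0, 0.9, 0.5, 0.1, 0.0, 0.0, 0.0])

# Filtered update: directions with m_i = 0 contribute nothing,
# so the effective rank drops to the number of active gates.
delta_W = A @ np.diag(m) @ B
```

Because $\text{diag}(\mathbf{m})$ only rescales the $r$ rank-one terms, fully closed gates lower the effective rank of $\Delta \tilde{W}$ without changing its shape, which is what allows the filter to concentrate the budget on task-relevant directions.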

Stage 2: Graph-Based Layer Coordination

We view the network as a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with one node per layer, where the node feature is the flattened filtered update. We connect adjacent layers and add semantic edges between layers with highly aligned gradients. A shallow GNN with residual connections propagates and refines the filtered updates, encouraging smoother adaptation trajectories across the model's depth. This can be formalized as Laplacian smoothing that provably reduces inter-layer drift energy.
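The Laplacian-smoothing view can be made concrete with a toy NumPy sketch over a chain of layers. The features, chain-only edges, and single explicit gradient step below are illustrative assumptions, not the paper's GNN:

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, dim = 6, 4

# Toy node features: each layer's flattened filtered update,
# modeled as noisy perturbations of a shared direction.
base = rng.normal(size=dim)
H = np.stack([base + 0.5 * rng.normal(size=dim) for _ in range(n_layers)])

# Chain graph: edges between adjacent layers only.
Adj = np.zeros((n_layers, n_layers))
for i in range(n_layers - 1):
    Adj[i, i + 1] = Adj[i + 1, i] = 1.0
Lap = np.diag(Adj.sum(axis=1)) - Adj  # graph Laplacian L = D - A

def drift_energy(H, Lap):
    # Sum over edges of ||h_i - h_j||^2, equal to trace(H^T L H).
    return float(np.trace(H.T @ Lap @ H))

# One smoothing step H <- (I - alpha * L) H; for alpha < 2 / lambda_max(L)
# each step cannot increase the drift energy.
alpha = 0.2
H_smooth = H - alpha * (Lap @ H)
```

Here smoothing provably shrinks the inter-layer drift energy because the map $(I - \alpha L)$ contracts every nonzero Laplacian eigenmode when $0 < \alpha < 2/\lambda_{\max}(L)$.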

Crucially, both the IB filter and the GNN coordinator are training-only — they are discarded at inference, so latency stays identical to vanilla LoRA.
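The zero-overhead claim follows from the standard LoRA merge. A small NumPy sketch (hypothetical shapes and values) shows that once the filtered update is folded into the frozen weight, inference reduces to a single matmul with no extra modules:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 4
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = 0.01 * rng.normal(size=(d, r))   # LoRA factors
B = 0.01 * rng.normal(size=(r, d))
m = np.array([1.0, 0.7, 0.0, 0.0])   # learned gate (illustrative values)

x = rng.normal(size=d)

# Training-time forward: base path plus the gated low-rank branch.
y_train = W @ x + A @ (np.diag(m) @ (B @ x))

# Inference: merge the filtered update once, then discard the branch.
W_merged = W + A @ np.diag(m) @ B
y_infer = W_merged @ x  # same output, vanilla-LoRA latency
```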

Main Results

Table 1. Main comparison across language, vision, and multimodal benchmarks (~0.5–1% trainable parameters).

| Method | Type | BoolQ | PIQA | CIFAR-100 | ImageNet-1k | COCO Cap. | VQAv2 |
|---|---|---|---|---|---|---|---|
| Full Fine-tuning | – | 82.6 | 85.3 | 85.9 | 78.8 | 123.5 | 76.2 |
| LoRA | Reparam. | 79.1 | 82.4 | 81.5 | 76.2 | 116.2 | 73.5 |
| QLoRA | Reparam. | 80.0 | 83.1 | 82.7 | 76.9 | 119.1 | 74.2 |
| DoRA | Reparam. | 80.6 | 83.7 | 83.2 | 77.3 | 120.3 | 75.0 |
| Sensitivity-LoRA | Dynamic Rank | 80.9 | 84.0 | 83.5 | 77.5 | 120.8 | 75.2 |
| LoRA-Dropout | Sparsity | 80.2 | 83.3 | 82.5 | 76.8 | 118.8 | 74.5 |
| StructLoRA (Ours) | Filter + Coord. | 82.1 | 84.9 | 85.1 | 78.6 | 122.9 | 75.9 |

Table 2. Head-to-head comparison on the GLUE benchmark (RoBERTa-base).

| Method | MNLI | SST-2 | MRPC | CoLA | QNLI | QQP | RTE | STS-B | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| LoRA | 87.3 | 93.5 | 87.1 | 58.8 | 93.0 | 90.5 | 79.4 | 91.0 | 85.1 |
| AdaLoRA | 87.3 | 93.6 | 87.3 | 59.0 | 93.1 | 90.6 | 79.6 | 91.2 | 85.2 |
| Sensitivity-LoRA | 87.6 | 94.6 | 87.7 | 60.2 | 93.6 | 90.7 | 81.8 | 91.3 | 86.0 |
| StructLoRA (Ours) | 88.1 | 95.0 | 88.5 | 61.5 | 94.1 | 91.0 | 82.3 | 91.5 | 86.5 |

Performance in Challenging Regimes

StructLoRA delivers its largest gains in the most challenging settings: low-rank and low-data regimes.

Table 3. Performance under varying rank budgets.

| Rank ($r$) | Params (%) | BoolQ (LLaMA-7B) LoRA / StructLoRA | CIFAR-100 (ViT-B/16) LoRA / StructLoRA | COCO Caption (LLaVA) LoRA / StructLoRA |
|---|---|---|---|---|
| 2 | 0.12 | 75.1 / 77.4 (+2.3) | 78.3 / 80.1 (+1.8) | 111.2 / 114.3 (+3.1) |
| 4 | 0.24 | 77.6 / 79.9 (+2.3) | 79.7 / 82.2 (+2.5) | 113.8 / 117.0 (+3.2) |
| 8 | 0.48 | 79.1 / 81.3 (+2.2) | 81.5 / 84.1 (+2.6) | 116.2 / 122.4 (+6.2) |
| 16 | 0.95 | 80.3 / 81.7 (+1.4) | 82.8 / 84.3 (+1.5) | 118.1 / 123.6 (+5.5) |
| 32 | 1.90 | 81.0 / 81.9 (+0.9) | 83.4 / 84.5 (+1.1) | 119.0 / 123.9 (+4.9) |

Table 4. Performance under limited supervision (few-shot learning), rank $r=8$.

| Dataset | Method | 10% | 25% | 50% | 100% |
|---|---|---|---|---|---|
| BoolQ (LLaMA-7B) | LoRA | 68.5 | 73.2 | 76.4 | 79.1 |
| | StructLoRA | 71.2 (+2.7) | 76.3 (+3.1) | 78.9 (+2.5) | 81.3 (+2.2) |
| CIFAR-100 (ViT-B/16) | LoRA | 73.6 | 78.0 | 80.5 | 81.5 |
| | StructLoRA | 76.3 (+2.7) | 80.5 (+2.5) | 82.4 (+1.9) | 84.1 (+2.6) |
| COCO Caption (LLaVA) | LoRA | 100.2 | 108.3 | 114.0 | 116.2 |
| | StructLoRA | 103.7 (+3.5) | 112.4 (+4.1) | 117.9 (+3.9) | 122.4 (+6.2) |

Ablation Studies

Both the IB filter and the GNN coordinator are essential. Removing either degrades performance, and removing both collapses StructLoRA back to standard LoRA.

Table 5. Module-wise ablation of StructLoRA.

| Setting | BoolQ | $\Delta$ | CIFAR-100 | $\Delta$ | COCO Cap. | $\Delta$ |
|---|---|---|---|---|---|---|
| StructLoRA (Full) | 81.3 | – | 84.1 | – | 122.4 | – |
| w/o IB Filter | 79.4 | -1.9 | 81.9 | -2.2 | 117.8 | -4.6 |
| w/o GNN Coordination | 80.1 | -1.2 | 82.6 | -1.5 | 119.4 | -3.0 |
| w/o Both (= LoRA) | 79.1 | -2.2 | 81.5 | -2.6 | 116.2 | -6.2 |
Figure 2. Analysis of filtering strategies. Our IB-guided filter consistently outperforms heuristic alternatives (Random Masking and Top-$k$ Norm) across NLP, Vision, and Multimodal tasks, confirming that the magnitude of an update direction is a poor proxy for its semantic relevance.

Visualizing Structural Coherence

Figure 3. Visual attention comparison (Grad-CAM). LoRA (top) produces diffuse attention across background regions, while StructLoRA (bottom) focuses on semantically relevant areas such as the deer's head and the boat's body.
Figure 4. Layer-wise cosine similarity of updates. StructLoRA (left) induces a coherent block-diagonal structure, while LoRA (right) exhibits noisy, fragmented inter-layer patterns, confirming the structural incoherence problem that our framework resolves.
Figure 5. Accuracy vs. Sequence Length (LLaMA-13B on BoolQ). StructLoRA's advantage widens under longer contexts.
Figure 6. SVD Coverage vs. Rank. StructLoRA achieves broader singular-value coverage, especially at low ranks.

BibTeX

If you find our work useful, please consider citing:

@article{xiao2026not,
  title={Not All Directions Matter: Toward Structured and Task-Aware Low-Rank Adaptation},
  author={Xiao, Xi and Ma, Chenrui and Zhang, Yunbei and Liu, Chen and Wang, Zhuxuanzi and Li, Yanshu and Zhao, Lin and Hu, Guosheng and Wang, Tianyang and Xu, Hao},
  journal={arXiv preprint arXiv:2603.14228},
  year={2026}
}