ACL 2026 Main Conference

Not All Directions Matter: Toward Structured and Task-Aware Low-Rank Adaptation

Xi Xiao1, Chenrui Ma2, Yunbei Zhang3, Chen Liu4, Zhuxuanzi Wang1, Yanshu Li5, Lin Zhao6,
Guosheng Hu7, Tianyang Wang1, Hao Xu8
1University of Alabama at Birmingham   2University of Virginia   3Tulane University   4Yale University
5Brown University   6Northeastern University   7University of Bristol   8Harvard University
* Equal contribution    Corresponding authors    Contact: xxiao@uab.edu

Abstract

Low-Rank Adaptation (LoRA) has become a cornerstone of parameter-efficient fine-tuning (PEFT). Yet its efficacy is hampered by two fundamental limitations: semantic drift, caused by treating all update directions as equally important, and structural incoherence, caused by adapting each layer independently; together, these yield suboptimal, uncoordinated updates. To remedy this, we propose StructLoRA, a framework that addresses both limitations through a principled, dual-component design: (1) an Information Bottleneck-guided filter that prunes task-irrelevant directions to mitigate semantic drift, and (2) a lightweight, training-only graph-based coordinator that enforces inter-layer consistency to resolve structural incoherence. Extensive experiments across LLM, VLM, and vision models (including LLaMA, LLaVA, and ViT) demonstrate that StructLoRA consistently establishes a new state of the art, outperforming not only vanilla LoRA but also advanced dynamic rank allocation and sparsity-based methods. Notably, the benefits are particularly pronounced in challenging low-rank and low-data regimes. Crucially, since our proposed modules operate only during training, StructLoRA enhances performance with zero additional inference cost.

Key Results at a Glance

- +3.0% avg. gain over LoRA on BoolQ (LLaMA-7B)
- +6.2 CIDEr gain on COCO Caption (LLaVA-1.5-7B)
- 86.5 new SOTA average on GLUE (RoBERTa-base)
- 0% extra inference cost (training-only modules)

Method Overview

We identify two fundamental, unaddressed shortcomings in the LoRA paradigm. Semantic drift stems from allocating a limited parameter budget uniformly across all low-rank update directions, assuming each direction is equally important. Structural incoherence arises from adapting each layer independently, disregarding the compositional structure of deep Transformers.

Figure 1. Architectural comparison between LoRA and StructLoRA. Standard LoRA (left) applies uniform low-rank updates. StructLoRA (right) introduces an Information Bottleneck (IB) Filter (Stage 1) and a Graph-based Coordination mechanism (Stage 2). Both operate only during training and are removed at inference, preserving LoRA's zero-latency efficiency.

Stage 1: Information Bottleneck-Guided Directional Filtering

LoRA spreads a small budget over $r$ directions and treats them the same. In practice, only a few directions help predict the label; others carry nuisance variation. We gate the $r$ rank-one directions with a learnable mask $\mathbf{m} \in [0,1]^r$ and form the filtered update:

$\Delta \tilde{W} = A\,\text{diag}(\mathbf{m})\,B$

We learn $\mathbf{m}$ by an Information Bottleneck objective that rewards dependence on labels while penalizing spurious dependence on inputs, effectively raising the signal-to-noise ratio of the update.
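As a concrete illustration, the gated update $\Delta \tilde{W} = A\,\text{diag}(\mathbf{m})\,B$ can be sketched in NumPy. The dimensions, random seed, and mask values below are illustrative assumptions, not the paper's code; in training, $\mathbf{m}$ would be optimized under the IB objective rather than fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 16, 8
A = 0.01 * rng.normal(size=(d, r))  # LoRA down-projection factor (d x r)
B = 0.01 * rng.normal(size=(r, k))  # LoRA up-projection factor (r x k)

# Learnable gate m in [0,1]^r, shown here with fixed illustrative values.
m = np.array([1.0, 1.0, 0.9, 0.5, 0.1, 0.0, 0.0, 0.0])

# Filtered update: directions with m_i = 0 contribute nothing,
# so the effective rank drops to the number of active gates.
delta_W = A @ np.diag(m) @ B
```

Because $\text{diag}(\mathbf{m})$ only rescales the $r$ rank-one terms, fully closed gates lower the effective rank of $\Delta \tilde{W}$ without changing its shape, which is what allows the filter to concentrate the budget on task-relevant directions.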

Stage 2: Graph-Based Layer Coordination

We view the network as a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ with one node per layer, where the node feature is the flattened filtered update. We connect adjacent layers and add semantic edges between layers with highly aligned gradients. A shallow GNN with residual connections propagates and refines the filtered updates, encouraging smoother adaptation trajectories across the model's depth. This can be formalized as Laplacian smoothing that provably reduces inter-layer drift energy.
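The Laplacian-smoothing view can be made concrete with a toy NumPy sketch over a chain of layers. The features, chain-only edges, and single explicit gradient step below are illustrative assumptions, not the paper's GNN:

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, dim = 6, 4

# Toy node features: each layer's flattened filtered update,
# modeled as noisy perturbations of a shared direction.
base = rng.normal(size=dim)
H = np.stack([base + 0.5 * rng.normal(size=dim) for _ in range(n_layers)])

# Chain graph: edges between adjacent layers only.
Adj = np.zeros((n_layers, n_layers))
for i in range(n_layers - 1):
    Adj[i, i + 1] = Adj[i + 1, i] = 1.0
Lap = np.diag(Adj.sum(axis=1)) - Adj  # graph Laplacian L = D - A

def drift_energy(H, Lap):
    # Sum over edges of ||h_i - h_j||^2, equal to trace(H^T L H).
    return float(np.trace(H.T @ Lap @ H))

# One smoothing step H <- (I - alpha * L) H; for alpha < 2 / lambda_max(L)
# each step cannot increase the drift energy.
alpha = 0.2
H_smooth = H - alpha * (Lap @ H)
```

Here smoothing provably shrinks the inter-layer drift energy because the map $(I - \alpha L)$ contracts every nonzero Laplacian eigenmode when $0 < \alpha < 2/\lambda_{\max}(L)$.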

Crucially, both the IB filter and the GNN coordinator are training-only — they are discarded at inference, so latency stays identical to vanilla LoRA.
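The zero-overhead claim follows from the standard LoRA merge. A small NumPy sketch (hypothetical shapes and values) shows that once the filtered update is folded into the frozen weight, inference reduces to a single matmul with no extra modules:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 4
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = 0.01 * rng.normal(size=(d, r))   # LoRA factors
B = 0.01 * rng.normal(size=(r, d))
m = np.array([1.0, 0.7, 0.0, 0.0])   # learned gate (illustrative values)

x = rng.normal(size=d)

# Training-time forward: base path plus the gated low-rank branch.
y_train = W @ x + A @ (np.diag(m) @ (B @ x))

# Inference: merge the filtered update once, then discard the branch.
W_merged = W + A @ np.diag(m) @ B
y_infer = W_merged @ x  # same output, vanilla-LoRA latency
```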

Main Results

Table 1. Main comparison across language, vision, and multimodal benchmarks (~0.5–1% trainable parameters).

| Method | Type | BoolQ | PIQA | CIFAR-100 | ImageNet-1k | COCO Cap. | VQAv2 |
|---|---|---|---|---|---|---|---|
| Full Fine-tuning | – | 82.6 | 85.3 | 85.9 | 78.8 | 123.5 | 76.2 |
| LoRA | Reparam. | 79.1 | 82.4 | 81.5 | 76.2 | 116.2 | 73.5 |
| QLoRA | Reparam. | 80.0 | 83.1 | 82.7 | 76.9 | 119.1 | 74.2 |
| DoRA | Reparam. | 80.6 | 83.7 | 83.2 | 77.3 | 120.3 | 75.0 |
| Sensitivity-LoRA | Dynamic Rank | 80.9 | 84.0 | 83.5 | 77.5 | 120.8 | 75.2 |
| LoRA-Dropout | Sparsity | 80.2 | 83.3 | 82.5 | 76.8 | 118.8 | 74.5 |
| StructLoRA (Ours) | Filter + Coord. | 82.1 | 84.9 | 85.1 | 78.6 | 122.9 | 75.9 |

Table 2. Head-to-head comparison on the GLUE benchmark (RoBERTa-base).

| Method | MNLI | SST-2 | MRPC | CoLA | QNLI | QQP | RTE | STS-B | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| LoRA | 87.3 | 93.5 | 87.1 | 58.8 | 93.0 | 90.5 | 79.4 | 91.0 | 85.1 |
| AdaLoRA | 87.3 | 93.6 | 87.3 | 59.0 | 93.1 | 90.6 | 79.6 | 91.2 | 85.2 |
| Sensitivity-LoRA | 87.6 | 94.6 | 87.7 | 60.2 | 93.6 | 90.7 | 81.8 | 91.3 | 86.0 |
| StructLoRA (Ours) | 88.1 | 95.0 | 88.5 | 61.5 | 94.1 | 91.0 | 82.3 | 91.5 | 86.5 |

Performance in Challenging Regimes

StructLoRA delivers its largest gains in the most challenging settings: low-rank and low-data regimes.

Table 3. Performance under varying rank budgets.

| Rank ($r$) | Params (%) | BoolQ (LLaMA-7B) LoRA / StructLoRA | CIFAR-100 (ViT-B/16) LoRA / StructLoRA | COCO Caption (LLaVA) LoRA / StructLoRA |
|---|---|---|---|---|
| 2 | 0.12 | 75.1 / 77.4 (+2.3) | 78.3 / 80.1 (+1.8) | 111.2 / 114.3 (+3.1) |
| 4 | 0.24 | 77.6 / 79.9 (+2.3) | 79.7 / 82.2 (+2.5) | 113.8 / 117.0 (+3.2) |
| 8 | 0.48 | 79.1 / 81.3 (+2.2) | 81.5 / 84.1 (+2.6) | 116.2 / 122.4 (+6.2) |
| 16 | 0.95 | 80.3 / 81.7 (+1.4) | 82.8 / 84.3 (+1.5) | 118.1 / 123.6 (+5.5) |
| 32 | 1.90 | 81.0 / 81.9 (+0.9) | 83.4 / 84.5 (+1.1) | 119.0 / 123.9 (+4.9) |

Table 4. Performance under limited supervision (few-shot learning), rank $r=8$.

| Dataset | Method | 10% | 25% | 50% | 100% |
|---|---|---|---|---|---|
| BoolQ (LLaMA-7B) | LoRA | 68.5 | 73.2 | 76.4 | 79.1 |
| | StructLoRA | 71.2 (+2.7) | 76.3 (+3.1) | 78.9 (+2.5) | 81.3 (+2.2) |
| CIFAR-100 (ViT-B/16) | LoRA | 73.6 | 78.0 | 80.5 | 81.5 |
| | StructLoRA | 76.3 (+2.7) | 80.5 (+2.5) | 82.4 (+1.9) | 84.1 (+2.6) |
| COCO Caption (LLaVA) | LoRA | 100.2 | 108.3 | 114.0 | 116.2 |
| | StructLoRA | 103.7 (+3.5) | 112.4 (+4.1) | 117.9 (+3.9) | 122.4 (+6.2) |

Ablation Studies

Both the IB filter and the GNN coordinator are essential. Removing either degrades performance, and removing both collapses StructLoRA back to standard LoRA.

Table 5. Module-wise ablation of StructLoRA.

| Setting | BoolQ | $\Delta$ | CIFAR-100 | $\Delta$ | COCO Cap. | $\Delta$ |
|---|---|---|---|---|---|---|
| StructLoRA (Full) | 81.3 | – | 84.1 | – | 122.4 | – |
| w/o IB Filter | 79.4 | -1.9 | 81.9 | -2.2 | 117.8 | -4.6 |
| w/o GNN Coordination | 80.1 | -1.2 | 82.6 | -1.5 | 119.4 | -3.0 |
| w/o Both (= LoRA) | 79.1 | -2.2 | 81.5 | -2.6 | 116.2 | -6.2 |
Figure 2. Analysis of filtering strategies. Our IB-guided filter consistently outperforms heuristic alternatives (Random Masking and Top-$k$ Norm) across NLP, Vision, and Multimodal tasks, confirming that the magnitude of an update direction is a poor proxy for its semantic relevance.

Visualizing Structural Coherence

Figure 3. Visual attention comparison (Grad-CAM). LoRA (top) produces diffuse attention across background regions, while StructLoRA (bottom) focuses on semantically relevant areas such as the deer's head and the boat's body.
Figure 4. Layer-wise cosine similarity of updates. StructLoRA (left) induces a coherent block-diagonal structure, while LoRA (right) exhibits noisy, fragmented inter-layer patterns, confirming the structural incoherence problem that our framework resolves.
Figure 5. Accuracy vs. Sequence Length (LLaMA-13B on BoolQ). StructLoRA's advantage widens under longer contexts.
Figure 6. SVD Coverage vs. Rank. StructLoRA achieves broader singular-value coverage, especially at low ranks.

BibTeX

If you find our work useful, please consider citing:

@article{xiao2026not,
  title={Not All Directions Matter: Toward Structured and Task-Aware Low-Rank Adaptation},
  author={Xiao, Xi and Ma, Chenrui and Zhang, Yunbei and Liu, Chen and Wang, Zhuxuanzi and Li, Yanshu and Zhao, Lin and Hu, Guosheng and Wang, Tianyang and Xu, Hao},
  journal={arXiv preprint arXiv:2603.14228},
  year={2026}
}