PTQ4SAM3: Semantics-Preserving Post-Training Quantization for Segment Anything Model 3

PTQ4SAM3 preserves prompt-conditioned semantics during low-bit post-training quantization, keeping SAM3 accurate, compact, and fast across image and video segmentation.

Xi Xiao¹, Yunbei Zhang², Lin Zhao³, Janet Wang², Yanshu Li⁴, Chenrui Ma⁷, Pan Wang⁵, Yuqi Li⁶, Fuchen Li¹, Tianyang Wang¹, Xiao Wang⁷

¹University of Alabama at Birmingham · ²Tulane University · ³Northeastern University · ⁴Brown University · ⁵University of Pittsburgh · ⁶SUNY Buffalo · ⁷Oak Ridge National Laboratory

Figure: Overview of PTQ4SAM3 for semantics-preserving post-training quantization of SAM3. PTQ4SAM3 diagnoses semantic concept drift under low-bit quantization and restores prompt-conditioned segmentation behavior through semantic anchors, orthogonal rectification, and temporal alignment.

Abstract

Quantizing SAM3 is hard because semantics drift before masks fail.

Segment Anything Model 3 extends promptable segmentation to image and video settings, but direct post-training quantization can severely distort the semantic pathways that connect prompts, object queries, and masks.

PTQ4SAM3 introduces a semantics-preserving quantization framework tailored to SAM3. It uses semantic-anchor calibration, orthogonal subspace rectification, and temporal distribution alignment to correct concept drift without retraining the full model.

Three claims

What this paper changes

Semantic drift is the bottleneck.

Low-bit perturbations do not merely add numerical noise; they shift the prompt-conditioned semantic representation that SAM3 relies on for object localization.
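
One way to make this claim concrete is to compare how well a feature vector aligns with a prompt embedding before and after low-bit rounding. The sketch below is illustrative only: `fake_quantize`, the symmetric uniform quantizer it implements, and the toy vectors are assumptions made for this example, not the PTQ4SAM3 implementation.

```python
import math

def fake_quantize(values, bits):
    """Symmetric uniform quantize-dequantize (an assumed scheme, for illustration)."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point, clamp to the representable range, rescale.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy stand-ins for a prompt embedding and a visual token feature (hypothetical values).
prompt = [0.9, 0.1, -0.3, 0.2]
feature = [1.0, 0.12, -0.28, 0.05]

for bits in (8, 4):
    quantized = fake_quantize(feature, bits)
    drift = cosine(prompt, feature) - cosine(prompt, quantized)
    print(f"{bits}-bit: prompt alignment changes by {drift:+.4f}")
```

Measuring the change in alignment, rather than the raw rounding error alone, captures the distinction this claim draws between numerical noise and a shift in the representation.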

Calibration should be prompt aware.

Semantic anchors provide task-aligned calibration signals, preserving the relation between visual tokens, prompt embeddings, and mask predictions.

Video needs temporal stability.

Temporal distribution alignment keeps quantized representations consistent across frames, avoiding flicker while retaining compression and speed benefits.
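
This page does not define how temporal consistency or flicker is measured. As one common proxy, assumed here purely for illustration, flicker can be quantified as the mean IoU between the masks of consecutive frames. The helper names `mask_iou` and `temporal_consistency` are hypothetical.

```python
def mask_iou(a, b):
    """IoU of two binary masks given as flat lists of 0/1 values."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0      # two empty masks count as identical

def temporal_consistency(masks):
    """Mean IoU between each pair of consecutive frame masks."""
    pairs = list(zip(masks, masks[1:]))
    if not pairs:
        return 1.0
    return sum(mask_iou(a, b) for a, b in pairs) / len(pairs)

stable = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
flicker = [[0, 1, 1, 0], [0, 0, 0, 0], [0, 1, 1, 0]]   # mask vanishes for one frame
print(temporal_consistency(stable))    # 1.0
print(temporal_consistency(flicker))   # 0.0
```

A proxy like this also penalizes genuine object motion, so it is most meaningful when the quantized model is compared with the full-precision model on the same clip.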

Method

Preserving SAM3 semantics through quantization-aware alignment

Semantic anchors

Representative prompt-object pairs calibrate activation ranges around the semantic subspaces that matter for segmentation.
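
A minimal sketch of range calibration driven by anchor samples follows. It assumes a symmetric quantizer and uses percentile clipping, a generic PTQ calibration technique swapped in for illustration. How PTQ4SAM3 actually selects anchors and derives ranges is not specified on this page, and `calibrate_scale` and the activation values are hypothetical.

```python
def percentile(sorted_values, q):
    """Nearest-index percentile of an already sorted list, q in [0, 100]."""
    index = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[index]

def calibrate_scale(anchor_activations, bits=8, clip_percentile=99.0):
    """Derive a symmetric quantization scale from anchor activations only."""
    magnitudes = sorted(abs(v) for batch in anchor_activations for v in batch)
    clip = percentile(magnitudes, clip_percentile)
    qmax = 2 ** (bits - 1) - 1
    return clip / qmax if clip > 0 else 1.0

# Hypothetical activations recorded while running prompt-object anchor pairs.
anchor_activations = [
    [0.2, -0.5, 1.1, 0.3],
    [0.4, -0.9, 0.8, 12.0],    # 12.0 is an outlier
]
scale_all = calibrate_scale(anchor_activations, clip_percentile=100.0)
scale_clipped = calibrate_scale(anchor_activations, clip_percentile=85.0)
print(scale_all, scale_clipped)
```

Clipping the outlier yields a smaller step size, so the quantization grid spends its resolution on the range where most anchor activations fall.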

Orthogonal rectification

Quantization error is decomposed and corrected away from the semantic anchor direction, reducing destructive drift.
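
The decomposition can be sketched with a plain vector projection. The example assumes a single anchor direction and, as one reading of the description above, removes the error component orthogonal to that direction. The page does not state which component the method corrects or how, so `decompose_error` and the toy vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def decompose_error(error, anchor):
    """Split error into components parallel and orthogonal to the anchor direction."""
    coeff = dot(error, anchor) / dot(anchor, anchor)
    parallel = [coeff * a for a in anchor]
    orthogonal = [e - p for e, p in zip(error, parallel)]
    return parallel, orthogonal

anchor = [1.0, 0.0, 0.0]               # hypothetical semantic anchor direction
full_precision = [0.8, 0.3, -0.1]      # hypothetical full-precision feature
quantized = [0.7, 0.4, -0.2]           # the same feature after quantization
error = [q - f for q, f in zip(quantized, full_precision)]

parallel, orthogonal = decompose_error(error, anchor)
# Assumed correction: subtract the orthogonal part of the error from the quantized feature.
rectified = [q - o for q, o in zip(quantized, orthogonal)]
print(rectified)
```

After this step the remaining error lies entirely along the anchor direction, which makes the two error components easy to inspect separately.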

Temporal alignment

Video-frame distributions are aligned so the quantized model stays coherent across time as well as across prompts.
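
One simple way to align per-frame activation distributions is to re-standardize each frame toward running statistics. The sketch below uses an exponential moving average of the mean and standard deviation. This mechanism is an assumption chosen for illustration, not the alignment procedure of PTQ4SAM3, and `align_frames` and `momentum` are hypothetical names.

```python
import statistics

def align_frames(frames, momentum=0.9):
    """Shift and rescale each frame's activations toward running statistics."""
    running_mean = statistics.fmean(frames[0])
    running_std = statistics.pstdev(frames[0])
    aligned = []
    for frame in frames:
        mean = statistics.fmean(frame)
        std = statistics.pstdev(frame) or 1.0       # guard against constant frames
        # Update the running statistics with the current frame.
        running_mean = momentum * running_mean + (1 - momentum) * mean
        running_std = momentum * running_std + (1 - momentum) * std
        # Standardize the frame, then map it onto the running statistics.
        aligned.append([(v - mean) / std * running_std + running_mean for v in frame])
    return aligned

# Hypothetical per-frame activations; the middle frame has a shifted distribution.
frames = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0], [0.0, 1.0, 2.0, 3.0]]
aligned = align_frames(frames)
print([round(statistics.fmean(f), 2) for f in aligned])
```

With slowly varying statistics, a single set of quantization ranges stays valid across frames, which is the property that temporal alignment aims for.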

Figure: Preserving SAM3 semantics through quantization-aware alignment. PTQ4SAM3 turns post-training quantization into a semantic preservation problem: it calibrates around anchor concepts, rectifies orthogonal distortion, and aligns video distributions.

Semantic Preservation Objective

minimize semantic drift while enforcing W4A8 (4-bit weight, 8-bit activation) quantization constraints
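
The objective is given here only in words. One possible formalization is sketched below, with every symbol an assumption introduced for illustration: $f_{\theta}$ is the full-precision SAM3, $f_{\theta_q}$ its quantized counterpart, $\mathcal{A}$ a set of anchor image-prompt pairs $(x, p)$, $d$ a distance between outputs or intermediate features, and $\mathcal{Q}_{\mathrm{W4A8}}$ the set of models with 4-bit weights and 8-bit activations.

```latex
\min_{\theta_q} \;
\mathbb{E}_{(x,\,p) \sim \mathcal{A}}
\left[ d\!\left( f_{\theta}(x, p),\; f_{\theta_q}(x, p) \right) \right]
\quad \text{subject to} \quad
f_{\theta_q} \in \mathcal{Q}_{\mathrm{W4A8}}
```

The expectation runs over anchor pairs rather than generic calibration images, so the drift being minimized is measured on prompt-conditioned behavior.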

Results

Near-lossless W4A8 segmentation with real efficiency gains

RefCOCO Pr@0.5: 72.5 at W4A8, compared with 42.1 for PTQ4SAM

SA-V frame mIoU: 80.1 at W4A8, with 95.9% temporal consistency

Model compression: 3.5x from low-bit post-training quantization

Reported throughput: 54.9 FPS, at 18.2 ms per frame
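
The throughput and latency figures are consistent with each other, as a short arithmetic check shows. The checkpoint size in the second calculation is a hypothetical value used only to illustrate what a 3.5x ratio means.

```python
def fps_from_latency(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms

def compressed_size(original_mb, ratio):
    """Size after compression by the given ratio."""
    return original_mb / ratio

print(f"{fps_from_latency(18.2):.1f} FPS")          # 54.9 FPS

original_mb = 1000.0                                 # hypothetical full-precision size
print(f"{compressed_size(original_mb, 3.5):.1f} MB") # 285.7 MB
```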

Figure: Qualitative quantization comparison for PTQ4SAM3. PTQ4SAM3 retains fine object structure under aggressive W4A8 quantization, where prior PTQ methods lose prompt-specific mask details.

Figure: Temporal examples and segmentation comparisons for PTQ4SAM3. Across video frames, temporal distribution alignment reduces flicker and keeps object masks stable.

Why it matters

Compression should not erase prompt-conditioned meaning.

For foundation segmentation models, the expensive parts of inference are also the parts that carry semantics. Treating quantization as uniform numerical approximation can miss the representation drift that actually changes masks.

PTQ4SAM3 keeps the post-training workflow practical while making the calibration signal respect prompts, objects, and temporal consistency.

Citation

Cite PTQ4SAM3

If our work is helpful to your research, please consider citing PTQ4SAM3. Thank you.

@misc{xiao2026ptq4sam3,
  title        = {PTQ4SAM3: Semantics-Preserving Post-Training Quantization for Segment Anything Model 3},
  author       = {Xi Xiao and Yunbei Zhang and Lin Zhao and Janet Wang and Yanshu Li and Chenrui Ma and Pan Wang and Yuqi Li and Fuchen Li and Tianyang Wang and Xiao Wang},
  year         = {2026},
  note         = {Project page: https://xixiaouab.github.io/PTQ4SAM3/}
}