Background Fades, Foreground Leads: Curriculum-Guided Background Pruning for Efficient Foreground-Centric Collaborative Perception

1Korea Advanced Institute of Science and Technology (KAIST) 2Texas A&M University
ICRA 2026

Abstract

Collaborative perception enhances the reliability and spatial coverage of autonomous vehicles by sharing complementary information across vehicles, offering a promising solution to long-tail scenarios that challenge single-vehicle perception. However, the bandwidth constraints of vehicular networks make transmitting the entire feature map impractical. Recent methods therefore adopt a foreground-centric paradigm, transmitting only predicted foreground-region features while discarding the background, which encodes essential context. We propose FadeLead, a foreground-centric framework that overcomes this limitation by learning to encapsulate background context into compact foreground features during training. At the core of our design is a curriculum learning strategy that leverages background cues early on but progressively prunes them away, forcing the model to internalize context into foreground representations without transmitting the background itself. Extensive experiments on both simulated and real-world benchmarks show that FadeLead outperforms prior methods under different bandwidth settings, underscoring the effectiveness of context-enriched foreground sharing.

Video Presentation

Motivation


We revisited a representative foreground-centric method (Where2Comm) and explicitly separated its BEV features into three groups: predicted foreground (Pred-FG), ground-truth foreground (GT-FG), and ground-truth background (GT-BG). Two controlled “oracle” experiments, sketched in code after the list below, revealed a key gap in the prevailing assumption that foreground-only sharing is sufficient:

  • GT-FG only: Even when transmitting perfectly localized object regions (GT-FG), performance remains limited—foreground alone misses inter-object relations and broader scene semantics.
  • GT-BG only: Transmitting only background (masking out all objects) performs surprisingly well, often rivaling near-full feature sharing, indicating that background carries critical contextual cues for disambiguation, robustness under occlusion, and holistic scene understanding.
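
The oracle setups above can be reproduced at a high level by masking a BEV feature map with the ground-truth occupancy before sharing. The sketch below is a minimal illustration in PyTorch: the tensor shapes, the helper name mask_bev_features, and the way ground-truth boxes are rasterized into fg_mask are assumptions, not the paper's released code.

import torch

def mask_bev_features(feats, fg_mask, keep="fg"):
    # feats:   [B, C, H, W] BEV features from one vehicle.
    # fg_mask: [B, 1, H, W] binary ground-truth occupancy (1 = object cell).
    # keep:    "fg" -> share GT-FG only; "bg" -> share GT-BG only.
    if keep == "fg":
        return feats * fg_mask            # zero out all background cells
    if keep == "bg":
        return feats * (1.0 - fg_mask)    # zero out all foreground cells
    raise ValueError(f"unknown keep mode: {keep}")

# Example: the "GT-BG only" oracle shares nothing but background context.
feats = torch.randn(1, 64, 200, 200)
fg_mask = (torch.rand(1, 1, 200, 200) > 0.95).float()  # stand-in for rasterized GT boxes
shared = mask_bev_features(feats, fg_mask, keep="bg")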

Therefore, the core problem is not “foreground vs. background,” but that compact foreground features must be enriched with essential background context without paying the bandwidth cost of transmitting the background at inference.

Method

Framework

Our design and intuition

Motivated by the observation that background context is informative but expensive to transmit, we propose FadeLead, a training-time curriculum that transfers contextual knowledge from background to foreground features. Instead of sharing background explicitly, we guide foreground representations to gradually absorb scene-level cues.

  • Curriculum background fading: During training, background features are progressively attenuated while full-scene supervision is maintained. This forces the network to rely increasingly on foreground features to explain the scene (see the sketch after this list).
  • Foreground contextualization: As background fades, foreground features learn to encode inter-object relations, spatial layout, and scene semantics that were previously carried by background regions.
  • Inference-time efficiency: At test time, only foreground features are transmitted, achieving foreground-only communication with performance close to full feature sharing.
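
A minimal sketch of the fading curriculum described above, assuming a linear schedule that decays the background contribution from 1 to 0 over training; the function names, the schedule shape, and the tensor layout are illustrative assumptions rather than the paper's actual implementation.

import torch

def background_keep_ratio(step, total_steps):
    # Linearly decay the background contribution from 1.0 to 0.0 over training.
    return max(0.0, 1.0 - step / total_steps)

def fade_background(feats, fg_mask, step, total_steps):
    # feats:   [B, C, H, W] BEV features before collaborative fusion.
    # fg_mask: [B, 1, H, W] foreground mask (1 = foreground cell).
    alpha = background_keep_ratio(step, total_steps)
    # Foreground cells pass through unchanged; background cells are scaled by
    # alpha, so the model must gradually learn to explain the scene from
    # foreground features alone.
    return feats * fg_mask + feats * (1.0 - fg_mask) * alpha

# Once alpha reaches 0, only foreground cells carry information, so at
# inference only those features need to be transmitted to collaborators.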

Results


Comparison with state-of-the-art methods. See the paper and the provided video for more results.

BibTeX

@article{wu2025background,
  title={Background Fades, Foreground Leads: Curriculum-Guided Background Pruning for Efficient Foreground-Centric Collaborative Perception},
  author={Wu, Yuheng and Gao, Xiangbo and Tau, Quang and Tu, Zhengzhong and Lee, Dongman},
  journal={arXiv preprint arXiv:2510.19250},
  year={2025}
}