> [**Kim, Chris Dongjoo, et al. "Continual Learning on Noisy Data Streams via Self-Purified Replay." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.**](https://openaccess.thecvf.com/content/ICCV2021/html/Kim_Continual_Learning_on_Noisy_Data_Streams_via_Self-Purified_Replay_ICCV_2021_paper.html "Paper Address")
> [**code available**](https://github.com/ecrireme/SPR)
> [**related papers**](https://www.connectedpapers.com/main/0f40d7424a28b5923cbafe35294f4faea5b9170e/Continual-Learning-on-Noisy-Data-Streams-via-Self%20Purified-Replay/graph)

_This paper is about continual learning and learning from noisy labeled data. Noisy labels exacerbate catastrophic forgetting. The proposed SPR uses a Self-Centered filter to turn the samples in the delayed buffer D into a purified buffer P; the network is then trained and evaluated based on P and D._

### _Contributions_

1. Discover that __noisy labels exacerbate catastrophic forgetting__, and that it is critical to filter out such noise from the input data stream before storing samples in the replay buffer.
2. Propose a novel replay-based framework named Self-Purified Replay (SPR) for noisy labeled continual learning. SPR can not only maintain a clean replay buffer but also effectively mitigate catastrophic forgetting with a fixed parameter size.
3. Evaluate the approach on three synthetic noise benchmarks (MNIST, CIFAR-10, CIFAR-100) and one real noise dataset, __WebVision__. Empirical results validate that SPR significantly outperforms many combinations of state-of-the-art continual learning and noisy label learning methods.

___

#### _1. Noisy Labels Exacerbate Catastrophic Forgetting_

The empirical results in Figure 1 show that when trained with noisy labels, the model becomes much more prone to catastrophic forgetting [20]. As the noise level increases from 0% to 60%, sharp decreases in accuracy are seen. Surprisingly, the dotted red circle in Figure 1(b) shows that in CIFAR-10 a fatally hastened forgetting occurs no matter the amount of noise.

___

#### _2. Approach to Noisy Labeled Continual Learning_

- G1. Reduce forgetting even with noisy labels: the approach needs to mitigate catastrophic forgetting amidst learning from noisy labeled data.
- G2. Filter clean data: the method should learn representations such that it identifies the noise as anomalies. Moreover, it should enable this from a small amount of data, since we do not have access to the entire dataset in online continual learning.

___

#### _3. Self-Replay_

The base network addresses G1 via self-supervised replay (Self-Replay) training (Section 3.1). Since naively training on the noisily labeled samples would propagate erroneous label signals, we circumvent this error via learning only from x (without y) using contrastive self-supervised learning techniques. That is, the framework first focuses on learning general representations via self-supervised learning from all incoming x. Subsequently, the downstream task (i.e., supervised classification) finetunes the representation using only the samples in the purified buffer P. Building on this concept in terms of continual learning leads to Self-Replay, which mitigates forgetting while learning general representations via self-supervised replay of the samples in the delayed and purified buffers (D ∪ P). Specifically, we add a projection head g(·) (i.e., a one-layer MLP) on top of the average pooling layer of the base network, and train it using the normalized temperature-scaled cross-entropy loss [12]. For a minibatch from D and P with batch sizes Bd, Bp ∈ N respectively, we apply random image transformations (e.g., cropping, color jitter, horizontal flip) to create two correlated views of each sample, referred to as positives. Then, the loss is optimized to attract the features of the positives closer to each other while repelling them from the other samples in the batch, referred to as the negatives.
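Since the notes above only name the loss, here is a minimal PyTorch-style sketch of a normalized temperature-scaled cross-entropy (NT-Xent) objective over two augmented views; the function name, batch layout, and temperature value are illustrative assumptions, not taken from the official SPR code.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Normalized temperature-scaled cross-entropy over two correlated views.

    z1, z2: [B, d] projection-head outputs of the two augmented views of the
    same B = Bd + Bp images drawn from the delayed and purified buffers.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2B, d], unit-norm features
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # a sample is never its own negative
    b = z1.size(0)
    # the positive of view i is the other view of the same image, at index (i + B) mod 2B
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

In Self-Replay, `z1` and `z2` would be the projection-head outputs g(·) for two random augmentations of the same Bd + Bp buffered images; the expert network described below is trained with the same objective, but on samples from D only.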
___

#### _4. Self-Centered Filter_

The goal of the Self-Centered filter is to obtain confidently clean samples; specifically, it assigns a probability of being clean to every sample in the delayed buffer.

__Expert Network__. The expert network is prepared to featurize the samples in the delayed buffer. These features are used to compute the centrality of the samples, which is the yardstick for selecting clean samples. Inspired by the success of self-supervised learning of good representations in Self-Replay, the expert network is also trained with the self-supervision loss in Eq. 1, with the only difference that we use the samples in D only (instead of D ∪ P for the base network).

__Centrality__. At the core of the Self-Centered filter lies centrality, which is rooted in graph theory to identify the most influential vertices within a graph. We use a variant of eigenvector centrality, grounded on the concept that a link to a highly influential vertex contributes more to centrality than a link to a less influential vertex.

__Beta Mixture Models__. The centrality quantifies which samples are the most influential (or the cleanest) within the data of identical class labels. However, the identically labeled data contains both clean and noisily labeled samples, and the noisy ones may deceptively manipulate the centrality score, leading to an indistinct division between the centrality scores of clean and noisy samples. Hence, we compute the probability of cleanliness per sample by fitting a Beta mixture model (BMM) to the centrality scores. Among the Z = 2 components, we can easily identify the clean component as the one with the higher centrality scores c (i.e., the larger cluster). Then, the clean posterior p(z = clean|c) defines the probability that centrality c belongs to the clean component, which is used as the probability to enter and exit the purified buffer P. Once the purified buffer is full, the examples with the lowest p(z = clean|c) are sampled out accordingly.

__Stochastic Ensemble__. Since the goal is to obtain as many clean samples as possible, we want to further sort out the possibly noisy samples. We achieve this by introducing a stochastic ensemble of BMMs, enabling a more noise-robust posterior than the non-stochastic posterior p(z = clean|c) described above.

__Empirical Supports__. Figure 5 shows empirical evidence that the stochastic ensemble addresses two issues to achieve a noise-robust posterior p(z|Dl). First, a small portion of noisy samples are falsely confident and are consequently assigned a high centrality score. Stochastic ensembling is able to suppress these noisy samples, as indicated in Figure 5, where the mode of p(c|z = noisy) · p(z = noisy) (red curve) is shifted to the left by a noticeable margin. Second, there are cases where p(c|z = noisy) · p(z = noisy) drops below p(c|z = clean) · p(z = clean), leading to a high p(z = clean|c) for noisy instances, indicated with red circles in Figure 5. The stochastic ensemble over differing adjacency matrices A can mitigate such problematic cases and drown out the unexpected noise.
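As a rough illustration of how this filter could be realized, the sketch below computes eigenvector centrality by power iteration on a per-class cosine-similarity graph of expert features, then fits a two-component Beta mixture with a weighted method-of-moments EM to obtain p(z = clean | c). This is only one plausible reading under stated assumptions: the graph construction, initialization, and iteration counts are mine, and the paper's exact centrality variant and stochastic ensemble are not reproduced here.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def eigenvector_centrality(features, n_iter=100):
    """Centrality for one class: power iteration on the cosine-similarity
    graph built from expert-network features of identically labeled samples."""
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(feats @ feats.T, 0.0, None)          # non-negative edge weights
    np.fill_diagonal(sim, 0.0)                         # no self-loops
    c = np.ones(len(feats)) / len(feats)
    for _ in range(n_iter):                            # converges to the principal eigenvector
        c = sim @ c
        c /= np.linalg.norm(c) + 1e-12
    c = (c - c.min()) / (c.max() - c.min() + 1e-12)    # rescale into (0, 1) for the BMM
    return np.clip(c, 1e-4, 1 - 1e-4)

def clean_posterior(c, n_iter=20):
    """Fit a 2-component Beta mixture to centrality scores with a weighted
    method-of-moments EM and return p(z = clean | c) per sample."""
    resp = np.stack([(c <= np.median(c)).astype(float),
                     (c > np.median(c)).astype(float)], axis=1)   # hard init at the median
    pi = resp.mean(axis=0)
    params = [(1.0, 1.0), (1.0, 1.0)]
    for _ in range(n_iter):
        params = []
        for k in range(2):                             # M-step: moment-matched Beta parameters
            w = resp[:, k] / (resp[:, k].sum() + 1e-12)
            m = float((w * c).sum())
            v = float((w * (c - m) ** 2).sum()) + 1e-6
            common = max(m * (1.0 - m) / v - 1.0, 1e-2)
            params.append((max(m * common, 1e-2), max((1.0 - m) * common, 1e-2)))
        like = np.stack([pi[k] * beta_dist.pdf(c, *params[k]) for k in range(2)], axis=1)
        resp = like / (like.sum(axis=1, keepdims=True) + 1e-12)   # E-step: responsibilities
        pi = resp.mean(axis=0)
    means = [a / (a + b) for a, b in params]
    clean = int(np.argmax(means))                      # clean component = higher mean centrality
    return resp[:, clean]
```

A stochastic ensemble in the spirit of the paper would average this posterior over BMMs fit to centralities from several randomly perturbed adjacency matrices, which is what suppresses the falsely confident noisy samples discussed above.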
___

#### _5. Experiments_

In our evaluation, we compare SPR with other state-of-the-art models in the online task-free continual learning scenario with label noise. We test on three benchmark datasets, MNIST, CIFAR-10, and CIFAR-100, with symmetric and asymmetric random noise, and one large-scale dataset, WebVision, with real-world noise from the Web. We also empirically analyze Self-Replay and the Self-Centered filter from many aspects.

__Experimental Design__. The experimental setting explicitly follows recent suggestions for robust evaluation in continual learning:
(i) __Cross-task resemblance__: consecutive tasks in MNIST [41], CIFAR-10, CIFAR-100, and WebVision are partly correlated, containing neighboring domain concepts.
(ii) __Shared output heads__: a single output vector is used for all tasks.
(iii) __No test-time task labels__: the approach does not require explicit task labels during either the training or the test phase, often coined task-free continual learning.
(iv) __More than two tasks__: MNIST, CIFAR-10, CIFAR-100, and WebVision contain five, five, twenty, and seven tasks, respectively.

__Baselines__. Since we opt for continual learning from noisy labeled data streams, the baselines combine existing state-of-the-art methods from the two domains of continual learning and noisy label learning. For continual learning, we explore replay-based approaches that can learn in the online task-free setting: (i) Conventional Reservoir Sampling (CRS), (ii) Maximally Interfered Retrieval (MIR), (iii) Partitioning Reservoir Sampling (PRS), and (iv) GDumb. For noisy label learning, we select six models that cover many branches of noisy labeled classification: (i) SL loss correction, (ii) semi-supervised JoCoR, (iii) sample reweighting L2R, (iv) label repairing Pencil, (v) training-dynamics-based detection AUM, and (vi) cross-validation-based INCV.

___

#### _6. Conclusion_

We presented the Self-Purified Replay (SPR) framework for noisy labeled continual learning. At the heart of the framework is Self-Replay, which leverages self-supervised learning to mitigate forgetting and erroneous noisy label signals. The Self-Centered filter maintains a purified replay buffer via centrality-based stochastic graph ensembles. Experiments on synthetic and real-world noise showed that the framework can maintain a very pure replay buffer even with highly noisy data streams, while significantly outperforming many combinations of noisy label learning and continual learning baselines. The results shed light on using self-supervision to solve the problems of continual learning and noisy labels jointly. Specifically, it would be promising to extend SPR to maintain a purified buffer that is not only pure but also more diverse.

___

By Lingsgz on November 28, 2023