> [__Madaan, Divyam, et al. "Representational continuity for unsupervised continual learning." arXiv preprint arXiv:2110.06976 (2021). ICLR 2022__](https://openreview.net/pdf?id=9Hrka5PA7LW)
> [__code available__](https://github.com/divyam3897/UCL)

_Through careful experiments, the paper observes that unsupervised representations are more robust to catastrophic forgetting than supervised ones: simply finetuning on the sequence of tasks can outperform state-of-the-art continual learning models. A CKA analysis shows that UCL and SCL learn roughly similar representations in the lower layers but very different ones in the higher layers, which may explain the difference in their resistance to catastrophic forgetting._

### _Contributions_

1. Attempts to bridge the gap between continual learning and representation learning, tackling two crucial problems: continual learning with unlabelled data, and representation learning on a sequence of tasks.
2. Focuses on unsupervised continual learning (UCL), where the goal of the continual learner is to learn representations from a stream of unlabelled data instances without forgetting.
3. **Observes that unsupervised representations are comparatively more robust to catastrophic forgetting across all datasets, and that simply finetuning on the sequence of tasks can outperform various state-of-the-art continual learning alternatives.** Furthermore, shows that UCL generalizes better to various out-of-distribution tasks and outperforms SCL in few-shot training scenarios.
4. Provides visualizations of the representations and loss landscapes, which show that UCL learns discriminative, human-perceptual patterns and achieves a flatter and smoother loss landscape. Furthermore, proposes Lifelong Unsupervised Mixup (LUMP) for UCL, which effectively alleviates catastrophic forgetting and provides better qualitative interpretations.

___

### _1. Problem assumptions, learning protocol and evaluation metrics_

* __Assumes the absence of label supervision during training and focuses on unsupervised continual learning.__ Regard the backbone network as a mapping $$\mathcal{X}_t \rightarrow \mathbb{R}^D$$ and the classifier as $$\mathbb{R}^D \rightarrow \mathcal{Y}_t$$. In unsupervised learning there is no classifier, so the aim is to learn the representations of the network $$\mathcal{X}_t \rightarrow \mathbb{R}^D$$ on a sequence of tasks while preserving the knowledge of the previous tasks. **Different from supervised continual learning, the protocol is defined as follows: 1. train the representations of the network through all $$T$$ tasks; 2. use a KNN classifier on the frozen representations to evaluate their quality.**

___

### _2. Continual representation learning on a sequence of tasks_

> [SimSiam](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Exploring_Simple_Siamese_Representation_Learning_CVPR_2021_paper.pdf)
> [Barlow Twins](https://proceedings.mlr.press/v139/zbontar21a/zbontar21a.pdf)

___

### _3. Preserving representational continuity of existing models_

* The majority of existing continual learning strategies are not directly applicable to UCL. **To compare with regularization-based strategies, the authors extend Synaptic Intelligence (SI) to UCL and consider the online per-synapse consolidation over the entire training trajectory of the unsupervised representations. For architecture-based strategies, they investigate Progressive Neural Networks (PNN) and learn the feature representations progressively using the representation learning frameworks.**
* In SCL, DER alleviates catastrophic forgetting by matching the network logits across a sequence of tasks during the optimization trajectory. **For UCL, the projected output of the backbone network is used instead to preserve the knowledge of past tasks over the entire training trajectory. DER for UCL thus consists of two terms: the first learns the representations using SimSiam or Barlow Twins, and the second minimizes the Euclidean distance between the projected outputs to mitigate catastrophic forgetting (see the sketch below).**
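To make the adapted DER objective concrete, below is a minimal PyTorch-style sketch of one training step. It assumes a SimSiam-like model with hypothetical `encoder` (backbone + projection MLP) and `predictor` modules, and a replay `buffer` that stores past images together with the projected features they had when buffered; the names, the coefficient `alpha`, and the buffer API are illustrative assumptions, not the authors' implementation (a Barlow Twins loss could be substituted for the first term).

```python
# Minimal sketch of DER adapted to UCL (Section 3); illustrative, not the official code.
import torch.nn.functional as F

def neg_cosine(p, z):
    """SimSiam term: negative cosine similarity with stop-gradient on the target z."""
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def simsiam_loss(model, x1, x2):
    """Symmetrized SimSiam loss on two augmented views of the current batch."""
    z1, z2 = model.encoder(x1), model.encoder(x2)      # projected outputs
    p1, p2 = model.predictor(z1), model.predictor(z2)  # predictor outputs
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

def der_ucl_step(model, optimizer, x1, x2, buffer, alpha=0.1):
    """One update: unsupervised loss on the current task (first term) plus an
    Euclidean/MSE distillation between current and stored projected outputs
    of replay-buffer samples (second term)."""
    loss = simsiam_loss(model, x1, x2)
    if len(buffer) > 0:
        x_buf, z_buf = buffer.sample(x1.size(0))       # past images + their stored projections
        loss = loss + alpha * F.mse_loss(model.encoder(x_buf), z_buf)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```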
___

### _4. Lifelong unsupervised mixup_

* Standard Mixup training constructs virtual training examples based on the principle of Vicinal Risk Minimization. This work focuses on lifelong self-supervised learning and proposes Lifelong Unsupervised Mixup (LUMP), which applies mixup to UCL by incorporating instances stored in the replay buffer from previous tasks into the vicinal distribution (a minimal sketch of the update is given at the end of this post).

___

### _5. Experiments_

* __Evaluation on SimSiam and Barlow Twins__
* __Evaluation on few-shot training__
* __Evaluation on OOD datasets__
* __Similarity in feature and parameter space__, __visualization of the feature space__, and __loss landscape visualization__

___

### _6. Conclusions_

* __1. Surpassing supervised continual learning.__ __UCL is more robust to catastrophic forgetting than SCL, generalizes better to OOD tasks, and achieves stronger performance on few-shot learning tasks.__ LUMP interpolates unsupervised instances between the current task and past tasks and obtains higher performance with lower catastrophic forgetting across a wide range of tasks.
* __2. Dissecting the learned representations.__ __By investigating the similarity between the representations, the authors observe that UCL and SCL strategies have high similarity in the lower layers but are dissimilar in the higher layers.__ They also show that __UCL representations learn coherent, discriminative patterns and a smoother loss landscape than SCL.__
* __3. Limitations and future work.__ This work does not consider high-resolution tasks for CL. In follow-up work, the authors intend to conduct further analysis to understand the behavior of UCL and to develop more sophisticated methods for continually learning unsupervised representations under various setups, such as class-incremental or task-agnostic CL.
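To make the LUMP update from Section 4 concrete, here is a minimal sketch of one training step. It assumes the same SimSiam-style model as in the Section 3 sketch and a hypothetical replay buffer whose `sample_views` method returns two augmented views of past-task samples; the mixing coefficient is drawn from a Beta distribution as in standard Mixup. All names and hyperparameters are illustrative, not taken from the official repository.

```python
# Minimal sketch of Lifelong Unsupervised Mixup (LUMP, Section 4); illustrative only.
import torch

def lump_step(model, unsup_loss, optimizer, x1, x2, buffer, alpha=0.4):
    """One LUMP update: mix the current-task views with replay-buffer samples,
    then apply the usual unsupervised loss to the mixed views.
    `unsup_loss(model, x1, x2)` is any self-supervised criterion,
    e.g. the simsiam_loss helper from the DER sketch above."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # mixup coefficient
    if len(buffer) > 0:
        xb1, xb2 = buffer.sample_views(x1.size(0))  # two augmented views of past samples
        x1 = lam * x1 + (1.0 - lam) * xb1           # vicinal (mixed) inputs, view 1
        x2 = lam * x2 + (1.0 - lam) * xb2           # vicinal (mixed) inputs, view 2
    loss = unsup_loss(model, x1, x2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```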