> [__Kim, Sanghwan, et al. "Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.__](https://openaccess.thecvf.com/content/CVPR2023/papers/Kim_Achieving_a_Better_Stability-Plasticity_Trade-Off_via_Auxiliary_Networks_in_Continual_CVPR_2023_paper.pdf)
> [__code available__](https://github.com/kim-sanghwan/ANCL)

The paper notes that the continual learning problem can be defined in several ways, usually grouped into three scenarios: **task-incremental learning (TIL), domain-incremental learning (DIL), and class-incremental learning (CIL)**. In TIL, the model is told which task it has to solve: the task identity is provided to the model both during training and at test time. In DIL, the model only has to solve the task at hand and no task identity is needed. In CIL, the model must solve the task itself and also infer the task identity; since it has to distinguish among all classes seen so far, CIL is usually considered the hardest continual learning scenario. The paper mainly discusses TIL and CIL.

The paper divides current continual learning methods into five rough categories: **weight regularization, knowledge distillation, memory replay, bias correction, and dynamic structure**. It also mentions approaches that use an auxiliary network, or an extra module trained separately on the current dataset.

The central claim of the paper is that continual learning is a balance between the stability and the plasticity of the network. The authors therefore build a framework called Auxiliary Network Continual Learning (ANCL), which can naturally incorporate an auxiliary network into a variety of continual learning methods as a plug-in method.

___

### _Contributions_

1. Propose the framework of Auxiliary Network Continual Learning (ANCL), which naturally incorporates an auxiliary network into a variety of CL approaches as a plug-in method.
2. ANCL outperforms existing baselines on CIFAR-100 and Tiny ImageNet.
3. Perform three analyses to investigate the stability-plasticity trade-off within ANCL: Weight Distance, Centered Kernel Alignment, and Mean Accuracy Landscape.

___

### _1. Various Methods for Continual Learning_

1. **Weight Regularization Method**: A standard way to alleviate catastrophic forgetting is to include a regularization term that binds the dynamics of each network parameter to the corresponding parameter of the old network.
2. **Knowledge Distillation Method**: A separate line of work adopts knowledge distillation, which was originally designed to train a more compact student network from a larger teacher network. In this way, the main network can emulate the activations or logits of the previous (old) network while learning a new task.
3. **Memory Replay Method**: Unlike the previous methods, replay-based methods keep a part of the previous data (exemplars) in a memory buffer. The model is then trained on the current dataset together with the previous exemplars to prevent forgetting of the previous tasks.
4. **Bias Correction Method**: In memory replay methods, the network is trained on a highly unbalanced dataset composed of the few exemplars from previous tasks and fresh new samples from the new ones. As a result, the network is biased towards the data of the new tasks, which can distort the model's predictions; this is called task-recency bias.
5. **Dynamic Structure Method**: Dynamic structure approaches use per-task masking or model expansion to prevent forgetting and to increase the model capacity for learning a new task.

___

### _2. Conventional Continual Learning Method_

Conventional CL freezes and copies the previous continual model that has been trained up to task t−1. The old network then regularizes the main training through the regularization strength λ. The CL loss consists of two parts: the first term is a task-specific loss with respect to the main network weights θ ∈ R^P, and the second term is a regularizer that binds the dynamics of the network parameters θ to the old network parameters θ*_{1:t-1} ∈ R^P. The original CL approaches mainly focus on retaining the old knowledge obtained from the previous tasks by preventing large updates that would depart significantly from the old weights θ*_{1:t-1}. However, this may harmfully restrict the model's ability to learn new knowledge, which hinders the right balance between stability and plasticity.

### _3. ANCL Method_

ANCL keeps two types of network to maintain this balance: (1) the auxiliary network θ*_t, which is optimized solely on the current task t and is therefore allowed to forget (plasticity), and (2) the old network θ*_{1:t-1}, which has been sequentially trained up to task t−1 (stability). By adjusting both regularizers via λ and λ_a, ANCL is more likely than CL to achieve a better stability-plasticity balance under proper hyperparameter tuning.
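To make the two regularizers concrete, here is a minimal sketch of the ANCL objective in PyTorch (not the authors' implementation): the task loss plus one penalty pulling toward the old weights with strength λ and one pulling toward the auxiliary weights with strength λ_a. For simplicity both penalties are written as plain L2 distances, whereas the paper plugs ANCL into existing regularizers (e.g., EWC- or LwF-style terms); the names `ancl_loss`, `old_model`, `aux_model`, `lam`, and `lam_a` are illustrative.

```python
import torch
import torch.nn.functional as F


def ancl_loss(model, old_model, aux_model, x, y, lam, lam_a):
    """Sketch of the ANCL objective: task loss + old-network and auxiliary-network regularizers."""
    # Task-specific loss on the current task t (classification assumed here)
    task_loss = F.cross_entropy(model(x), y)

    # Regularizers binding the main weights theta to theta*_{1:t-1} (old, frozen)
    # and to theta*_t (auxiliary, trained on task t only), here as simple L2 penalties.
    reg_old = sum(((p - p_old.detach()) ** 2).sum()
                  for p, p_old in zip(model.parameters(), old_model.parameters()))
    reg_aux = sum(((p - p_aux.detach()) ** 2).sum()
                  for p, p_aux in zip(model.parameters(), aux_model.parameters()))

    return task_loss + lam * reg_old + lam_a * reg_aux


# Usage sketch for task t:
#   old_model = frozen copy of the model trained up to task t-1 (e.g., copy.deepcopy)
#   aux_model = another copy trained only on task t, allowed to forget
#   loss = ancl_loss(model, old_model, aux_model, x, y, lam=1.0, lam_a=1.0)
```

Setting `lam_a = 0` recovers the conventional CL objective of Section 2, which only pulls the weights toward θ*_{1:t-1}.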
___

### _5. Experiments_

Experiments are conducted on CIFAR-100 and Tiny ImageNet, with ResNet-32 as the backbone model.

___

By Lingsgz On December 1, 2023