> [Chrysakis, Aristotelis, and Marie-Francine Moens. "Online bias correction for task-free continual learning." ICLR 2023.](https://openreview.net/pdf?id=18XzeuYZh_)
> **code not available**

The paper argues that the conventional training paradigm of task-free continual learning overestimates the importance of current stream observations, which causes a prediction bias in the continual learner. It therefore proposes a new metric to quantify this prediction bias, and shows that the bias can be effectively mitigated by appropriately modifying only the parameters of the model's final layer. It further designs OBC to maintain unbiased training throughout task-free continual learning.

The related work notes that __continual learning is the process of gradually aggregating knowledge from data generated by a non-stationary distribution.__ In task-free continual learning, data are presented to the learner in small minibatches, and this setting is agnostic to how the data distribution changes over time. In other words, we do not assume knowledge of whether the distribution is piecewise stationary (i.e., whether there are distinct tasks to learn) or whether it drifts continuously. __Most task-free continual learning methods use a memory that can store a small fraction of all observed data instances (typically 10% or less). The instances stored in memory are later replayed to mitigate catastrophic forgetting. This simple paradigm, known as replay-based continual learning, is surprisingly effective in the task-free setting; it is also supported by findings in neuroscience about how biological learning occurs.__

___

### _Contributions_

1. Illustrate that __the conventional paradigm of model training in task-free continual learning overweights the importance of current stream observations__, and speculate that __this overweighting may cause the prediction bias of the continual learner__.
2. __Propose a novel metric to quantify prediction bias__, and show that this bias __can be effectively mitigated by appropriately modifying the parameters of only the final layer of the model after the end of training__.
3. __Propose a novel approach called Online Bias Correction (OBC), which maintains an unbiased model online, throughout the entire duration of learning.__
4. Evaluate the performance of OBC extensively and show that it significantly improves a number of task-free continual learning methods over multiple datasets.

___

### _1. Bias Correction in Task-free Continual Learning_

**A previous approach was explicitly designed to correct for prediction biases in task-free continual learning.** It learns a model with conventional experience replay and, after the entire stream has been observed, replaces the final linear layer of the model with a nearest-class-mean (NCM) classifier computed from all data stored in memory. The authors of that approach demonstrate that it is effective in increasing the final accuracy of the model. However, in many real-world applications a model must learn and perform inference at the same time, and such a model should ideally be unbiased at all times. To achieve this, the NCM correction would have to be applied after every update of the model, and since it requires a full pass over the memory, it would be computationally very expensive.

___

### _2. Nomenclature_

The neural network is split into two components: the last layer, called the **classifier**, is written as the parameterized function $$c(x; \theta_c)$$, and the set of preceding layers, called the **feature extractor**, as $$g(x; \theta_g)$$. The entire network can then be represented as:

$$
h(x; \theta_h) \triangleq c(g(x; \theta_g); \theta_c), \text{ where } \theta_h \triangleq \{\theta_g, \theta_c\}.
$$

The classifier has low learning capacity, whereas the feature extractor has high capacity. Given enough data, the feature extractor can therefore learn far more complex representations than the classifier, but in low-data scenarios it is also more prone to overfitting.

___

### _3. Data-sampling Bias_

At a high level, the optimization process during task-free continual learning is very simple. At each step $$t$$, __the learner uses a small minibatch of $$b$$ observations from the stream and another minibatch $$R$$ of equal size sampled from its memory to compute gradients and perform an update step over the model parameters__ (a minimal sketch of this replay step is shown below).
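Since the official code is not available, the following is only a minimal PyTorch-style sketch of such an experience-replay step under my own assumptions (a list-based memory of `(x, y)` pairs, uniform sampling, cross-entropy loss); the function and variable names are mine, not the authors'.

```python
import random
import torch
import torch.nn.functional as F

def replay_step(model, optimizer, memory, stream_x, stream_y):
    """One experience-replay update: the current stream minibatch of size b
    plus an equally sized minibatch sampled uniformly from memory."""
    b = stream_x.size(0)
    x, y = stream_x, stream_y
    if len(memory) >= b:
        # Sample b past instances from memory and concatenate them
        # with the b current stream observations.
        past = random.sample(memory, b)
        mem_x = torch.stack([px for px, _ in past])
        mem_y = torch.stack([py for _, py in past])
        x = torch.cat([stream_x, mem_x])
        y = torch.cat([stream_y, mem_y])

    # Single gradient step on the combined minibatch. Note that the b
    # stream observations always take part in this step, whereas each
    # stored instance is only sampled with probability b / len(memory).
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The comment at the gradient step already hints at the sampling asymmetry that the paper identifies as the source of the bias, discussed next.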
Unbiased data-sampling requires that all data instances, past and current alike, be equally important, meaning that __the probability of using any given instance in an optimization step should be the same, regardless of whether that instance comes from the current minibatch or is stored in memory__. Under experience replay, however, each of the $$b$$ new current instances is used with probability 1, while each past instance stored in memory is used with probability only $$b/a$$, where $$a$$ is the number of instances stored in memory. Therefore, unlike in the unbiased case described above, __new observations are guaranteed to participate in the model update, whereas an arbitrary memory instance is very unlikely to__. In essence, this is a data-sampling bias that favors current observations over past ones and, in turn, leads to the model's predictions being biased towards recent observations.

___

### _4. Post-Training Bias Correction_

Experiments show that the prediction bias of a task-free continual learner is reduced if the parameters of the classifier are changed appropriately after the entire stream has been observed. This provides evidence that changing how the classifier is trained can mitigate the prediction bias.

___

### _5. Online Bias Correction_

Online Bias Correction separates the training of the classifier from that of the feature extractor, since the two components behave differently with respect to overfitting and to the constantly changing stream. A surrogate classifier is introduced to ensure that the feature extractor is trained in exactly the same way as in experience replay, while the actual classifier is trained only with memory data. In short, OBC attempts to capture the benefits of both experience replay (less feature-extractor overfitting) and training the classifier only with memory data (less prediction bias), in a best-of-both-worlds manner. (A conceptual sketch of this scheme is given at the end of this post.)

___

### _6. Experiments_

___

By Lingsgz on December 1, 2023
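To make Section 5 more concrete, below is a conceptual PyTorch-style sketch of one OBC-style update as I read it from the description above: the feature extractor is updated through a surrogate classifier on the usual replay minibatch, while the actual classifier is updated only with memory data on top of detached features. The official code is not available, so every name and detail here is my own assumption rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def obc_style_step(feature_extractor, classifier, surrogate,
                   opt_replay, opt_classifier,
                   replay_x, replay_y, mem_x, mem_y):
    """One update in the spirit of OBC (my own reading of the paper).
    opt_replay optimizes the feature extractor and the surrogate classifier;
    opt_classifier optimizes only the actual classifier."""
    # (1) Feature extractor + surrogate classifier are trained exactly as
    #     in experience replay, on the combined stream/memory minibatch.
    opt_replay.zero_grad()
    replay_loss = F.cross_entropy(surrogate(feature_extractor(replay_x)), replay_y)
    replay_loss.backward()
    opt_replay.step()

    # (2) The actual classifier is trained only with memory data, on top
    #     of frozen (detached) features, so its decision boundary is not
    #     dominated by the most recent stream observations.
    opt_classifier.zero_grad()
    with torch.no_grad():
        mem_feats = feature_extractor(mem_x)
    cls_loss = F.cross_entropy(classifier(mem_feats), mem_y)
    cls_loss.backward()
    opt_classifier.step()

    # At inference time, predictions use classifier(feature_extractor(x)),
    # i.e. the memory-trained classifier on top of replay-trained features.
    return replay_loss.item(), cls_loss.item()
```

The split mirrors the nomenclature of Section 2: $$g$$ is trained with replay (to avoid overfitting), while $$c$$ sees only memory data (to avoid the data-sampling bias of Section 3).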