The proofs in this chapter are fairly hardcore; the focus here is on understanding the statements of the theorems and their significance.
This chapter provides theoretical guarantees in the form of bounds on the generalization error.
There are three theoretical routes to this analysis:
- Rademacher complexity
- Growth function
- VC-dimension
Of the three, the Rademacher complexity is relatively hard to compute, whereas the latter two can be computed with combinatorics alone, which makes them easier to work with on practical problems.
Throughout the rest of this post, we assume a binary classification problem by default, which simplifies the analysis.
Rademacher Complexity
Definition
Empirical Rademacher complexity

$$\widehat{\mathfrak{R}}_S(G) = \underset{\boldsymbol{\sigma}}{\mathbb{E}}\left[\sup_{g \in G} \frac{1}{m} \sum_{i=1}^{m} \sigma_i\, g(z_i)\right]$$

Here $S = (z_1, \dots, z_m)$ is the fixed sample and $\boldsymbol{\sigma} = (\sigma_1, \dots, \sigma_m)$, where the Rademacher variables $\sigma_i$ are independent and take the value $+1$ with probability $\frac{1}{2}$ and $-1$ with probability $\frac{1}{2}$.
Intuitively, this can be read as adding noise to the original loss function: the empirical Rademacher complexity measures how well the class $G$ can correlate with random noise on the sample.
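To make the definition concrete, here is a minimal Monte Carlo sketch (my own illustration, not from the book) that estimates the empirical Rademacher complexity of a small finite function class. The matrix `G`, which represents each function by its vector of values on the sample, is a hypothetical setup chosen purely for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 50                                        # sample size
# Each row of G is the value vector (g(z_1), ..., g(z_m)) of one function g,
# with values in {-1, +1}; a random class, purely illustrative.
G = rng.choice([-1.0, 1.0], size=(20, m))

def empirical_rademacher(G, n_trials=10_000):
    """Monte Carlo estimate of E_sigma[ sup_g (1/m) sum_i sigma_i g(z_i) ]."""
    m = G.shape[1]
    # Rademacher variables: +1 or -1, each with probability 1/2.
    sigma = rng.choice([-1.0, 1.0], size=(n_trials, m))
    correlations = sigma @ G.T / m            # shape: (n_trials, |G|)
    # Take the sup over the class for each draw of sigma, then average.
    return correlations.max(axis=1).mean()

print(f"estimated empirical Rademacher complexity: {empirical_rademacher(G):.4f}")
```

Averaging this estimate over many independent samples $S \sim D^m$ would approximate the distributional Rademacher complexity defined next.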
Rademacher complexity
Let $D$ denote the distribution according to which samples are drawn. For any integer $m \ge 1$, the Rademacher complexity of $G$ is the expectation of the empirical Rademacher complexity over all samples of size $m$ drawn according to $D$:

$$\mathfrak{R}_m(G) = \underset{S \sim D^m}{\mathbb{E}}\left[\widehat{\mathfrak{R}}_S(G)\right]$$
In short, the Rademacher complexity is the expectation of the empirical Rademacher complexity.
Bounds
Theorem 3.5 (Rademacher complexity bounds – binary classification) Let $H$ be a family of functions taking values in $\{-1, +1\}$ and let $D$ be the distribution over the input space $X$. Then, for any $\delta > 0$, with probability at least $1 - \delta$ over a sample $S$ of size $m$ drawn according to $D$, each of the following holds for any $h \in H$:

$$R(h) \le \widehat{R}_S(h) + \mathfrak{R}_m(H) + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$$

$$R(h) \le \widehat{R}_S(h) + \widehat{\mathfrak{R}}_S(H) + 3\sqrt{\frac{\log\frac{2}{\delta}}{2m}}$$
There is also an equivalent probabilistic form: setting $\delta = e^{-2m\epsilon^2}$ in the first bound gives, for any $\epsilon > 0$,

$$\mathbb{P}\left[\sup_{h \in H}\left(R(h) - \widehat{R}_S(h)\right) > \mathfrak{R}_m(H) + \epsilon\right] \le e^{-2m\epsilon^2}$$
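As a quick numeric illustration (a sketch of my own, not the book's code), the two bounds of Theorem 3.5 differ only in their confidence terms:

```python
import numpy as np

def rademacher_bound(emp_risk, rad, m, delta=0.05, empirical=False):
    """Upper bound on R(h) from Theorem 3.5; `rad` is R_m(H) for the first
    bound, or the empirical Rademacher complexity of H on S if empirical=True."""
    if empirical:
        slack = 3 * np.sqrt(np.log(2 / delta) / (2 * m))
    else:
        slack = np.sqrt(np.log(1 / delta) / (2 * m))
    return emp_risk + rad + slack

# hypothetical numbers: empirical risk 0.10, complexity estimate 0.05, m = 2000
print(rademacher_bound(0.10, 0.05, 2000))                  # first bound
print(rademacher_bound(0.10, 0.05, 2000, empirical=True))  # second bound
```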
Growth Function
Definition

Definition 3.6 (Growth function) The growth function $\Pi_H : \mathbb{N} \to \mathbb{N}$ for a hypothesis set $H$ is defined by:

$$\Pi_H(m) = \max_{\{x_1, \dots, x_m\} \subseteq X} \Big|\big\{\left(h(x_1), \dots, h(x_m)\right) : h \in H\big\}\Big|$$

In words, $\Pi_H(m)$ is the maximum number of distinct labelings of $m$ points that hypotheses in $H$ can realize.
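As a concrete example (mine, not the book's), the growth function of one-dimensional threshold classifiers $h_t(x) = \mathrm{sign}(x - t)$ can be computed by brute force; for this class any $m$ distinct points achieve the maximum, and $\Pi_H(m) = m + 1$:

```python
import numpy as np

def growth_thresholds(m):
    """Number of distinct labelings of m points by h_t(x) = sign(x - t)."""
    points = np.arange(m, dtype=float)        # any m distinct points work here
    # One candidate threshold per region: below all points, between
    # consecutive points, and above all points.
    thresholds = np.concatenate(([points[0] - 1.0],
                                 (points[:-1] + points[1:]) / 2,
                                 [points[-1] + 1.0]))
    labelings = {tuple(np.where(points >= t, 1, -1)) for t in thresholds}
    return len(labelings)

for m in range(1, 6):
    print(m, growth_thresholds(m))            # prints m + 1 labelings
```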
Bounds
Corollary 3.9 (Growth function generalization bound) Let $H$ be a family of functions taking values in $\{-1, +1\}$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H$:

$$R(h) \le \widehat{R}_S(h) + \sqrt{\frac{2\log\Pi_H(m)}{m}} + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$$
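Plugging the threshold class's $\Pi_H(m) = m + 1$ into Corollary 3.9 shows how quickly the complexity term vanishes (again an illustrative sketch):

```python
import numpy as np

def growth_bound_slack(m, delta=0.05):
    """Complexity + confidence terms of Corollary 3.9 with Pi_H(m) = m + 1."""
    return (np.sqrt(2 * np.log(m + 1) / m)
            + np.sqrt(np.log(1 / delta) / (2 * m)))

for m in (100, 1_000, 10_000):
    print(m, round(growth_bound_slack(m), 4))
```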
VC-dimension
Definition
Definition 3.10 (VC-dimension) The VC-dimension of a hypothesis set $H$ is the size of the largest set that can be shattered by $H$:

$$\mathrm{VCdim}(H) = \max\left\{m : \Pi_H(m) = 2^m\right\}$$
Note that it is enough for one set of size $m$ to be shattered; it is not required that every set of size $m$ be shattered.
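The "there exists" quantifier is easy to see in code. Here is a brute-force sketch (my own example) checking whether interval classifiers $h_{a,b}(x) = 1_{[a,b]}(x)$ shatter some set of $m$ points; the largest such $m$ is their VC-dimension, namely 2:

```python
import numpy as np
from itertools import combinations

def shatters_some_set(m, candidates):
    """True iff some m-subset of `candidates` is shattered by intervals [a, b]."""
    for pts in combinations(sorted(candidates), m):
        pts = np.array(pts, dtype=float)
        edges = np.concatenate(([pts[0] - 1.0], pts, [pts[-1] + 1.0]))
        # All labelings realizable by some interval [a, b]; choosing a > b
        # yields the empty interval, i.e. the all-zeros labeling.
        labelings = {tuple(((pts >= a) & (pts <= b)).astype(int))
                     for a in edges for b in edges}
        if len(labelings) == 2 ** m:          # all 2^m dichotomies realized
            return True
    return False

candidates = range(5)
for m in (1, 2, 3):
    print(m, shatters_some_set(m, candidates))  # True, True, False -> VCdim = 2
```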
Bounds
Corollary 3.19 (VC-dimension generalization bounds) Let $H$ be a family of functions taking values in $\{-1, +1\}$ with VC-dimension $d$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $h \in H$:

$$R(h) \le \widehat{R}_S(h) + \sqrt{\frac{2d\log\frac{em}{d}}{m}} + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$$
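The resulting slack decays like $O\big(\sqrt{d\log(m/d)/m}\big)$; a small sketch for a fixed VC-dimension:

```python
import numpy as np

def vc_bound_slack(m, d, delta=0.05):
    """Complexity + confidence terms of Corollary 3.19."""
    return (np.sqrt(2 * d * np.log(np.e * m / d) / m)
            + np.sqrt(np.log(1 / delta) / (2 * m)))

for m in (1_000, 10_000, 100_000):
    print(m, round(vc_bound_slack(m, d=10), 4))
```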
Lower Bounds
The analysis so far has concerned upper bounds on the generalization error; lower bounds can also be derived.
Realizable case
Theorem 3.20 (Lower bound, realizable case) Let $H$ be a hypothesis set with VC-dimension $d > 1$. Then, for any $m \ge 1$ and any learning algorithm $A$, there exist a distribution $D$ over $X$ and a target function $f \in H$ such that:

$$\underset{S \sim D^m}{\mathbb{P}}\left[R_D(h_S, f) > \frac{d-1}{32m}\right] \ge \frac{1}{100}$$
The theorem shows that for any algorithm $A$, there exists a 'bad' distribution over $X$ and a target function $f$ for which the error of the hypothesis returned by $A$ is a constant times $\frac{d}{m}$ with some constant probability. This further demonstrates the key role played by the VC-dimension in learning. The result implies in particular that PAC-learning in the realizable case is not possible when the VC-dimension is infinite.
Non-realizable case
Theorem 3.23 (Lower bound, non-realizable case) Let $H$ be a hypothesis set with VC-dimension $d > 1$. Then, for any $m \ge 1$ and any learning algorithm $A$, there exists a distribution $D$ over $X \times \{0, 1\}$ such that:

$$\underset{S \sim D^m}{\mathbb{P}}\left[R_D(h_S) - \inf_{h \in H} R_D(h) > \sqrt{\frac{d}{320m}}\right] \ge \frac{1}{64}$$
Equivalently, for any learning algorithm, the sample complexity verifies:

$$m \ge \frac{d}{320\epsilon^2}$$
This theorem shows that in the non-realizable case there always exists a bad distribution such that, with constant probability, the excess error of the hypothesis returned by the algorithm admits a lower bound of order $\sqrt{d/m}$.
It also shows that the minimum sample size $m$ required for agnostic PAC learning is at least $\Omega\big(\frac{d}{\epsilon^2}\big)$; in other words, hypothesis classes with infinite VC-dimension are not PAC-learnable.
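For a sense of scale, evaluating the lower bound $m \ge \frac{d}{320\epsilon^2}$ for a few accuracy targets (illustrative only):

```python
import math

def min_samples(d, eps):
    """Agnostic sample-complexity lower bound of Theorem 3.23."""
    return math.ceil(d / (320 * eps ** 2))

for eps in (0.1, 0.05, 0.01):
    print(eps, min_samples(d=10, eps=eps))    # 4, 13, 313
```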