This post is a set of reading notes on Chapter 6 of the well-known learning theory textbook *Understanding Machine Learning: From Theory to Algorithms*.
📝The VC-Dimension
🖼️ Big Picture
The goal of this chapter is to characterize which hypothesis classes are PAC learnable. Earlier chapters showed that every finite hypothesis class is PAC learnable, while the class of all functions (over an infinite-size domain) is not; in fact, some infinite hypothesis classes are learnable as well.
To capture this distinction, we introduce the notion of the VC-dimension.
An example: the class of threshold functions over $\mathbb{R}$, $\mathcal{H} = \{h_a : a \in \mathbb{R}\}$ with $h_a(x) = \mathbb{1}_{[x < a]}$.
Note that the book proves PAC learnability here, so the realizability assumption is in force (hence the positive and negative examples can always be perfectly separated by some threshold).
Proof idea:
- Assume the true threshold is $a^*$, i.e. $L_{\mathcal{D}}(h_{a^*}) = 0$.
- Find $a_0 < a^* < a_1$ such that each of the intervals $(a_0, a^*)$ and $(a^*, a_1)$ has probability mass $\epsilon$.
- Let $b_0$ be the largest $x$ among the positive examples in $S$, and $b_1$ the smallest $x$ among the negative examples.
- If $b_0 \ge a_0$ and $b_1 \le a_1$, then the threshold $b_S$ returned by the ERM algorithm must also lie in $(a_0, a_1)$, and therefore $L_{\mathcal{D}}(h_{b_S}) \le \epsilon$.
- Taking the complement of this event and applying the union bound (each bad event has probability at most $(1-\epsilon)^m \le e^{-\epsilon m}$) completes the proof.
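As a minimal sketch of the argument (the function name and sample format are my own, not the book's), the ERM rule for the threshold class can be implemented directly:

```python
# A minimal sketch: the ERM rule for the threshold class
# H = {h_a : a in R}, h_a(x) = 1[x < a], under realizability.

def erm_threshold(sample):
    """Return a zero-empirical-risk threshold for a realizable sample.

    sample: list of (x, label) pairs, label in {0, 1}.
    b0 = largest positive x, b1 = smallest negative x; any threshold
    in the interval (b0, b1] has zero empirical risk, so we return b1.
    """
    b0 = max((x for x, y in sample if y == 1), default=float("-inf"))
    b1 = min((x for x, y in sample if y == 0), default=float("inf"))
    if b0 >= b1:
        raise ValueError("sample is not realizable by a threshold")
    return b1

S = [(0.1, 1), (0.4, 1), (0.7, 0), (0.9, 0)]
b_S = erm_threshold(S)
assert all(int(x < b_S) == y for x, y in S)  # zero empirical risk on S
```

The proof above bounds how far such a $b_S$ can drift from $a^*$ in terms of probability mass, not distance.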
VC-dimension
Restriction of H to C
That is, we restrict the domain of the functions in $\mathcal{H}$ to $C$: for $C = \{c_1, \ldots, c_m\}$, the restriction is $\mathcal{H}_C = \{(h(c_1), \ldots, h(c_m)) : h \in \mathcal{H}\}$.
Shattering
A hypothesis class $\mathcal{H}$ shatters a finite set $C \subset X$ if the restriction of $\mathcal{H}$ to $C$ is the set of all functions from $C$ to $\{0, 1\}$, i.e. $|\mathcal{H}_C| = 2^{|C|}$.
In other words, every labeling of $C$ with values in $\{0, 1\}$ can be realized by some function in $\mathcal{H}$.
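These definitions can be checked by brute force on small sets; the grid of thresholds and the helper names below are illustrative assumptions, not from the book:

```python
# Brute-force check of restriction and shattering on small sets.
from itertools import product

def restriction(H, C):
    """H_C: the set of labelings of C realized by hypotheses in H."""
    return {tuple(h(x) for x in C) for h in H}

def is_shattered(H, C):
    """C is shattered by H iff H_C contains all 2^|C| labelings of C."""
    return restriction(H, C) == set(product((0, 1), repeat=len(C)))

# Threshold functions h_a(x) = 1[x < a] on a grid of thresholds a.
H = [lambda x, a=a: int(x < a) for a in [i / 10 for i in range(-5, 26)]]

assert is_shattered(H, [0.5])           # one point: both labels reachable
assert not is_shattered(H, [0.3, 0.8])  # the labeling (0, 1) is impossible
```

The second assertion is the reason thresholds cannot shatter any two points: a threshold that labels the left point 0 must label the right point 0 as well.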
Corollary 6.4: Let $\mathcal{H}$ be a hypothesis class of functions from $X$ to $\{0,1\}$ and let $m$ be a training-set size. Assume there exists a set $C \subset X$ of size $2m$ that is shattered by $\mathcal{H}$. Then for any learning algorithm $A$ there exist a distribution $\mathcal{D}$ and a predictor $h \in \mathcal{H}$ with $L_{\mathcal{D}}(h) = 0$, yet with probability at least $1/7$ over the choice of $S \sim \mathcal{D}^m$ we have $L_{\mathcal{D}}(A(S)) \ge 1/8$.
VC-dimension
The VC-dimension of a hypothesis class $\mathcal{H}$, denoted $\mathrm{VCdim}(\mathcal{H})$, is defined as the maximal size of a set $C \subset X$ that can be shattered by $\mathcal{H}$. If $\mathcal{H}$ shatters sets of arbitrarily large size, we say $\mathrm{VCdim}(\mathcal{H}) = \infty$.
Note that it suffices to find a single set $C$ of that size that is shattered; not every set of that size has to be shattered.
A hypothesis class with infinite VC-dimension is not PAC learnable, while a finite VC-dimension guarantees learnability of $\mathcal{H}$.
Steps to prove that $\mathrm{VCdim}(\mathcal{H}) = d$:
- There exists a set C of size d that is shattered by H.
- Every set C of size d + 1 is not shattered by H.
Examples
| Hypothesis Class | VC Dimension |
| --- | --- |
| threshold functions | 1 |
| intervals | 2 |
| axis-aligned rectangles | 4 |
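The two-step recipe above can be exercised numerically. The sketch below checks $\mathrm{VCdim}(\text{intervals}) = 2$ on a small grid; the grid and helper names are my own assumptions, and searching subsets of a finite point set only illustrates (it does not prove) the "every set of size $d+1$" step:

```python
# Brute-force illustration that intervals shatter some pair of points
# but no triple from a sample point set.
from itertools import combinations, product

def is_shattered(H, C):
    labelings = {tuple(h(x) for x in C) for h in H}
    return len(labelings) == 2 ** len(C)

def some_subset_shattered(H, points, size):
    return any(is_shattered(H, C) for C in combinations(points, size))

# Intervals h_{a,b}(x) = 1[a <= x <= b] with endpoints on a grid.
grid = [i / 2 for i in range(0, 11)]  # 0.0, 0.5, ..., 5.0
H = [lambda x, a=a, b=b: int(a <= x <= b)
     for a, b in product(grid, repeat=2) if a <= b]

points = [1.0, 2.0, 3.0, 4.0]
assert some_subset_shattered(H, points, 2)      # step 1: some pair is shattered
assert not some_subset_shattered(H, points, 3)  # (1, 0, 1) is never realizable
```

The failing labeling for any triple $x_1 < x_2 < x_3$ is $(1, 0, 1)$: an interval containing $x_1$ and $x_3$ must also contain $x_2$.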
Finite classes: $\mathrm{VCdim}(\mathcal{H}) \le \log_2 |\mathcal{H}|$, since shattering a set $C$ of size $d$ requires $|\mathcal{H}| \ge |\mathcal{H}_C| = 2^d$.
In the examples above, the VC-dimension appears to coincide with the number of parameters, but this correspondence does not hold in general: a classic counterexample is the one-parameter class $\{x \mapsto \lceil \sin(\theta x) \rceil : \theta \in \mathbb{R}\}$, which has infinite VC-dimension.
🌟 The Fundamental Theorem of PAC learning
Let $\mathcal{H}$ be a hypothesis class of functions from a domain $X$ to $\{0, 1\}$ and let the loss function be the 0−1 loss. Then, the following are equivalent:
- $\mathcal{H}$ has the uniform convergence property.
- Any ERM rule is a successful agnostic PAC learner for $\mathcal{H}$.
- $\mathcal{H}$ is agnostic PAC learnable.
- $\mathcal{H}$ is PAC learnable.
- Any ERM rule is a successful PAC learner for $\mathcal{H}$.
- $\mathcal{H}$ has a finite VC-dimension.
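A quantitative version of the theorem (Theorem 6.8 in the book) pins down the sample complexity in terms of $d = \mathrm{VCdim}(\mathcal{H})$, up to absolute constants $C_1, C_2$:

$$
C_1\,\frac{d + \log(1/\delta)}{\epsilon^2} \;\le\; m_{\mathcal{H}}(\epsilon, \delta) \;\le\; C_2\,\frac{d + \log(1/\delta)}{\epsilon^2} \quad \text{(agnostic PAC)}
$$

$$
C_1\,\frac{d + \log(1/\delta)}{\epsilon} \;\le\; m_{\mathcal{H}}(\epsilon, \delta) \;\le\; C_2\,\frac{d\log(1/\epsilon) + \log(1/\delta)}{\epsilon} \quad \text{(realizable PAC)}
$$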
Growth function
The growth function, defined as $\tau_{\mathcal{H}}(m) = \max_{C \subset X : |C| = m} |\mathcal{H}_C|$, measures the maximal "effective" size of $\mathcal{H}$ on a set of $m$ examples.
Sauer’s Lemma
Let $\mathcal{H}$ be a hypothesis class with $\mathrm{VCdim}(\mathcal{H}) \le d < \infty$. Then for all $m$, $\tau_{\mathcal{H}}(m) \le \sum_{i=0}^{d} \binom{m}{i}$; in particular, for $m > d + 1$, $\tau_{\mathcal{H}}(m) \le (em/d)^d$. Hence, when $m$ becomes larger than the VC-dimension, the growth function increases polynomially rather than exponentially with $m$.
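The polynomial-vs-exponential transition is easy to see numerically; `sauer_bound` below is an illustrative helper name of mine:

```python
# Numeric illustration of Sauer's lemma: for VCdim(H) <= d,
# tau_H(m) <= sum_{i<=d} C(m, i), which equals 2^m while m <= d
# and grows only polynomially, bounded by (e*m/d)^d, once m > d.
from math import comb, e

def sauer_bound(m, d):
    return sum(comb(m, i) for i in range(d + 1))

d = 3
for m in (2, 3, 10, 50):
    bound = sauer_bound(m, d)
    if m <= d:
        assert bound == 2 ** m            # no constraint below the VC-dimension
    else:
        assert bound < 2 ** m             # exponential -> polynomial transition
        assert bound <= (e * m / d) ** d  # the standard polynomial bound
```

For $d = 3$ and $m = 50$, the bound is $20{,}876$ labelings, against $2^{50} \approx 10^{15}$ for an unconstrained class.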
The proof is omitted here for now.
🤗 Summary
The fundamental theorem shows that a hypothesis class is PAC learnable if and only if its VC-dimension is finite, and that the VC-dimension determines the sample complexity required for PAC learning. It also shows that if a problem is learnable at all, then uniform convergence holds and the problem is therefore learnable using the ERM rule.
📎 References
- 《机器学习》 (*Machine Learning*), Zhou Zhihua
- Author:FlowerMouse
- URL:https://tangly1024.com/article/uml-chap6
- Copyright: Unless otherwise stated, all articles in this blog are licensed under the BY-NC-SA agreement. Please cite the source when reposting!