This post is a set of reading notes on Chapter 6 of the well-known learning theory textbook *Understanding Machine Learning: From Theory to Algorithms*.
📝The VC-Dimension
🖼️ Big Picture
The goal of this chapter is to characterize which hypothesis classes are PAC learnable. Earlier chapters showed that every finite hypothesis class is PAC learnable, while the class of all functions (over an infinite-size domain) is not; in fact, some infinite hypothesis classes are learnable as well.
To capture this distinction, we introduce the notion of the VC-dimension.
An example: the class of threshold functions over $\mathbb{R}$, $\mathcal{H} = \{h_a : a \in \mathbb{R}\}$ with $h_a(x) = \mathbb{1}_{[x < a]}$.
Note that the book proves PAC learnability here, so the realizability assumption is in force (hence the positive and negative examples can always be perfectly separated by some threshold).
Proof idea:
- Assume the true threshold is $a^*$, i.e. $L_{\mathcal{D}}(h_{a^*}) = 0$.
- Find $a_0 < a^* < a_1$ such that each of the intervals $(a_0, a^*)$ and $(a^*, a_1)$ has probability mass $\epsilon$.
- Let $b_0$ be the largest $x$ among the positive examples in $S$, and $b_1$ the smallest $x$ among the negative examples.
- If $b_0 \ge a_0$ and $b_1 \le a_1$, then the threshold $b_S$ returned by the ERM algorithm must also lie in $(a_0, a_1)$, and therefore $L_{\mathcal{D}}(h_{b_S}) \le \epsilon$.
- Taking the complement of this event and applying the union bound (each bad event has probability at most $(1-\epsilon)^m \le e^{-\epsilon m}$) completes the proof.
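As a minimal sketch of the argument (the function name and sample format are my own, not the book's), the ERM rule for the threshold class can be implemented directly:

```python
# A minimal sketch: the ERM rule for the threshold class
# H = {h_a : a in R}, h_a(x) = 1[x < a], under realizability.

def erm_threshold(sample):
    """Return a zero-empirical-risk threshold for a realizable sample.

    sample: list of (x, label) pairs, label in {0, 1}.
    b0 = largest positive x, b1 = smallest negative x; any threshold
    in the interval (b0, b1] has zero empirical risk, so we return b1.
    """
    b0 = max((x for x, y in sample if y == 1), default=float("-inf"))
    b1 = min((x for x, y in sample if y == 0), default=float("inf"))
    if b0 >= b1:
        raise ValueError("sample is not realizable by a threshold")
    return b1

S = [(0.1, 1), (0.4, 1), (0.7, 0), (0.9, 0)]
b_S = erm_threshold(S)
assert all(int(x < b_S) == y for x, y in S)  # zero empirical risk on S
```

The proof above bounds how far such a $b_S$ can drift from $a^*$ in terms of probability mass, not distance.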
VC-dimension
Restriction of H to C
That is, we restrict the domain of the functions in $\mathcal{H}$ to $C$: for $C = \{c_1, \ldots, c_m\}$, the restriction is $\mathcal{H}_C = \{(h(c_1), \ldots, h(c_m)) : h \in \mathcal{H}\}$.
Shattering
A hypothesis class $\mathcal{H}$ shatters a finite set $C \subset X$ if the restriction of $\mathcal{H}$ to $C$ is the set of all functions from $C$ to $\{0, 1\}$, i.e. $|\mathcal{H}_C| = 2^{|C|}$.
In other words, every labeling of $C$ with values in $\{0, 1\}$ can be realized by some function in $\mathcal{H}$.
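These definitions can be checked by brute force on small sets; the grid of thresholds and the helper names below are illustrative assumptions, not from the book:

```python
# Brute-force check of restriction and shattering on small sets.
from itertools import product

def restriction(H, C):
    """H_C: the set of labelings of C realized by hypotheses in H."""
    return {tuple(h(x) for x in C) for h in H}

def is_shattered(H, C):
    """C is shattered by H iff H_C contains all 2^|C| labelings of C."""
    return restriction(H, C) == set(product((0, 1), repeat=len(C)))

# Threshold functions h_a(x) = 1[x < a] on a grid of thresholds a.
H = [lambda x, a=a: int(x < a) for a in [i / 10 for i in range(-5, 26)]]

assert is_shattered(H, [0.5])           # one point: both labels reachable
assert not is_shattered(H, [0.3, 0.8])  # the labeling (0, 1) is impossible
```

The second assertion is the reason thresholds cannot shatter any two points: a threshold that labels the left point 0 must label the right point 0 as well.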
Corollary 6.4: Let $\mathcal{H}$ be a hypothesis class of functions from $X$ to $\{0,1\}$ and let $m$ be a training-set size. Assume there exists a set $C \subset X$ of size $2m$ that is shattered by $\mathcal{H}$. Then for any learning algorithm $A$ there exist a distribution $\mathcal{D}$ and a predictor $h \in \mathcal{H}$ with $L_{\mathcal{D}}(h) = 0$, yet with probability at least $1/7$ over the choice of $S \sim \mathcal{D}^m$ we have $L_{\mathcal{D}}(A(S)) \ge 1/8$.
VC-dimension
The VC-dimension of a hypothesis class $\mathcal{H}$, denoted $\mathrm{VCdim}(\mathcal{H})$, is defined as the maximal size of a set $C \subset X$ that can be shattered by $\mathcal{H}$. If $\mathcal{H}$ shatters sets of arbitrarily large size, we say $\mathrm{VCdim}(\mathcal{H}) = \infty$.
Note that it suffices to find a single set $C$ of that size that is shattered; not every set of that size has to be shattered.
A hypothesis class with infinite VC-dimension is not PAC learnable, while a finite VC-dimension guarantees learnability of $\mathcal{H}$.
Steps to prove that $\mathrm{VCdim}(\mathcal{H}) = d$:
- There exists a set C of size d that is shattered by H.
- Every set C of size d + 1 is not shattered by H.
Examples
| Hypothesis Class | VC Dimension |
| --- | --- |
| threshold functions | 1 |
| intervals | 2 |
| axis-aligned rectangles | 4 |
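The two-step recipe above can be exercised numerically. The sketch below checks $\mathrm{VCdim}(\text{intervals}) = 2$ on a small grid; the grid and helper names are my own assumptions, and searching subsets of a finite point set only illustrates (it does not prove) the "every set of size $d+1$" step:

```python
# Brute-force illustration that intervals shatter some pair of points
# but no triple from a sample point set.
from itertools import combinations, product

def is_shattered(H, C):
    labelings = {tuple(h(x) for x in C) for h in H}
    return len(labelings) == 2 ** len(C)

def some_subset_shattered(H, points, size):
    return any(is_shattered(H, C) for C in combinations(points, size))

# Intervals h_{a,b}(x) = 1[a <= x <= b] with endpoints on a grid.
grid = [i / 2 for i in range(0, 11)]  # 0.0, 0.5, ..., 5.0
H = [lambda x, a=a, b=b: int(a <= x <= b)
     for a, b in product(grid, repeat=2) if a <= b]

points = [1.0, 2.0, 3.0, 4.0]
assert some_subset_shattered(H, points, 2)      # step 1: some pair is shattered
assert not some_subset_shattered(H, points, 3)  # (1, 0, 1) is never realizable
```

The failing labeling for any triple $x_1 < x_2 < x_3$ is $(1, 0, 1)$: an interval containing $x_1$ and $x_3$ must also contain $x_2$.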
Finite classes: $\mathrm{VCdim}(\mathcal{H}) \le \log_2 |\mathcal{H}|$, since shattering a set $C$ of size $d$ requires $|\mathcal{H}| \ge |\mathcal{H}_C| = 2^d$.
In the examples above, the VC-dimension appears to coincide with the number of parameters, but this correspondence does not hold in general: a classic counterexample is the one-parameter class $\{x \mapsto \lceil \sin(\theta x) \rceil : \theta \in \mathbb{R}\}$, which has infinite VC-dimension.
🌟 The Fundamental Theorem of PAC learning
Let $\mathcal{H}$ be a hypothesis class of functions from a domain $X$ to $\{0, 1\}$ and let the loss function be the 0−1 loss. Then, the following are equivalent:
- $\mathcal{H}$ has the uniform convergence property.
- Any ERM rule is a successful agnostic PAC learner for $\mathcal{H}$.
- $\mathcal{H}$ is agnostic PAC learnable.
- $\mathcal{H}$ is PAC learnable.
- Any ERM rule is a successful PAC learner for $\mathcal{H}$.
- $\mathcal{H}$ has a finite VC-dimension.
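A quantitative version of the theorem (Theorem 6.8 in the book) pins down the sample complexity in terms of $d = \mathrm{VCdim}(\mathcal{H})$, up to absolute constants $C_1, C_2$:

$$
C_1\,\frac{d + \log(1/\delta)}{\epsilon^2} \;\le\; m_{\mathcal{H}}(\epsilon, \delta) \;\le\; C_2\,\frac{d + \log(1/\delta)}{\epsilon^2} \quad \text{(agnostic PAC)}
$$

$$
C_1\,\frac{d + \log(1/\delta)}{\epsilon} \;\le\; m_{\mathcal{H}}(\epsilon, \delta) \;\le\; C_2\,\frac{d\log(1/\epsilon) + \log(1/\delta)}{\epsilon} \quad \text{(realizable PAC)}
$$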
Growth function
The growth function, defined as $\tau_{\mathcal{H}}(m) = \max_{C \subset X : |C| = m} |\mathcal{H}_C|$, measures the maximal "effective" size of $\mathcal{H}$ on a set of $m$ examples.
Sauer’s Lemma
Let $\mathcal{H}$ be a hypothesis class with $\mathrm{VCdim}(\mathcal{H}) \le d < \infty$. Then for all $m$, $\tau_{\mathcal{H}}(m) \le \sum_{i=0}^{d} \binom{m}{i}$; in particular, for $m > d + 1$, $\tau_{\mathcal{H}}(m) \le (em/d)^d$. Hence, when $m$ becomes larger than the VC-dimension, the growth function increases polynomially rather than exponentially with $m$.
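The polynomial-vs-exponential transition is easy to see numerically; `sauer_bound` below is an illustrative helper name of mine:

```python
# Numeric illustration of Sauer's lemma: for VCdim(H) <= d,
# tau_H(m) <= sum_{i<=d} C(m, i), which equals 2^m while m <= d
# and grows only polynomially, bounded by (e*m/d)^d, once m > d.
from math import comb, e

def sauer_bound(m, d):
    return sum(comb(m, i) for i in range(d + 1))

d = 3
for m in (2, 3, 10, 50):
    bound = sauer_bound(m, d)
    if m <= d:
        assert bound == 2 ** m            # no constraint below the VC-dimension
    else:
        assert bound < 2 ** m             # exponential -> polynomial transition
        assert bound <= (e * m / d) ** d  # the standard polynomial bound
```

For $d = 3$ and $m = 50$, the bound is $20{,}876$ labelings, against $2^{50} \approx 10^{15}$ for an unconstrained class.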
The proof is omitted here for now.
🤗 Summary
The fundamental theorem shows that a hypothesis class is PAC learnable if and only if its VC-dimension is finite, and that the VC-dimension determines the sample complexity required for PAC learning. It also shows that if a problem is learnable at all, then uniform convergence holds and the problem is therefore learnable using the ERM rule.
📎 References
- 《机器学习》 (*Machine Learning*), Zhou Zhihua
- Author:FlowerMouse
- URL:https://tangly1024.com/article/uml-chap6
- Copyright: Unless otherwise stated, all articles in this blog are licensed under the BY-NC-SA agreement. Please cite the source when reposting!