FML Chapter 2 | FlowerMouse Blog

type

status

date

slug

summary

📝 The PAC learning framework

Concept

A concept c : X → Y is a mapping from X to Y. Since Y = {0, 1}, we can identify c with the subset of X over which it takes the value 1. Thus, in the following, we equivalently refer to a concept to learn as a mapping from X to {0, 1}, or as a subset of X.

Concept 的个数等于 domain X 的子集的个数，因为一个映射函数可以理解为从 X 中挑出一个全为 positive 1 的子集, 因此在 domain X 上的所有 concept 的个数为 .

Hypothesis

Hypothesis set is a set of possible concepts.

对Distribution 性质的理解可以转化为对选择的约束

Generalization error

相对于对整体分布的真实误差

如果是 stochastic 的情况，应该推广为

Empirical error

也就是在训练集上的误差

如果是 stochastic 的情况，应该推广为

其中

💡

对于一个固定的假设来说，empirical error 对 S 求期望的值（遵循iid假设）等于 true error （这一点在后面的证明中有用到）

PAC-learning

💡

注意 m 的大小必须是多项式级别的，如果是指数级别的会失败，不是PAC可学的。

为了应对stochastic的情景，更一般的是 agnostic PAC learning, 误差的测量是与最优预测器的差值

补充：Bayes error 和 noise

一般将 Bayes classifier 作为最优的分类器，可以证明其泛化误差是理论上最小的。

例子: Learning axis-aligned rectangles （运用了几何性质来证明PAC可学性）

证明 consistent 的 finite 的假设类是 PAC 可学的

这里可以参考 UML 第2章的内容，但是 FML 的证明更加简洁

例子: n个布尔变量的合取式

例子: n个布尔变量的任意表达式（PAC不可学）

例子: k项析取范式

例子: k合取范式

证明 finite 的假设类是 PAC 可学的 (inconsistent)

证明思路：

利用 Concentration Hoeffding’s inequality

得到对固定的假设 h 的一个不等式上界

然后利用 Union bound 结合上式得到对所有假设的通用结论

Let be a finite hypothesis set. Then, for any , with probability at least , the following inequality holds:

这里说明了假设类大小与泛化误差之间的tradeoff:

对于不等式右侧的两项

假设类太大会使得第二项的值增大

假设类太小会使得拟合能力减弱，使得第一项的值增大

Sub-gaussian

🔖

Definition Sub-gaussianity

A random variable is -subgaussian if for all , it holds that

If is -subgaussian, then for any

Suppose that is -subgaussian, then

is -subgaussian for all

Suppose and are independent and -subgaussian and -subgaussian, then

is -subgaussian

The following random variables are subgaussian:

If is Gaussian with mean zero and variance , then is -subgaussian.

If has mean zero and almost surely for , then is -subgaussian.

If has mean zero and almost surely, then is -subgaussian.

🤗 Summary

主要介绍了基本的 PAC learning 的框架，说明了任意有限大小的假设类都是PAC可学的。

📎 参考文章

西瓜书《机器学习》12章

UML 2-5 章

📝 The PAC learning framework

Concept

Hypothesis

Generalization error

Empirical error

PAC-learning

证明 consistent 的 finite 的假设类是 PAC 可学的

证明 finite 的假设类是 PAC 可学的 (inconsistent)

Sub-gaussian

🤗 Summary

📎 参考文章

FlowerMouse

交流频道

加入我们的社群讨论分享

📝 The PAC learning framework

Concept

Hypothesis

Generalization error

Empirical error

PAC-learning

证明 consistent 的 finite 的 假设类是 PAC 可学的

证明 finite 的 假设类是 PAC 可学的 (inconsistent)

Sub-gaussian

🤗 Summary

📎 参考文章

FlowerMouse

交流频道

加入我们的社群讨论分享

证明 consistent 的 finite 的假设类是 PAC 可学的

证明 finite 的假设类是 PAC 可学的 (inconsistent)