The proofs in this chapter are fairly hardcore; the focus here is on understanding the statements of the theorems and their significance.
This chapter provides theoretical guarantees in the form of bounds on the generalization error.
There are three theoretical routes to this analysis:
- Rademacher complexity
- Growth function
- VC-dimension
Of the three, the Rademacher complexity is relatively hard to compute, whereas the latter two can be computed with combinatorics alone, which makes them easier to work with on practical problems.
Throughout the rest of this post, we assume a binary classification problem by default, which simplifies the analysis.
Rademacher Complexity
Definition
Empirical Rademacher complexity

$$\widehat{\mathfrak{R}}_S(G) = \underset{\boldsymbol{\sigma}}{\mathbb{E}}\left[\sup_{g \in G} \frac{1}{m} \sum_{i=1}^{m} \sigma_i\, g(z_i)\right]$$

Here $S = (z_1, \dots, z_m)$ is the fixed sample and $\boldsymbol{\sigma} = (\sigma_1, \dots, \sigma_m)$, where the Rademacher variables $\sigma_i$ are independent and take the value $+1$ with probability $\frac{1}{2}$ and $-1$ with probability $\frac{1}{2}$.
Intuitively, this can be read as adding noise to the original loss function: the empirical Rademacher complexity measures how well the class $G$ can correlate with random noise on the sample.
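To make the definition concrete, here is a minimal Monte Carlo sketch (my own illustration, not from the book) that estimates the empirical Rademacher complexity of a small finite function class. The matrix `G`, which represents each function by its vector of values on the sample, is a hypothetical setup chosen purely for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 50                                        # sample size
# Each row of G is the value vector (g(z_1), ..., g(z_m)) of one function g,
# with values in {-1, +1}; a random class, purely illustrative.
G = rng.choice([-1.0, 1.0], size=(20, m))

def empirical_rademacher(G, n_trials=10_000):
    """Monte Carlo estimate of E_sigma[ sup_g (1/m) sum_i sigma_i g(z_i) ]."""
    m = G.shape[1]
    # Rademacher variables: +1 or -1, each with probability 1/2.
    sigma = rng.choice([-1.0, 1.0], size=(n_trials, m))
    correlations = sigma @ G.T / m            # shape: (n_trials, |G|)
    # Take the sup over the class for each draw of sigma, then average.
    return correlations.max(axis=1).mean()

print(f"estimated empirical Rademacher complexity: {empirical_rademacher(G):.4f}")
```

Averaging this estimate over many independent samples $S \sim D^m$ would approximate the distributional Rademacher complexity defined next.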
Rademacher complexity
Let $D$ denote the distribution according to which samples are drawn. For any integer $m \ge 1$, the Rademacher complexity of $G$ is the expectation of the empirical Rademacher complexity over all samples of size $m$ drawn according to $D$:

$$\mathfrak{R}_m(G) = \underset{S \sim D^m}{\mathbb{E}}\left[\widehat{\mathfrak{R}}_S(G)\right]$$
In short, the Rademacher complexity is the expectation of the empirical Rademacher complexity.
Bounds
Theorem 3.5 (Rademacher complexity bounds – binary classification) Let $H$ be a family of functions taking values in $\{-1, +1\}$ and let $D$ be the distribution over the input space $X$. Then, for any $\delta > 0$, with probability at least $1 - \delta$ over a sample $S$ of size $m$ drawn according to $D$, each of the following holds for any $h \in H$:

$$R(h) \le \widehat{R}_S(h) + \mathfrak{R}_m(H) + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$$

$$R(h) \le \widehat{R}_S(h) + \widehat{\mathfrak{R}}_S(H) + 3\sqrt{\frac{\log\frac{2}{\delta}}{2m}}$$
There is also an equivalent probabilistic form: setting $\delta = e^{-2m\epsilon^2}$ in the first bound gives, for any $\epsilon > 0$,

$$\mathbb{P}\left[\sup_{h \in H}\left(R(h) - \widehat{R}_S(h)\right) > \mathfrak{R}_m(H) + \epsilon\right] \le e^{-2m\epsilon^2}$$
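As a quick numeric illustration (a sketch of my own, not the book's code), the two bounds of Theorem 3.5 differ only in their confidence terms:

```python
import numpy as np

def rademacher_bound(emp_risk, rad, m, delta=0.05, empirical=False):
    """Upper bound on R(h) from Theorem 3.5; `rad` is R_m(H) for the first
    bound, or the empirical Rademacher complexity of H on S if empirical=True."""
    if empirical:
        slack = 3 * np.sqrt(np.log(2 / delta) / (2 * m))
    else:
        slack = np.sqrt(np.log(1 / delta) / (2 * m))
    return emp_risk + rad + slack

# hypothetical numbers: empirical risk 0.10, complexity estimate 0.05, m = 2000
print(rademacher_bound(0.10, 0.05, 2000))                  # first bound
print(rademacher_bound(0.10, 0.05, 2000, empirical=True))  # second bound
```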
Growth Function
Definition

Definition 3.6 (Growth function) The growth function $\Pi_H : \mathbb{N} \to \mathbb{N}$ for a hypothesis set $H$ is defined by:

$$\Pi_H(m) = \max_{\{x_1, \dots, x_m\} \subseteq X} \Big|\big\{\left(h(x_1), \dots, h(x_m)\right) : h \in H\big\}\Big|$$

In words, $\Pi_H(m)$ is the maximum number of distinct labelings of $m$ points that hypotheses in $H$ can realize.
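As a concrete example (mine, not the book's), the growth function of one-dimensional threshold classifiers $h_t(x) = \mathrm{sign}(x - t)$ can be computed by brute force; for this class any $m$ distinct points achieve the maximum, and $\Pi_H(m) = m + 1$:

```python
import numpy as np

def growth_thresholds(m):
    """Number of distinct labelings of m points by h_t(x) = sign(x - t)."""
    points = np.arange(m, dtype=float)        # any m distinct points work here
    # One candidate threshold per region: below all points, between
    # consecutive points, and above all points.
    thresholds = np.concatenate(([points[0] - 1.0],
                                 (points[:-1] + points[1:]) / 2,
                                 [points[-1] + 1.0]))
    labelings = {tuple(np.where(points >= t, 1, -1)) for t in thresholds}
    return len(labelings)

for m in range(1, 6):
    print(m, growth_thresholds(m))            # prints m + 1 labelings
```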
Bounds
Corollary 3.9 (Growth function generalization bound) Let $H$ be a family of functions taking values in $\{-1, +1\}$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H$:

$$R(h) \le \widehat{R}_S(h) + \sqrt{\frac{2\log\Pi_H(m)}{m}} + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$$
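Plugging the threshold class's $\Pi_H(m) = m + 1$ into Corollary 3.9 shows how quickly the complexity term vanishes (again an illustrative sketch):

```python
import numpy as np

def growth_bound_slack(m, delta=0.05):
    """Complexity + confidence terms of Corollary 3.9 with Pi_H(m) = m + 1."""
    return (np.sqrt(2 * np.log(m + 1) / m)
            + np.sqrt(np.log(1 / delta) / (2 * m)))

for m in (100, 1_000, 10_000):
    print(m, round(growth_bound_slack(m), 4))
```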
VC-dimension
Definition
Definition 3.10 (VC-dimension) The VC-dimension of a hypothesis set $H$ is the size of the largest set that can be shattered by $H$:

$$\mathrm{VCdim}(H) = \max\left\{m : \Pi_H(m) = 2^m\right\}$$
Note that it is enough for one set of size $m$ to be shattered; it is not required that every set of size $m$ be shattered.
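The "there exists" quantifier is easy to see in code. Here is a brute-force sketch (my own example) checking whether interval classifiers $h_{a,b}(x) = 1_{[a,b]}(x)$ shatter some set of $m$ points; the largest such $m$ is their VC-dimension, namely 2:

```python
import numpy as np
from itertools import combinations

def shatters_some_set(m, candidates):
    """True iff some m-subset of `candidates` is shattered by intervals [a, b]."""
    for pts in combinations(sorted(candidates), m):
        pts = np.array(pts, dtype=float)
        edges = np.concatenate(([pts[0] - 1.0], pts, [pts[-1] + 1.0]))
        # All labelings realizable by some interval [a, b]; choosing a > b
        # yields the empty interval, i.e. the all-zeros labeling.
        labelings = {tuple(((pts >= a) & (pts <= b)).astype(int))
                     for a in edges for b in edges}
        if len(labelings) == 2 ** m:          # all 2^m dichotomies realized
            return True
    return False

candidates = range(5)
for m in (1, 2, 3):
    print(m, shatters_some_set(m, candidates))  # True, True, False -> VCdim = 2
```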
Bounds
Corollary 3.19 (VC-dimension generalization bounds) Let $H$ be a family of functions taking values in $\{-1, +1\}$ with VC-dimension $d$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $h \in H$:

$$R(h) \le \widehat{R}_S(h) + \sqrt{\frac{2d\log\frac{em}{d}}{m}} + \sqrt{\frac{\log\frac{1}{\delta}}{2m}}$$
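The resulting slack decays like $O\big(\sqrt{d\log(m/d)/m}\big)$; a small sketch for a fixed VC-dimension:

```python
import numpy as np

def vc_bound_slack(m, d, delta=0.05):
    """Complexity + confidence terms of Corollary 3.19."""
    return (np.sqrt(2 * d * np.log(np.e * m / d) / m)
            + np.sqrt(np.log(1 / delta) / (2 * m)))

for m in (1_000, 10_000, 100_000):
    print(m, round(vc_bound_slack(m, d=10), 4))
```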
Lower Bounds
The analysis so far has concerned upper bounds on the generalization error; lower bounds can also be derived.
Realizable case
Theorem 3.20 (Lower bound, realizable case) Let $H$ be a hypothesis set with VC-dimension $d > 1$. Then, for any $m \ge 1$ and any learning algorithm $A$, there exist a distribution $D$ over $X$ and a target function $f \in H$ such that:

$$\underset{S \sim D^m}{\mathbb{P}}\left[R_D(h_S, f) > \frac{d-1}{32m}\right] \ge \frac{1}{100}$$
The theorem shows that for any algorithm $A$, there exists a 'bad' distribution over $X$ and a target function $f$ for which the error of the hypothesis returned by $A$ is a constant times $\frac{d}{m}$ with some constant probability. This further demonstrates the key role played by the VC-dimension in learning. The result implies in particular that PAC-learning in the realizable case is not possible when the VC-dimension is infinite.
Non-realizable case
Theorem 3.23 (Lower bound, non-realizable case) Let $H$ be a hypothesis set with VC-dimension $d > 1$. Then, for any $m \ge 1$ and any learning algorithm $A$, there exists a distribution $D$ over $X \times \{0, 1\}$ such that:

$$\underset{S \sim D^m}{\mathbb{P}}\left[R_D(h_S) - \inf_{h \in H} R_D(h) > \sqrt{\frac{d}{320m}}\right] \ge \frac{1}{64}$$
Equivalently, for any learning algorithm, the sample complexity verifies:

$$m \ge \frac{d}{320\epsilon^2}$$
This theorem shows that in the non-realizable case there always exists a bad distribution such that, with constant probability, the excess error of the hypothesis returned by the algorithm admits a lower bound of order $\sqrt{d/m}$.
It also shows that the minimum sample size $m$ required for agnostic PAC learning is at least $\Omega\big(\frac{d}{\epsilon^2}\big)$; in other words, hypothesis classes with infinite VC-dimension are not PAC-learnable.
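For a sense of scale, evaluating the lower bound $m \ge \frac{d}{320\epsilon^2}$ for a few accuracy targets (illustrative only):

```python
import math

def min_samples(d, eps):
    """Agnostic sample-complexity lower bound of Theorem 3.23."""
    return math.ceil(d / (320 * eps ** 2))

for eps in (0.1, 0.05, 0.01):
    print(eps, min_samples(d=10, eps=eps))    # 4, 13, 313
```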