This course focuses on the theoretical foundations of machine learning. The emphasis will be on understanding fundamental questions regarding both the computational and statistical aspects of learning, with the overarching goal of answering the core question: what is the complexity of learning, in terms of its computational and statistical requirements? The course will also cover several modern aspects of the theory of machine learning---including memory complexity of learning, deep learning theory, robustness, and fairness.
The hope is that through this course you will learn to think about machine learning in a more rigorous and principled way and acquire the skills to design provable and practical machine learning algorithms that work in the face of various computational and statistical constraints and meet modern desiderata. The course will equip you with the tools required to undertake research in theoretical machine learning, and will shed light on various interesting research frontiers.
Lecture | Topics | Lecture notes | Homework |
---|---|---|---|
1, 08/23 | Introduction, PAC learning | Class notes, Scribe notes | |
2, 08/25 | Finite hypothesis classes, Bias-Complexity tradeoff, Agnostic PAC learning | Class notes, Scribe notes | |
3, 08/30 | Uniform convergence, concentration inequalities | Class notes, Scribe notes | |
4, 09/01 | Concentration inequalities, sub-Gaussian random variables | Class notes, Scribe notes | HW1 |
5, 09/08 | VC dimension | Class notes, Scribe notes | |
6, 09/13 | VC theorem, Rademacher complexity | Class notes, Scribe notes | |
7, 09/15 | Generalization bounds using Rademacher complexity, properties of Rademacher complexity, linear hypothesis classes | Class notes, Scribe notes | |
8, 09/20 | Start computational complexity of learning, learning conjunctions efficiently | Class notes, Scribe notes | |
9, 09/22 | Intractability of learning 3-term DNF, Using 3-CNF to avoid intractability, proper vs. improper learning | Class notes, Scribe notes | |
10, 09/27 | Hardness of improper learning using cryptographic assumptions, hardness of agnostically learning halfspaces | Class notes, Scribe notes | |
11, 09/29 | Learning with random classification noise (RCN), SQ learning | Class notes, Scribe notes | HW2 |
12, 10/04 | SQ learning implies learning with RCN, SQ dimension, learning parities | Class notes, Scribe notes | |
13, 10/06 | Hardness of learning parities in SQ, begin boosting | Class notes, Scribe notes | |
14, 10/11 | AdaBoost and boosting, convex optimization, convex learning problems and convex surrogates | Class notes, Scribe notes | |
15, 10/13 | Consistency of convex surrogates, gradient descent (GD), convergence of GD, variants and properties of GD | Class notes, Scribe notes | |
16, 10/18 | SGD and its convergence, learning with SGD, begin online learning, mistake bound model | Class notes, Scribe notes | |
17, 10/20 | Littlestone dimension, online learning in the unrealizable case, Weighted Majority algorithm | Class notes, Scribe notes | |
18, 10/25 | Online convex optimization, Follow-the-Leader, Follow-the-Regularized-Leader, Online gradient descent (OGD) | Class notes, Scribe notes | |
19, 10/27 | Regret of OGD, online perceptron, computational-statistical tradeoffs, planted clique problem | Class notes, Scribe notes | HW3 |
20, 11/01 | Planted clique conjecture, memory-sample tradeoffs for learning. Class presentations begin; see schedule here. | Class notes, Scribe notes | |
21, 11/03 | See the presentation schedule for the remainder of the semester here. | | |