This course focuses on the theoretical foundations of machine learning. The focus will be on understanding fundamental questions regarding both computational and statistical aspects of learning, with the overarching goal of answering the core question: what is the complexity of learning, in terms of its computational and statistical requirements? The course will also cover several modern aspects of the theory of machine learning---including memory complexity of learning, deep learning theory, robustness, and fairness.
The hope is that through this course you will learn to think about machine learning in a more rigorous and principled way, and acquire the skills to design provable and practical machine learning algorithms that work in the face of various computational and statistical constraints and meet modern desiderata. The course will equip you with the tools required to undertake research in theoretical machine learning, and will shed light on various interesting research frontiers.
| Lecture, Date | Topics | Notes | Homework |
|---|---|---|---|
| 1, 08/21 | Introduction, PAC learning, finite hypothesis classes, bias-complexity tradeoff | Lecture 1 notes | |
| 2, 08/28 | Agnostic PAC learning, uniform convergence, concentration inequalities, sub-Gaussian random variables, VC dimension | Lecture 2 notes | HW1 |
| 3, 09/11 | VC theorem, Rademacher complexity, generalization bounds using Rademacher complexity | Lecture 3 notes | |
| 4, 09/18 | Properties of Rademacher complexity, linear hypothesis classes, regularization and structural risk minimization, stability and generalization | Lecture 4 notes | |
| 5, 09/25 | Computational complexity of learning, learning conjunctions efficiently, intractability of learning 3-term DNF, proper vs. improper learning, hardness of improper learning using cryptographic assumptions, hardness of agnostically learning halfspaces | Lecture 5 notes | |
| 6, 10/02 | Learning with random classification noise (RCN), SQ learning, SQ learning implies learning with RCN, SQ dimension | Lecture 6 notes | HW2 |
| 7, 10/09 | Learning parities, Boolean function analysis, hardness of learning parities in SQ, computational-statistical tradeoffs and the planted clique conjecture, memory-sample tradeoffs for learning | Lecture 7 notes | |
| 8, 10/16 | Convex optimization, convex learning problems and convex surrogates, gradient descent (GD), convergence of GD, variants and properties of GD, SGD and its convergence, learning with SGD | Lecture 8 notes | |
| 9, 10/23 | Boosting and AdaBoost, kernel methods | Lecture 9 notes | |
| 10, 10/30 | Online learning, mistake bound model, Littlestone dimension, online learning in the unrealizable case, Weighted Majority algorithm, online convex optimization | Lecture 10 notes | HW3 (theory) |
| 11, 11/06 | Non-convex optimization, challenges of understanding deep learning, depth-size tradeoffs, optimization landscape for deep learning, algorithmic regularization, benign overfitting | | HW3 (empirical) |
| 12, 11/13 | Algorithmic explorations. See presentation schedule here. | | |
| 13, 11/20 | Algorithmic explorations. See presentation schedule here. | | |
| 14, 11/27 | Algorithmic explorations. See presentation schedule here. | | |