CSCI 699: Theory of Machine Learning

Basic Information

Course Description and Objectives

This course focuses on the theoretical foundations of machine learning. The emphasis will be on understanding fundamental questions regarding both the computational and statistical aspects of learning, with the overarching goal of answering a core question: what is the complexity of learning, in terms of its computational and statistical requirements? The course will also cover several modern aspects of the theory of machine learning---including the memory complexity of learning, deep learning theory, robustness, and fairness.

The hope is that through this course you will learn to think about machine learning in a more rigorous and principled way and develop the skills to design provably correct, practical machine learning algorithms that work under various computational and statistical constraints and meet modern desiderata. The course will equip you with the tools required to undertake research in theoretical machine learning and will shed light on several interesting research frontiers.

Prerequisites

Familiarity with probability, linear algebra, calculus, and analysis of algorithms, and a basic understanding of machine learning.

Syllabus and Materials

We will post lecture notes and assignments here. Additional related reading for each lecture will be posted on Piazza after the lecture.

Lecture | Topics | Lecture notes | Homework
1, 08/23 | Introduction, PAC learning | Class notes, Scribe notes |
2, 08/25 | Finite hypothesis classes, bias-complexity tradeoff, agnostic PAC learning | Class notes, Scribe notes |
3, 08/30 | Uniform convergence, concentration inequalities | Class notes, Scribe notes |
4, 09/01 | Concentration inequalities, sub-Gaussian random variables | Class notes, Scribe notes | HW1
5, 09/08 | VC dimension | Class notes, Scribe notes |
6, 09/13 | VC theorem, Rademacher complexity | Class notes, Scribe notes |
7, 09/15 | Generalization bounds using Rademacher complexity, properties of Rademacher complexity, linear hypothesis classes | Class notes, Scribe notes |
8, 09/20 | Begin computational complexity of learning, learning conjunctions efficiently | Class notes, Scribe notes |
9, 09/22 | Intractability of learning 3-term DNF, using 3-CNF to avoid intractability, proper vs. improper learning | Class notes, Scribe notes |
10, 09/27 | Hardness of improper learning using cryptographic assumptions, hardness of agnostically learning halfspaces | Class notes, Scribe notes |
11, 09/29 | Learning with random classification noise (RCN), SQ learning | Class notes, Scribe notes | HW2
12, 10/04 | SQ learning implies learning with RCN, SQ dimension, learning parities | Class notes, Scribe notes |
13, 10/06 | Hardness of learning parities in SQ, begin boosting | Class notes, Scribe notes |
14, 10/11 | AdaBoost and boosting, convex optimization, convex learning problems and convex surrogates | Class notes, Scribe notes |
15, 10/13 | Consistency of convex surrogates, gradient descent (GD), convergence of GD, variants and properties of GD | Class notes, Scribe notes |
16, 10/18 | SGD and its convergence, learning with SGD, begin online learning, mistake bound model | Class notes, Scribe notes |
17, 10/20 | Littlestone dimension, online learning in the unrealizable case, Weighted Majority algorithm | Class notes, Scribe notes |
18, 10/25 | Online convex optimization, Follow-the-Leader, Follow-the-Regularized-Leader, online gradient descent (OGD) | Class notes, Scribe notes |
19, 10/27 | Regret of OGD, online perceptron, computational-statistical tradeoffs, planted clique problem | Class notes, Scribe notes | HW3
20, 11/01 | Planted clique conjecture, memory-sample tradeoffs for learning. Class presentations begin; see the schedule here. | Class notes, Scribe notes |
21, 11/03 | See the presentation schedule for the remainder of the semester here. | |

Requirements and Grading

  1. 3-4 homeworks, worth 40% of the grade. Discussion is allowed and encouraged, but everyone must write up solutions on their own. Homeworks should be written in LaTeX and submitted via Gradescope.
  2. The major component will be a course presentation (30%) and a project (25%). An overview of the requirements is given below; detailed instructions will be discussed later.
    • The student presentations are one of the most important components of the class, since they will bring you to the forefront of ML theory research. You will be matched to a paper based on your interests and will be expected to give a 30-minute presentation on it in class. To help ensure all presentations are high quality, presenters will be asked to go over their slides and review their preparation with the course staff a week before the presentation (this preparation will be worth 10%; the in-class presentation will be worth 20%).
    • For the project, you will be expected to write an 8-9 page report on 1-2 papers. This could be (1) the paper you presented, plus related reading, or (2) a different paper or papers of your choice. You can also do original research on ML theory as part of your project; please discuss this with me if you are interested.
  3. 5% of the grade will be based on course attendance and participation.

Resources and related courses

  1. There is no required textbook for this class, but the following books are good supplemental reading for many parts.
    • Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David. Available online here.
    • An Introduction to Computational Learning Theory, by Michael J. Kearns and Umesh Vazirani. Available using your USC account here.
  2. This course draws heavily from several other related courses, and the material there can serve as a good reference. Here are some pointers (this short list omits many other excellent courses):