CSCI 699: Machine Learning Theory

Basic Information

Course Description and Objectives

This course focuses on the theoretical foundations of machine learning. The emphasis will be on understanding fundamental questions regarding both the computational and statistical aspects of learning, with the overarching goal of answering the core question: what is the complexity of learning, in terms of its computational and statistical requirements? The course will also cover several modern aspects of machine learning theory, including the memory complexity of learning, deep learning theory, robustness, and fairness.

The hope is that through this course you will learn to think about machine learning in a more rigorous and principled way, and gain the skills to design provably correct, practical machine learning algorithms that work in the face of various computational and statistical constraints and meet modern desiderata. The course will equip you with the tools required to undertake research in theoretical machine learning, and will shed light on several interesting research frontiers.


Prerequisites

Familiarity with probability, linear algebra, calculus, and analysis of algorithms, along with a basic understanding of machine learning. Python programming knowledge will be helpful.

Syllabus and Materials

The following is a tentative schedule. We will post lecture notes and assignments here. Additional related reading for each lecture will be posted on Ed Discussion after the lecture.

Lecture | Date | Topics | Lecture notes | Homework
1 | 08/21 | Introduction, PAC learning, finite hypothesis classes, bias-complexity tradeoff | Lecture 1 notes |
2 | 08/28 | Agnostic PAC learning, uniform convergence, concentration inequalities, sub-Gaussian random variables, VC dimension | Lecture 2 notes | HW1
3 | 09/11 | VC theorem, Rademacher complexity, generalization bounds using Rademacher complexity | Lecture 3 notes |
4 | 09/18 | Properties of Rademacher complexity, linear hypothesis classes, regularization and structural risk minimization, stability and generalization | Lecture 4 notes |
5 | 09/25 | Computational complexity of learning, learning conjunctions efficiently, intractability of learning 3-term DNF, proper vs. improper learning, hardness of improper learning using cryptographic assumptions, hardness of agnostically learning halfspaces | Lecture 5 notes |
6 | 10/02 | Learning with random classification noise (RCN), SQ learning, SQ learning implies learning with RCN, SQ dimension | Lecture 6 notes | HW2
7 | 10/09 | Learning parities, Boolean function analysis, hardness of learning parities in SQ, computational-statistical tradeoffs and the planted clique conjecture, memory-sample tradeoffs for learning | Lecture 7 notes |
8 | 10/16 | Convex optimization, convex learning problems and convex surrogates, gradient descent (GD), convergence of GD, variants and properties of GD, SGD and its convergence, learning with SGD | Lecture 8 notes |
9 | 10/23 | Boosting and AdaBoost, kernel methods | Lecture 9 notes |
10 | 10/30 | Online learning, mistake bound model, Littlestone dimension, online learning in the unrealizable case, Weighted Majority algorithm, online convex optimization | Lecture 10 notes | HW3 (theory)
11 | 11/06 | Non-convex optimization, challenges of understanding deep learning, depth-size tradeoffs, optimization landscape for deep learning, algorithmic regularization, benign overfitting | | HW3 (empirical)
12 | 11/13 | Algorithmic explorations. See presentation schedule here. | |
13 | 11/20 | Algorithmic explorations. See presentation schedule here. | |
14 | 11/27 | Algorithmic explorations. See presentation schedule here. | |
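As a small taste of the material from Lecture 1, the realizable-case PAC sample-complexity bound for a finite hypothesis class H, m >= (ln|H| + ln(1/delta)) / epsilon, can be evaluated directly. The sketch below is purely illustrative and not part of the course materials; the function name and the specific numbers are made up for the example.

```python
import math

def pac_sample_complexity(class_size: int, epsilon: float, delta: float) -> int:
    """Smallest m satisfying the realizable-case PAC bound for a finite
    hypothesis class H: m >= (ln|H| + ln(1/delta)) / epsilon, which
    guarantees that ERM has error <= epsilon with probability >= 1 - delta."""
    return math.ceil((math.log(class_size) + math.log(1.0 / delta)) / epsilon)

# A class of a million hypotheses, target error 1%, confidence 95%:
print(pac_sample_complexity(10**6, epsilon=0.01, delta=0.05))  # 1682
```

Note the logarithmic dependence on |H|: squaring the size of the class only doubles the first term of the bound, which is why even very large finite classes can be learnable from modest samples.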

Requirements and Grading

  1. 3-4 homeworks, worth 40% of the grade. Discussion is allowed and encouraged, but everyone should write up solutions on their own. Homeworks should be written in LaTeX and submitted via Gradescope.
  2. The major component will be a course presentation (30%) and course project (25%). An overview of the requirements is given below; detailed instructions will be discussed later. The course presentation and project will be done in groups of two students.
    • The class presentation is one of the most important components of the course, and will involve exploring various algorithmic and foundational aspects of machine learning that we do not get to investigate in sufficient depth in class. For the presentation, you will be matched to a conceptual question about ML based on your interests. The question could involve further studying an algorithm we saw in class, studying a new algorithm, investigating aspects of ML we did not get to explore (such as robustness, fairness, or privacy), investigating phenomena in deep learning, exploring and understanding datasets or the performance of ML algorithms on these datasets, etc. You will also be expected to provide some independent insight about your topic of study, which could come either from your own empirical analysis or from a theoretical investigation. You will give a presentation of about 30 minutes based on your work. To help ensure that all presentations are of high quality, presenters will be asked to go over their slides and review their preparation with the course staff about a week before the presentation (this review will be worth 10%; the in-class presentation will be worth 20%).
    • The project report will be 8-9 pages. You have two options for it: (1) you can continue investigating the topic of your presentation and write a report on the research you did, or (2) you can write a report surveying 2-3 papers.
  3. 5% of the grade will be based on course attendance and participation.

Resources and related courses

  1. There is no required textbook for this class, but the following books are good supplemental reading for many parts.
    • Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David. Available online here.
    • An Introduction to Computational Learning Theory, by Michael J. Kearns and Umesh Vazirani. Available using your USC account here.
  2. This course draws heavily on several other related courses, whose material can serve as a good reference. Here are some pointers (many other excellent courses are missing from this short list):