This course focuses on the theoretical foundations of machine learning. The focus will be on understanding fundamental questions regarding both computational and statistical aspects of learning, with the overarching goal of answering the core question: what is the complexity of learning, in terms of its computational and statistical requirements? The course will also cover several modern aspects of the theory of machine learning---including memory complexity of learning, deep learning theory, robustness, and fairness.
The hope is that through this course you will learn to think about machine learning in a more rigorous and principled way, and acquire the skills to design provable and practical machine learning algorithms that work in the face of various computational and statistical constraints and meet modern desiderata. The course will equip you with the tools required to undertake research in theoretical machine learning, and will shed light on various interesting research frontiers.
| Lecture, Date | Topics | Notes | Homework |
|---|---|---|---|
| 1, 08/21 | Introduction, PAC learning, finite hypothesis classes, bias-complexity tradeoff | Lecture 1 notes | |
| 2, 08/28 | Agnostic PAC learning, uniform convergence, concentration inequalities, sub-Gaussian random variables, VC dimension | Lecture 2 notes | HW1 |
| 3, 09/11 | VC theorem, Rademacher complexity, generalization bounds using Rademacher complexity | Lecture 3 notes | |
| 4, 09/18 | Properties of Rademacher complexity, linear hypothesis classes, regularization and structural risk minimization, stability and generalization | Lecture 4 notes | |
| 5, 09/25 | Computational complexity of learning, learning conjunctions efficiently, intractability of learning 3-term DNF, proper vs. improper learning, hardness of improper learning using cryptographic assumptions, hardness of agnostically learning halfspaces | Lecture 5 notes | |
| 6, 10/02 | Learning with random classification noise (RCN), SQ learning, SQ learning implies learning with RCN, SQ dimension | Lecture 6 notes | HW2 |
| 7, 10/09 | Learning parities, Boolean function analysis, hardness of learning parities in SQ, computational-statistical tradeoffs and the planted clique conjecture, memory-sample tradeoffs for learning | Lecture 7 notes | |
| 8, 10/16 | Convex optimization, convex learning problems and convex surrogates, gradient descent (GD), convergence of GD, variants and properties of GD, SGD and its convergence, learning with SGD | Lecture 8 notes | |
| 9, 10/23 | Boosting and AdaBoost, kernel methods | Lecture 9 notes | |
| 10, 10/30 | Online learning, mistake bound model, Littlestone dimension, online learning in the unrealizable case, Weighted Majority algorithm, online convex optimization | Lecture 10 notes | HW3 (theory) |
| 11, 11/06 | Non-convex optimization, challenges of understanding deep learning, depth-size tradeoffs, optimization landscape for deep learning, algorithmic regularization, benign overfitting | | HW3 (empirical) |
| 12, 11/13 | Algorithmic explorations. See presentation schedule here. | | |
| 13, 11/20 | Algorithmic explorations. See presentation schedule here. | | |
| 14, 11/27 | Algorithmic explorations. See presentation schedule here. | | |