CS699: Theory of ML

Basic Information

Lecture time: Monday, Wednesday 10:00 am to 11:50 am
Lecture place: KAP 141
Instructor: Vatsal Sharan
- Office: SAL 220
- Office Hours: By appointment.
- Contact: Fastest response will be via Piazza, but you can also email me at vsharan at usc.edu.
Piazza: We will be using Piazza for all course communications (regarding homework, project, course scheduling, etc). Please feel free to ask/answer any questions about the class on Piazza. You can post privately on Piazza to contact the course staff for any reason. Signup for Piazza here.
Gradescope: We will use Gradescope for assignment and final project submission. Please create an account on Gradescope using your USC ID and join CSCI 699 using entry code KYXX4E.

Course Description and Objectives

This course focuses on the theoretical foundation of machine learning. The focus will be on understanding fundamental questions regarding both computational and statistical aspects of learning, with an overarching goal to answer the core question: what is the complexity of learning, in terms of its computational and statistical requirements? The course will also cover several modern aspects of the theory of maching learning---including memory complexity of learning, deep learning theory, robustness, and fairness.

The hope is that through this course you will learn to think about machine learning in a more rigorous and principled way and have the skills to design provable and practical machine learning algorithms, which also work in the face of various computational and statistical constraints and meet modern desiderata. The course will equip you with the tools required to undertake research in theoretical machine learning, and will shed light on various interesting research frontiers.

Prerequisites

Familiarity with probability, linear algebra, calculus, analysis of algorithms, basic understanding of machine learning.

Syllabus and Materials

We will post lecture notes and assignments here. Additional related reading for all lectures will be posted on Piazza after the lecture.

Lecture	Topics	Lecture notes	Homework
1, 08/23	Introduction, PAC learning	Class notes, Scribe notes
2, 08/25	Finite hypothesis classes, Bias-Complexity tradeoff, Agnostic PAC learning	Class notes, Scribe notes
3, 08/30	Uniform convergence, concentration inequalities	Class notes, Scribe notes
4, 09/01	Concentration inequalities, sub-Gaussian random variables	Class notes, Scribe notes	HW1
5, 09/08	VC dimension	Class notes, Scribe notes
6, 09/13	VC theorem, Rademacher complexity	Class notes, Scribe notes
7, 09/15	Generalization bounds using Rademacher complexity, properties of Rademacher complexity, linear hypothesis classes	Class notes, Scribe notes
8, 09/20	Start computational complexity of learning, learning conjunctions efficiently	Class notes, Scribe notes
9, 09/22	Intractability of learning 3-term DNF, Using 3-CNF to avoid intractability, proper vs. improper learning	Class notes, Scribe notes
10, 09/27	Hardness of improper learning using cryptographic assumptions, hardness of agnostically learning halfspaces	Class notes, Scribe notes
11, 09/29	Learning with random classification noise (RCN), SQ learning	Class notes, Scribe notes	HW2
12, 10/04	SQ learning implies learning with RCN, SQ dimension, learning parities	Class notes, Scribe notes
13, 10/06	Hardness of learning parities in SQ, begin boosting	Class notes, Scribe notes
14, 10/11	AdaBoost and boosting, convex optimization, convex learning problems and convex surrogates	Class notes, Scribe notes
15, 10/13	Consistency of convex surrogates, gradient descent (GD), convergence of GD, variants and properties of GD	Class notes, Scribe notes
16, 10/18	SGD and its convergence, learning with SGD, begin online learning, mistake bound model	Class notes, Scribe notes
17, 10/20	Littlestone dimension, online learning in the unrealizable case, Weighted Majority algorithm	Class notes, Scribe notes
18, 10/25	Online convex optimization, Follow-the-Leader, Follow-the-Regularized-Leader, Online gradient descent (OGD)	Class notes, Scribe notes
19, 10/27	Regret of OGD, online perceptron, computational-statistical tradeoffs, planted clique problem	Class notes, Scribe notes	HW3
20, 11/01	Planted clique conjecture, memory-sample tradeoffs for learning. Class presentation begin. See schedule here.	Class notes, Scribe notes
21, 11/03	See presentation schedule for remainder of semester here.

Requirements and Grading

3-4 homeworks worth 40% of the grade. Discussion is allowed and encouraged but everyone should write solutions on their own. Homeworks should be written in Latex and submitted via Gradescope.
The major component will be a course presentation (30%) and project (25%). An overview of the requirements is given below, detailed instructions will be discussed later.
- The student presentations are one of the most important components of the class, since they will bring you to the forefront of ML theory research. You will be matched to a paper to present in class based on your interest and will be expected to make a 30 minute presentation on the topic. To help ensure all presentations are high quality, presenters will be asked to go over their slides and review their preparation with the course staff a week before the presentation (this will be worth 10%, the actual in-class presentation will be worth 20%).
- For the project, you will be expected to write a 8-9 page report on 1-2 papers. This could be 1) on the paper you presented, plus related reading; or 2) you are free to choose a different paper(s). You can also do research on ML theory as part of your project, please discuss this with me if this is of interest.
5% of the grade will be based on course attendance and participation .

Resources and related courses

There is no required textbook for this class, but the following books are good supplemental reading for many parts.
- Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David. Availble online here.
- An Introduction to Computational Learning Theory, by Michael J. Kearns and Umesh Vazirani. Available using your USC account here.
This course draws heavily from several other related courses, and the material there could be a good reference. Here are some pointers (there are also many other excellent courses missing in this short list):
- Haipeng Luo's class at USC. [website]
- Akshay Krishnamurthy's class at UMass. [website]
- Tengyu Ma's and Percy Liang's class at Stanford. [Tengyu's course website] [Percy's lecture notes]
- Rocco Servedio's class at Columbia. [website]
- Varun Kanade's class at Oxford. [website]
- Avrim Blum's class at TTIC. [website]
- Matus Telgarsky's class at UIUC. [website]

CSCI 699: Theory of Machine Learning