- Discovering Data Structures: Nearest Neighbor Search and Beyond
  Omar Salemohamed, Laurent Charlin, Shivam Garg, Vatsal Sharan, Gregory Valiant
  NeurIPS, 2025
  arXiv  |  pdf
- The Rich and the Simple: On the Implicit Bias of Adam and SGD
  Bhavya Vasudeva, Jung Whan Lee, Vatsal Sharan, Mahdi Soltanolkotabi
  NeurIPS, 2025
  arXiv  |  pdf
- Simultaneous Swap Regret Minimization via KL-Calibration
  Haipeng Luo, Spandan Senapati, Vatsal Sharan
  NeurIPS, 2025 (Spotlight)
  arXiv  |  pdf
- Improved Bounds for Swap Multicalibration and Swap Omniprediction
  Haipeng Luo, Spandan Senapati, Vatsal Sharan
  NeurIPS, 2025 (Spotlight)
  arXiv  |  pdf
- On the Inherent Privacy of Zeroth Order Projected Gradient Descent
  Devansh Gupta, Meisam Razaviyayn, Vatsal Sharan
  AISTATS, 2025
  OPT 2024 @ NeurIPS (Oral)
  arXiv  |  pdf
- Proper Learnability and the Role of Unlabeled Data
  Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng
  ALT, 2025
  arXiv  |  pdf
- Transformers Learn Low Sensitivity Functions: Investigations and Implications
  Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Vatsal Sharan
  ICLR, 2025
  arXiv  |  pdf
- When is Multicalibration Post-Processing Necessary?
  Dutch Hansen, Siddartha Devic, Preetum Nakkiran, Vatsal Sharan
  NeurIPS, 2024
  arXiv  |  pdf
- Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression
  Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan
  NeurIPS, 2024
  SoCal NLP Symposium 2023 Best Paper Award
  arXiv  |  pdf
- Pre-trained Large Language Models Use Fourier Features to Compute Addition
  Tianyi Zhou, Deqing Fu, Vatsal Sharan, Robin Jia
  NeurIPS, 2024
  arXiv  |  pdf
- Optimal Multiclass U-Calibration Error and Beyond
  Haipeng Luo, Spandan Senapati, Vatsal Sharan
  NeurIPS, 2024
  arXiv  |  pdf
- Transductive Sample Complexities Are Compact
  Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng
  NeurIPS, 2024
  arXiv  |  pdf
- On the Statistical Complexity of Sample Amplification
  Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant
  Annals of Statistics, 2024
  arXiv  |  pdf
- Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness
  Bhavya Vasudeva, Kameron Shahabi, Vatsal Sharan
  TMLR, 2024
  arXiv  |  pdf
- Stability and Multigroup Fairness in Ranking with Uncertain Predictions
  Siddartha Devic, Aleksandra Korolova, David Kempe, Vatsal Sharan
  ICML, 2024
  Non-archival at FORC, 2024
  arXiv  |  pdf  |  video
- Regularization and Optimal Multiclass Learning
  Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng
  COLT, 2024
  arXiv  |  pdf
- Open Problem: Can Local Regularization Learn All Multiclass Problems?
  Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng
  Open Problem @ COLT, 2024
  pdf
- Fairness in Matching under Uncertainty
  Siddartha Devic, David Kempe, Vatsal Sharan, Aleksandra Korolova
  ICML, 2023
  Non-archival at EAAMO, 2023
  arXiv  |  pdf
- KL Divergence Estimation with Multi-group Attribution
  Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi Wieder
  arXiv, 2022
  arXiv  |  pdf
- NeuroSketch: A Neural Network Method for Fast and Approximate Evaluation of Range Aggregate Queries
  Sepanta Zeighami, Vatsal Sharan, Cyrus Shahabi
  SIGMOD, 2023
  arXiv  |  pdf
- Efficient Convex Optimization Requires Superlinear Memory
  Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant
  COLT, 2022
  Best Paper Award
  Invited to IJCAI 2023 Sister Conference Notable Paper Track
  JACM, 2024
  arXiv  |  pdf
- Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales
  Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan
  COLT, 2022
  arXiv  |  pdf
- Multicalibrated Partitions for Importance Weights
  Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder
  ALT, 2022
  arXiv  |  pdf
- Omnipredictors
  Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder
  ITCS, 2022
  arXiv  |  pdf
- One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
  Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, Qiuyi Zhang
  ICLR, 2021
  abstract  |  arXiv  |  pdf  |  video
  Can deep learning solve multiple, very different tasks simultaneously? We investigate this question through a systematic study of how the properties of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that neural networks are capable of learning multiple tasks when these tasks are naturally clustered and well-separated. We also show that multi-task learning can be provably hard, even though the individual tasks are easy to learn, in settings where such a natural separation does not exist, and validate this through experiments.
            
             
            
- Sample Amplification: Increasing Dataset Size even when Learning is Impossible
  Is learning a distribution always necessary for generating new samples from the distribution? We introduce the problem of "sample amplification": given n independent draws from a distribution D, to what extent is it possible to output a set of m > n datapoints that are indistinguishable from m i.i.d. draws from D? Curiously, we show that nontrivial amplification is often possible in the regime where n is too small to learn D to any nontrivial accuracy.
  Jump to 50:00 (or topic #27) in this video.
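  A toy illustration of the amplification setup described above, assuming a Gaussian source: the naive amplifier below simply returns the n observed points together with m - n fresh draws from a Gaussian fit to them. This is a baseline sketch of the problem interface, not the paper's procedure or its guarantees.

```python
import numpy as np

def naive_gaussian_amplifier(samples, m, rng):
    """Toy amplifier: output the n observed points plus (m - n) draws
    from a Gaussian fit to them. Illustrative only; not the paper's scheme."""
    n, d = samples.shape
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(d)  # small regularizer
    extra = rng.multivariate_normal(mean, cov, size=m - n)
    return np.vstack([samples, extra])

rng = np.random.default_rng(0)
d, n, m = 5, 100, 120
observed = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)
amplified = naive_gaussian_amplifier(observed, m, rng)
print(amplified.shape)  # (120, 5): m points that should "look like" i.i.d. draws
```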
            
- PIDForest: Anomaly Detection via Partial Identification
  We propose a definition of an anomaly that captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values: we call this partial identification. Formalizing this intuition, we propose a geometric anomaly measure for a point, PIDScore, which measures the minimum density of data points over all subcubes containing the point. We present PIDForest, a random-forest-based algorithm that finds anomalies based on this definition, and show that it performs favorably in comparison to several popular anomaly detection methods across a broad range of benchmarks.
  Jump to 35:30 (or topic #47) in this video.
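  As a rough, brute-force illustration of the PIDScore intuition above (minimum density of data over subcubes containing a point), the toy below scores a point by the sparsest random axis-aligned box that contains it. The box sampling, the +1 count for the query point, and the number of boxes are simplifications of this sketch, not the PIDForest algorithm.

```python
import numpy as np

def toy_pidscore(x, data, rng, n_boxes=200):
    """Toy version of the PIDScore intuition: over random axis-aligned boxes
    containing x, take the minimum density (points per unit volume).
    Lower score = sparser surroundings = more anomalous.
    Brute force and simplified; not the PIDForest algorithm."""
    lo_all = np.minimum(data.min(axis=0), x)
    hi_all = np.maximum(data.max(axis=0), x)
    best = np.inf
    for _ in range(n_boxes):
        lo = rng.uniform(lo_all, x)          # box corner below x
        hi = rng.uniform(x, hi_all)          # box corner above x
        vol = np.prod(np.maximum(hi - lo, 1e-9))
        count = np.sum(np.all((data >= lo) & (data <= hi), axis=1))
        density = (count + 1) / (len(data) * vol)   # +1 counts the query point itself
        best = min(best, density)
    return best

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
print(toy_pidscore(np.array([0.0, 0.0]), data, rng))  # typical point: higher score
print(toy_pidscore(np.array([4.0, 4.0]), data, rng))  # outlier: much lower score
```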
            
- Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data
  What learning algorithms can be run directly on compressively-sensed data? If we compress a high-dimensional matrix via a random projection, what is the relationship between the factors of the original matrix and the factors of the compressed matrix? While it is well known that random projections preserve a number of geometric properties of a dataset, this work shows that random projections can also preserve certain solutions to non-convex, NP-hard problems like non-negative matrix factorization, both in theory and in practice.
  Jump to 53:20 in this video.
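  In the spirit of the question posed above, the toy below compresses a nonnegative low-rank matrix with a random projection and factorizes both the original and the compressed matrix with NMF. The sparse binary projection (chosen here only so the compressed data stays nonnegative) and the scikit-learn setup are assumptions of this sketch, not necessarily the setting analyzed in the paper.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n, d, d_c, r = 200, 400, 80, 5

# Nonnegative low-rank data matrix X = W H.
W_true = rng.random((n, r))
H_true = rng.random((r, d))
X = W_true @ H_true

# Sparse binary random projection keeps the compressed data nonnegative
# (an assumption of this toy so that NMF still applies after compression).
P = (rng.random((d, d_c)) < 0.05).astype(float)
X_compressed = X @ P                     # shape (n, d_c)

# Factorize the original and the compressed matrices.
W_orig = NMF(n_components=r, init="random", random_state=0, max_iter=500).fit_transform(X)
W_comp = NMF(n_components=r, init="random", random_state=0, max_iter=500).fit_transform(X_compressed)

# The paper's question is how the factors of the compressed matrix (W_comp)
# relate to the factors of the original matrix (W_orig).
print(W_orig.shape, W_comp.shape)
```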
            
- Memory-Sample Tradeoffs for Linear Regression with Small Error
  What is the role of memory in continuous optimization and learning? Are there inherent trade-offs between the available memory and the data requirement? Is it possible to achieve the sample complexity of second-order optimization methods with significantly less memory? We raise these questions, and show the first nontrivial lower bounds for linear regression with super-linear memory: we show that there is a gap between the sample complexity of algorithms with quadratic memory and that of any algorithm with sub-quadratic memory.
            
            
- Recovery Guarantees for Quadratic Tensors with Limited Observations
  Hongyang Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang
  AISTATS, 2019
  abstract  |  pdf  |  arXiv
  We consider the tensor completion problem of predicting the missing entries of a tensor. The commonly used CP model has a triple-product form, but an alternate family of quadratic models, which are sums of pairwise products instead of a triple product, has emerged from applications such as recommendation systems. Non-convex methods are the method of choice for learning quadratic models, and this work studies their sample complexity and error guarantees via theoretical results and experiments on real and synthetic data.
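  For concreteness, the contrast between the CP model's triple product and the quadratic (pairwise) model mentioned above can be written out directly; the function names and factor shapes below are illustrative.

```python
import numpy as np

def cp_entry(U, V, W, i, j, k):
    """CP model: triple product, sum_r U[i,r] * V[j,r] * W[k,r]."""
    return np.sum(U[i] * V[j] * W[k])

def quadratic_entry(U, V, W, i, j, k):
    """Quadratic (pairwise) model: sum of pairwise inner products
    <U_i, V_j> + <U_i, W_k> + <V_j, W_k>."""
    return U[i] @ V[j] + U[i] @ W[k] + V[j] @ W[k]

rng = np.random.default_rng(0)
U, V, W = rng.normal(size=(10, 3)), rng.normal(size=(12, 3)), rng.normal(size=(8, 3))
print(cp_entry(U, V, W, 0, 1, 2), quadratic_entry(U, V, W, 0, 1, 2))
```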
            
            
 
- Efficient Anomaly Detection via Matrix Sketching
  Vatsal Sharan, Parikshit Gopalan, Udi Wieder
  NeurIPS, 2018
  abstract  |  pdf  |  arXiv
  PCA-based anomaly scores are commonly used for finding anomalies in high-dimensional data. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix, which uses space quadratic in the dimensionality of the data. Using matrix sketching tools and new matrix perturbation inequalities, we give the first streaming algorithms for computing these scores that use space linear or sublinear in the dimension.
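  For background on the scores discussed above, the snippet below computes a standard PCA-based anomaly score (squared residual after projecting onto the top-k principal subspace) the naive way, by materializing the full d x d covariance, which is exactly the quadratic-space step the paper's streaming sketches avoid. It is a textbook illustration, not the paper's algorithm.

```python
import numpy as np

def projection_distance_scores(X, k):
    """Naive PCA-based anomaly score: squared residual after projecting each
    (centered) point onto the top-k principal subspace. Explicitly forms the
    d x d covariance, i.e. the quadratic-space computation that streaming
    sketches are designed to avoid."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)                    # d x d covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    V_k = eigvecs[:, -k:]                       # top-k principal directions
    residual = Xc - (Xc @ V_k) @ V_k.T
    return np.sum(residual ** 2, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 50))   # roughly rank-20 data in 50 dims
X[0] += 5 * rng.normal(size=50)                              # plant an anomaly
scores = projection_distance_scores(X, k=20)
print(scores[0], np.median(scores))                          # planted point scores far higher
```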
            
            
- A Spectral View of Adversarially Robust Features
  Shivam Garg, Vatsal Sharan, Brian Zhang, Gregory Valiant
  NeurIPS, 2018 (Spotlight)
  abstract  |  pdf  |  arXiv
  Despite a surge of recent effort, there is still much that we do not understand about why models are vulnerable to adversarial perturbations, or how to reduce this vulnerability. In this work, we introduce the simpler, intermediate task of learning a set of "robust features": features which are stable to adversarial perturbations. These features can subsequently be used as inputs to a classifier, which will then inherit the robustness properties. We propose one approach to constructing robust features, which leverages a tight connection between robustness and spectral properties of the neighbourhood graph of the data.
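  A rough sketch of the spectral ingredient mentioned above: build a k-nearest-neighbour graph of the data and use low-frequency eigenvectors of its Laplacian as per-point features. The graph construction, Laplacian variant, and number of eigenvectors here are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def spectral_features(X, n_neighbors=10, n_features=2):
    """Toy robust-feature construction: kNN neighbourhood graph of the data,
    then the eigenvectors of its (unnormalized) Laplacian with the smallest
    nonzero eigenvalues as per-point features. Illustrative only."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dists[i])[1:n_neighbors + 1]   # skip self
        A[i, nbrs] = 1.0
    A = np.maximum(A, A.T)                               # symmetrize adjacency
    L = np.diag(A.sum(axis=1)) - A                       # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:1 + n_features]                  # skip the constant eigenvector

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(50, 5)), rng.normal(3, 0.3, size=(50, 5))])
feats = spectral_features(X)
print(feats.shape)   # (100, 2): smooth, graph-aware features per point
```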
            
            
- Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries
  Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan, Peter Bailis
  VLDB, 2018
  abstract  |  pdf  |  arXiv  |  code
  We propose a compact and efficiently mergeable sketch for answering quantile queries on a dataset. This data structure, which we refer to as the moments sketch, operates with a small memory footprint (200 bytes) and achieves computationally efficient (50 ns) merges by tracking only a set of summary statistics, notably the sample moments.
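  A minimal sketch of the data structure described above, assuming only that it tracks min, max, count, and the first k power sums and merges by adding them; the maximum-entropy step that turns these moments into quantile estimates in the paper is omitted here.

```python
import numpy as np

class MomentsSketch:
    """Tiny illustration of the moments-sketch idea: track min, max, count and
    the first k power sums. Merging two sketches just adds the summaries.
    The paper's maximum-entropy quantile estimation step is omitted."""
    def __init__(self, k=5):
        self.k = k
        self.count = 0
        self.min = np.inf
        self.max = -np.inf
        self.power_sums = np.zeros(k)   # sum of x, x^2, ..., x^k

    def update(self, x):
        self.count += 1
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        self.power_sums += np.array([x ** p for p in range(1, self.k + 1)])

    def merge(self, other):
        assert self.k == other.k
        self.count += other.count
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        self.power_sums += other.power_sums

    def mean(self):
        return self.power_sums[0] / self.count

rng = np.random.default_rng(0)
left, right = MomentsSketch(), MomentsSketch()
for x in rng.normal(size=1000):
    left.update(x)
for x in rng.normal(loc=2.0, size=1000):
    right.update(x)
left.merge(right)
print(left.count, left.mean())   # 2000 points, mean close to 1.0
```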
            
            
- Prediction with a Short Memory
  When is it necessary to remember significant information from the distant past to make good predictions about the future in sequential prediction tasks? How do we consolidate and reference memories about the past in order to predict the future? This work studies these questions, and the findings are perhaps counterintuitive: we show that a simple (Markov) model, which bases its predictions on only the most recent observations and the statistics of short windows, can achieve small error on average with respect to a large class of sequences, such as those generated by Hidden Markov Models (HMMs). This shows that long-term memory may not be necessary for making good predictions on average, even on sequences generated by complex processes that have long-range dependencies.
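  A minimal sketch of the kind of short-window (order-k Markov) predictor referred to above: it predicts the next symbol from empirical counts conditioned on the last k observations. The window length and the count-based prediction rule are choices of this sketch, not the paper's analysis.

```python
from collections import Counter, defaultdict

class ShortMemoryPredictor:
    """Order-k Markov predictor: predicts the next symbol from empirical
    counts conditioned on the last k observations. A minimal sketch of a
    short-window predictor, not the paper's exact setup."""
    def __init__(self, k=3):
        self.k = k
        self.counts = defaultdict(Counter)

    def update(self, history, next_symbol):
        self.counts[tuple(history[-self.k:])][next_symbol] += 1

    def predict(self, history):
        ctx = self.counts.get(tuple(history[-self.k:]))
        return ctx.most_common(1)[0][0] if ctx else None

# Train on a periodic sequence and predict the next symbol from a short window.
seq = "abcabcabcabcabc"
model = ShortMemoryPredictor(k=2)
for t in range(2, len(seq)):
    model.update(seq[:t], seq[t])
print(model.predict("abcab"))   # 'c'
```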
            
            
- Sketching Linear Classifiers over Data Streams
  Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant
  SIGMOD, 2018
  abstract  |  pdf  |  arXiv  |  code
  We introduce a new sub-linear space sketch, the Weight-Median Sketch, for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the classifier. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We establish recovery guarantees for our sketch and demonstrate empirical improvements in memory-accuracy trade-offs over alternative memory-budgeted methods.
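  A minimal sketch of the Count-Sketch-style structure described above: gradient updates to a coordinate are hashed with random signs into one bucket per row, and a weight is recovered as the median of its signed buckets. The width, depth, and hashing scheme below are illustrative, and the surrounding online-learning loop is omitted.

```python
import numpy as np

class WeightMedianSketchToy:
    """Count-Sketch-style structure over model weights: gradient updates to
    coordinate i are hashed (with a random sign) into one bucket per row,
    and weight i is recovered as the median of its signed buckets.
    Illustrative parameters; the online classifier-training loop is omitted."""
    def __init__(self, dim, width=64, depth=5, seed=0):
        rng = np.random.default_rng(seed)
        self.buckets = np.zeros((depth, width))
        self.hash_idx = rng.integers(0, width, size=(depth, dim))   # bucket per (row, feature)
        self.signs = rng.choice([-1.0, 1.0], size=(depth, dim))     # random sign per (row, feature)

    def update(self, i, gradient_step):
        """Apply a gradient step to coordinate i of the sketched weights."""
        for r in range(self.buckets.shape[0]):
            self.buckets[r, self.hash_idx[r, i]] += self.signs[r, i] * gradient_step

    def query(self, i):
        """Estimate weight i as the median of its signed buckets."""
        vals = [self.signs[r, i] * self.buckets[r, self.hash_idx[r, i]]
                for r in range(self.buckets.shape[0])]
        return float(np.median(vals))

sketch = WeightMedianSketchToy(dim=10_000)
sketch.update(42, 0.5)      # e.g. repeated SGD updates to one heavy coordinate
sketch.update(42, 0.25)
print(sketch.query(42))     # close to 0.75, up to hash-collision noise
```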
            
             
            
- Learning Overcomplete HMMs
  Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
  NeurIPS, 2017
  abstract  |  pdf  |  arXiv
  We study the problem of learning overcomplete HMMs: those with many hidden states but a small output alphabet. Despite their significant practical importance, such HMMs are poorly understood, with no known positive or negative results for efficient learning. In this paper, we present several new results, both positive and negative, which help define the boundaries between the tractable and intractable settings.
            
            
- Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use
  The popular Alternating Least Squares (ALS) algorithm for tensor decomposition is efficient and easy to implement, but often converges to poor local optima. We propose a modification of the ALS approach that is as efficient as standard ALS, but provably recovers the true factors with random initialization under standard incoherence assumptions on the factors of the tensor. We demonstrate the significant practical superiority of our approach over traditional ALS on a variety of synthetic and real-world tasks.
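  A simplified rendering of the idea above: plain ALS for a 3-way CP decomposition, with the factor estimates orthonormalized (via QR) during the first few sweeps. The number of orthogonalization sweeps and when to stop them are choices of this sketch, not a faithful reimplementation of the paper's algorithm or its guarantees.

```python
import numpy as np

def orth_als(T, rank, n_iters=50, orth_iters=5, seed=0):
    """ALS for a 3-way CP decomposition that orthonormalizes (QR) the factor
    estimates during the first few sweeps, then continues with plain ALS.
    A simplified sketch of the Orth-ALS idea."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.normal(size=(n, rank)) for n in (I, J, K))
    for it in range(n_iters):
        if it < orth_iters:
            A, B, C = (np.linalg.qr(M)[0] for M in (A, B, C))
        # Standard ALS least-squares update for each factor in turn.
        A = np.einsum('ijk,jr,kr->ir', T, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', T, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', T, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Recover a random rank-5 tensor (an easy, incoherent instance).
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.normal(size=(20, 5)) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = orth_als(T, rank=5)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))   # small relative error
```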
            
             
            
- Large Deviation Property for Waiting Times for Markov and Mixing Processes
  We show a concentration theorem for the waiting time until the opening string of one realization of a random process first appears in an independent realization of the same or a different process.