Keynote Speakers
Dan Alistarh
Institute of Science and Technology Austria / Neural Magic
SueYeon Chung
New York University / Flatiron Institute
Kostas Daniilidis
University of Pennsylvania
Maryam Fazel
University of Washington
Tom Goldstein
University of Maryland
Yingbin Liang
Ohio State University
Dimitris Papailiopoulos
University of Wisconsin-Madison
Stefano Soatto
University of California, Los Angeles
Jong Chul Ye
Korea Advanced Institute of Science and Technology (KAIST)
Talk Details
Dan Alistarh
Institute of Science and Technology Austria / Neural Magic
Title: Accurate Model Compression at GPT Scale
Time and Location: Day 1, 4:00 PM HKT, Rayson Huang Theatre
Abstract
A key barrier to the wide deployment of highly accurate machine learning models, whether for language or vision, is their high computational and memory overhead. Although we possess the mathematical tools for highly accurate compression of such models, these theoretically elegant techniques require second-order information about the model’s loss function, which is hard to even approximate efficiently at the scale of billion-parameter models. In this talk, I will describe our work on bridging this computational divide, which enables accurate second-order pruning and quantization of models at truly massive scale. Compressed using our techniques, models with billions and even trillions of parameters can be executed efficiently on a few GPUs, with significant speedups and negligible accuracy loss. Based in part on our work, the community has been able to run accurate billion- and even trillion-parameter models on computationally limited devices.
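As a rough, self-contained illustration of the second-order idea mentioned in the abstract (not the speaker's exact method), the sketch below prunes one weight row with Optimal-Brain-Surgeon-style saliencies, using a proxy Hessian H = XXᵀ built from calibration inputs as in GPTQ/SparseGPT-style work; the sizes, damping constant, and the simplification of not downdating H⁻¹ are illustrative assumptions.

```python
import numpy as np

def obs_prune_row(w, X, sparsity=0.5, damp=0.01):
    """Prune one weight row w (d,) to a target sparsity, using the proxy
    Hessian H = X X^T built from calibration inputs X (d, n)."""
    d = w.size
    H = X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d)   # damping for stability
    Hinv = np.linalg.inv(H)
    w, pruned = w.copy(), np.zeros(d, dtype=bool)
    for _ in range(int(sparsity * d)):
        # OBS saliency: the exact loss increase from zeroing weight i alone
        scores = w**2 / np.diag(Hinv)
        scores[pruned] = np.inf
        i = int(np.argmin(scores))
        # Compensation update: adjust surviving weights for the removal of w[i]
        w -= (w[i] / Hinv[i, i]) * Hinv[:, i]
        w[i] = 0.0
        w[pruned] = 0.0   # simplification: exact OBS also downdates H^-1
        pruned[i] = True
    return w

rng = np.random.default_rng(0)
w, X = rng.normal(size=64), rng.normal(size=(64, 256))
w_sparse = obs_prune_row(w, X)
err = np.linalg.norm(X.T @ (w - w_sparse)) / np.linalg.norm(X.T @ w)
print(f"sparsity {np.mean(w_sparse == 0):.2f}, relative output error {err:.3f}")
```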
Bio
Dan Alistarh is a Professor at IST Austria, in Vienna. Previously, he was a researcher at Microsoft and a postdoc at MIT CSAIL, and he received his PhD from EPFL. His research is on algorithms for efficient machine learning and high-performance computing, with a focus on scalable DNN inference and training, for which he was awarded an ERC Starting Grant in 2018. In his spare time, he works with the ML research team at Neural Magic, a startup based in Boston, on making compression faster, more accurate, and accessible to practitioners.
SueYeon Chung
New York University / Flatiron Institute
Title: Multi-level theory of neural representations: Capacity of neural manifolds in biological and artificial neural networks
Time and Location: Day 4, 10:00 AM HKT, Lee Shau Kee Lecture Centre
Abstract
A central goal in neuroscience is to understand how orchestrated computations in the brain arise from the properties of single neurons and networks of such neurons. Answering this question requires theoretical advances that shine a light on the ‘black box’ of representations in neural circuits. In this talk, we will demonstrate theoretical approaches that help describe how cognitive task implementations emerge from the structure in neural populations and from biologically plausible neural networks.
We will introduce a new theory that connects geometric structures arising from neural population responses (i.e., neural manifolds) to the efficiency of a neural representation in implementing a task. In particular, the theory describes how many neural manifolds can be represented (or ‘packed’) in the neural activity space while remaining linearly decodable by a downstream readout neuron. The intuition from this theory is remarkably simple: like a sphere-packing problem in physical space, many “neural manifolds” can be encoded in the neural activity space if these manifolds are small and low-dimensional, and vice versa.
Next, we will describe how such an approach can, in fact, open the ‘black box’ of distributed neuronal circuits in a range of settings, such as experimental neural datasets and artificial neural networks. In particular, our method overcomes the limitations of traditional dimensionality reduction techniques, as it operates directly on the high-dimensional representations. Furthermore, this method allows for simultaneous multi-level analysis, by measuring geometric properties in neural population data and estimating the amount of task information embedded in the same population.
Finally, we will discuss our recent efforts to fully extend this multi-level description of neural populations by (1) understanding how task-implementing neural manifolds emerge across brain regions and during learning, (2) investigating how neural tuning properties shape the representation geometry in early sensory areas, and (3) demonstrating the impressive task performance and neural predictivity achieved by optimizing a deep network to maximize the capacity of neural manifolds. By expanding our mathematical toolkit for analyzing representations underlying complex neuronal networks, we hope to contribute to the long-term challenge of understanding the neuronal basis of tasks and behaviors.
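The sphere-packing intuition in the abstract lends itself to a quick Monte Carlo check. The sketch below (an illustration only; the talk's theory computes capacity analytically) labels random spherical point clouds as whole manifolds and asks how often a perceptron readout can separate them, as the manifold radius grows; all sizes and iteration counts are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def separable_fraction(P, N, radius, m=20, trials=20, iters=2000):
    """Fraction of random dichotomies of P spherical point clouds in R^N
    that a linear (perceptron) readout separates, one label per cloud."""
    wins = 0
    for _ in range(trials):
        centers = rng.normal(size=(P, N))
        labels = rng.choice([-1.0, 1.0], size=P)
        pts = centers[:, None, :] + radius * rng.normal(size=(P, m, N)) / np.sqrt(N)
        X, y = pts.reshape(-1, N), np.repeat(labels, m)
        w = np.zeros(N)
        for _ in range(iters):
            margins = (X @ w) * y
            i = int(np.argmin(margins))
            if margins[i] > 0:       # every point correctly classified
                wins += 1
                break
            w += y[i] * X[i]         # classic perceptron update on worst violator
    return wins / trials

# Smaller manifolds pack better: separability drops as the radius grows.
for r in (0.1, 1.0, 3.0):
    print(f"radius {r:>3}: separable fraction {separable_fraction(P=30, N=20, radius=r):.2f}")
```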
Bio
SueYeon Chung is an Assistant Professor in the Center for Neural Science at NYU, with a joint appointment in the Center for Computational Neuroscience at the Flatiron Institute, an internal research division of the Simons Foundation. She is also an affiliated faculty member of the Center for Data Science and the Cognition and Perception Program at NYU. Prior to joining NYU, she was a Postdoctoral Fellow in the Center for Theoretical Neuroscience at Columbia University and a BCS Fellow in Computation at MIT. Before that, she received a Ph.D. in applied physics at Harvard University and a B.A. in mathematics and physics at Cornell University. She received the Klingenstein-Simons Fellowship Award in Neuroscience in 2023. Her main research interests lie at the intersection of computational neuroscience and deep learning, with a particular focus on understanding and interpreting neural computation in biological and artificial neural networks by employing methods from neural network theory, statistical physics, and high-dimensional statistics.
Kostas Daniilidis
University of Pennsylvania
Title: Parsimony through Equivariance
Time and Location: Day 2, 1:30 PM HKT, Lee Shau Kee Lecture Centre
Abstract
Equivariant representations are crucial in various scientific and engineering domains because they encode the inherent symmetries present in physical and biological systems, thereby providing a more natural and efficient way to model them. In the context of machine learning and perception, equivariant representations ensure that the output of a model changes in a predictable way in response to transformations of its input, such as 2D or 3D rotation or scaling. In this talk, we will show a systematic way to achieve equivariance by design, and how such an approach can yield parsimony in training data and model capacity.
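One minimal way to obtain equivariance by construction, sketched below for intuition (the talk covers far more general settings, e.g., continuous groups), is group averaging: symmetrizing any base operation Φ over the C4 group of 90-degree rotations yields F with F(rot(x)) = rot(F(x)) identically. The kernel and sizes are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def phi(x, k):
    """An arbitrary, non-equivariant base operation: 2D convolution."""
    return convolve2d(x, k, mode="same")

def c4_equivariant(x, k):
    """Average phi over the C4 orbit: F(x) = (1/4) sum_g g^{-1} phi(g x)."""
    return sum(np.rot90(phi(np.rot90(x, j), k), -j) for j in range(4)) / 4

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32))
k = rng.normal(size=(3, 3))
lhs = c4_equivariant(np.rot90(x), k)    # transform the input, then apply F
rhs = np.rot90(c4_equivariant(x, k))    # apply F, then transform the output
print(np.allclose(lhs, rhs))            # True: equivariant by construction
```

The identity F(hx) = hF(x) follows from re-indexing the sum over the group, independently of what Φ does, which is the sense in which the symmetry is guaranteed "by design" rather than learned from data.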
Bio
Kostas Daniilidis is the Ruth Yalom Stone Professor of Computer and Information Science at the University of Pennsylvania, where he has been on the faculty since 1998. He is an IEEE Fellow. He was the director of the GRASP laboratory from 2008 to 2013, Associate Dean for Graduate Education from 2012 to 2016, and Faculty Director of Online Learning from 2013 to 2017. He obtained his undergraduate degree in Electrical Engineering from the National Technical University of Athens in 1986, and his PhD (Dr. rer. nat.) in Computer Science from the University of Karlsruhe in 1992, under the supervision of Hans-Hellmut Nagel. He received the Best Conference Paper Award at ICRA 2017. He co-chaired ECCV 2010 and 3DPVT 2006. His most cited works have been on event-based vision, equivariant learning, 3D human pose, and hand-eye calibration.
Maryam Fazel
University of Washington
Title: Flat Minima and Generalization in Learning: The Case of Low-rank Matrix Recovery
Time and Location: Day 1, 1:30 PM HKT, Rayson Huang Theatre
Abstract
Many behaviors observed in deep neural networks still lack satisfactory explanation; e.g., how does an overparameterized neural network avoid overfitting and generalize to unseen data? Empirical evidence suggests that generalization depends on which zero-loss local minimum is attained during training. The shape of the training loss around a local minimum affects the model’s performance: “Flat” minima—around which the loss grows slowly—appear to generalize well. Clarifying this phenomenon helps explain generalization properties, which still largely remain a mystery.
In this talk, we focus on a simple class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We study several key models: matrix sensing, phase retrieval, robust Principal Component Analysis, covariance matrix estimation, and single-hidden-layer neural networks with quadratic activation. We prove that in these models, flat minima (measured by average curvature) exactly recover the ground truth under standard statistical assumptions, and we prove weak recovery for matrix completion. These results suggest (i) a theoretical basis for favoring methods that bias iterates towards flat solutions, and (ii) the use of the Hessian trace as a regularizer. Since the landscape properties we prove are algorithm-agnostic, a future direction is to pair these findings with the analysis of common training algorithms to better understand the interplay between the loss landscape and algorithmic implicit bias.
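As a hands-on companion to the "average curvature" notion above, the sketch below sets up an overparameterized matrix-sensing loss and probes its flatness via Hutchinson's estimator of the Hessian trace, formed from finite differences of the gradient. The problem sizes, probe counts, and the points compared are illustrative assumptions, not the paper's exact experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k, m = 10, 2, 5, 200               # dim, true rank, overparam rank, #measurements
Ustar = rng.normal(size=(d, r))
Mstar = Ustar @ Ustar.T                  # ground-truth low-rank matrix
A = rng.normal(size=(m, d, d))
y = np.einsum("mij,ij->m", A, Mstar)     # linear measurements <A_i, M*>

def grad(U):
    """Gradient of f(U) = (1/2m) sum_i (<A_i, U U^T> - y_i)^2."""
    res = np.einsum("mij,ij->m", A, U @ U.T) - y
    return np.einsum("m,mij->ij", res, A + A.transpose(0, 2, 1)) @ U / m

def hessian_trace(U, probes=50, eps=1e-4):
    """Hutchinson estimator tr(H) ~ E_v[v^T H v], with H v approximated by
    central differences of the gradient along Rademacher directions v."""
    total = 0.0
    for _ in range(probes):
        V = rng.choice([-1.0, 1.0], size=U.shape)
        total += np.sum(V * (grad(U + eps * V) - grad(U - eps * V))) / (2 * eps)
    return total / probes

U0 = np.hstack([Ustar, np.zeros((d, k - r))])     # a zero-loss factorization
print("Hessian trace at ground truth:", round(hessian_trace(U0), 2))
print("Hessian trace at random point:", round(hessian_trace(rng.normal(size=(d, k))), 2))
```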
Bio
Maryam Fazel is the Moorthy Family Professor of Electrical and Computer Engineering at the University of Washington, with adjunct appointments in Computer Science and Engineering, Mathematics, and Statistics. Maryam received her MS and PhD from Stanford University and her BS from Sharif University of Technology in Iran, and was a postdoctoral scholar at Caltech before joining UW. She is a recipient of the NSF CAREER Award, the UWEE Outstanding Teaching Award, and a Best Student Paper Award (with her student) at the UAI conference. She directs the Institute for Foundations of Data Science (IFDS), a multi-site NSF TRIPODS Institute. She serves on the Editorial Board of the MOS-SIAM Book Series on Optimization, is an Associate Editor of the SIAM Journal on Mathematics of Data Science, and is an Action Editor of the Journal of Machine Learning Research. Her current research interests are in the area of optimization in machine learning and control.
Tom Goldstein
University of Maryland
Title: Statistical methods for addressing safety and security issues of generative models
Time and Location: Day 4, 9:00 AM HKT, Lee Shau Kee Lecture Centre
Abstract
This talk will have two parts. In the first part, I’ll talk about mathematical perspectives on how to watermark generative models to prevent parameter theft, ways to watermark generative model outputs to enable detection, and ways to perform post-hoc detection of language models without relying on watermarks. I’ll emphasize the important idea of using statistical hypothesis testing and p-values to provide rigorous control of the false-positive rate of detection. In the second part of the talk, I’ll present methods for constructing neural networks that exhibit “slow” thinking abilities akin to human logical reasoning. Rather than learning simple pattern matching rules, these networks have the ability to synthesize algorithmic reasoning processes and solve difficult discrete search and planning problems that cannot be solved by conventional AI systems. Interestingly, these reasoning systems naturally exhibit error correction and robustness properties that make them more difficult to break than their fast thinking counterparts.
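The hypothesis-testing idea in the first part can be made concrete with a small sketch in the spirit of green-list watermarks: under the null hypothesis (unwatermarked text), each token falls in a pseudorandomly chosen "green list" with probability γ, so the green count is Binomial(T, γ) and a one-sided z-test yields a rigorous p-value. The hashing scheme, vocabulary size, and sampling bias below are illustrative assumptions, not the speaker's exact construction.

```python
import hashlib
import math
import random

GAMMA = 0.25  # fraction of the vocabulary that is "green" at each position

def is_green(prev_token: int, token: int) -> bool:
    """Pseudorandom green-list membership, keyed on the preceding token."""
    h = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64 < GAMMA

def watermark_pvalue(tokens: list[int]) -> float:
    """One-sided p-value for 'more green tokens than chance allows'."""
    T = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    z = (greens - GAMMA * T) / math.sqrt(GAMMA * (1 - GAMMA) * T)
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper tail of a standard normal

random.seed(0)
plain = [random.randrange(50_000) for _ in range(201)]    # unwatermarked stream
marked = [plain[0]]
while len(marked) < 201:                                  # sampler biased toward green
    t = random.randrange(50_000)
    if is_green(marked[-1], t) or random.random() < 0.15:
        marked.append(t)
print(f"p-value, plain text:       {watermark_pvalue(plain):.3f}")
print(f"p-value, watermarked text: {watermark_pvalue(marked):.2e}")
```

Controlling the false-positive rate then amounts to thresholding this p-value, exactly as in classical hypothesis testing.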
Bio
Tom Goldstein is the Volpi-Cupal Professor of Computer Science at the University of Maryland, and director of the Maryland Center for Machine Learning. His research lies at the intersection of machine learning and optimization, and targets applications in computer vision and signal processing. Professor Goldstein has been the recipient of several awards, including SIAM’s DiPrima Prize, a DARPA Young Faculty Award, a JP Morgan Faculty award, an Amazon Research Award, and a Sloan Fellowship.
Yingbin Liang
Ohio State University
Title: In-Context Convergence of Transformers
Time and Location: Day 2, 9:00 AM HKT, Lee Shau Kee Lecture Centre
Abstract
Transformers have recently revolutionized many machine learning domains, and one salient discovery is their remarkable in-context learning capability, where models can capture an unseen task by utilizing task-specific prompts without further parameter fine-tuning. In this talk, I will present our recent work that aims at understanding the in-context learning mechanism of transformers. Our focus is on the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent to in-context learn linear function classes. I will first present our characterization of the training convergence of in-context learning for data with balanced and imbalanced features, respectively. I will then discuss the insights that we obtain about attention models and training processes. I will also talk about the analysis techniques that we develop, which may be useful for a broader set of problems. I will finally conclude my talk with comments on a few future directions.
This is joint work with Yu Huang (UPenn) and Yuan Cheng (NUS).
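For readers who want a concrete picture of the setting, the sketch below trains a one-layer softmax-attention model to in-context learn linear functions: each prompt carries (x_i, y_i) pairs from a fresh random linear task, and the model predicts the label of a query token. The parameterization, dimensions, and optimizer are illustrative assumptions and not the paper's exact analysis setup.

```python
import torch

torch.manual_seed(0)
d, n, steps = 5, 20, 3000
W = (0.01 * torch.randn(d + 1, d + 1)).requires_grad_()   # key-query matrix
V = (0.01 * torch.randn(d + 1, d + 1)).requires_grad_()   # value matrix
opt = torch.optim.Adam([W, V], lr=1e-2)

def sample_prompts(bs=64):
    w = torch.randn(bs, d)                        # a fresh linear task per prompt
    x = torch.randn(bs, n + 1, d)
    y = torch.einsum("bd,bnd->bn", w, x)
    z = torch.cat([x, y.unsqueeze(-1)], dim=-1)   # tokens [x_i ; y_i]
    z[:, -1, -1] = 0.0                            # the query's label is hidden
    return z, y[:, -1]

for t in range(steps):
    z, target = sample_prompts()
    q = z[:, -1]                                               # query token
    attn = torch.softmax(torch.einsum("be,bne->bn", q @ W, z[:, :-1]), dim=-1)
    out = torch.einsum("bn,bne->be", attn, z[:, :-1] @ V.T)    # attend over context
    loss = ((out[:, -1] - target) ** 2).mean()                 # read y off last coord
    opt.zero_grad(); loss.backward(); opt.step()
    if t % 500 == 0:
        print(f"step {t:4d}  in-context MSE {loss.item():.3f}")
```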
Bio
Dr. Yingbin Liang is currently a Professor in the Department of Electrical and Computer Engineering at the Ohio State University (OSU) and a core faculty member of the Ohio State Translational Data Analytics Institute (TDAI). She also serves as the Deputy Director of the AI-EDGE Institute at OSU. Dr. Liang received her Ph.D. in Electrical Engineering from the University of Illinois at Urbana-Champaign in 2005, and served on the faculty of the University of Hawaii and Syracuse University before joining OSU. Dr. Liang’s research interests include machine learning, optimization, information theory, and statistical signal processing. She received the National Science Foundation CAREER Award and the State of Hawaii Governor Innovation Award in 2009, and the EURASIP Best Paper Award in 2014. She is an IEEE Fellow.
Dimitris Papailiopoulos
University of Wisconsin-Madison
Title: Teaching arithmetic to small language models
Time and Location: Day 3, 1:30 PM HKT, Lee Shau Kee Lecture Centre
Abstract
Can a language model truly “understand” arithmetic? We explore this by trying to teach small transformers from scratch to perform elementary arithmetic operations, using the next-token prediction objective. We first demonstrate that conventional training data (i.e., “A+B=C”) is not effective for arithmetic learning, and that simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions which, in some cases, can be explained through connections to low-rank matrix completion. We then train these small models on chain-of-thought data that includes intermediate steps. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We finally discuss the issue of length generalization: can a model trained on n-digit numbers add (n+1)-digit numbers? Humans don’t need to be taught every digit length of addition to be able to perform it. It turns out that language models aren’t great at length generalization, but we catch glimpses of it in “unstable” scenarios. Surprisingly, the infamous U-shaped overfitting curve makes an appearance!
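To make the formatting point tangible, the sketch below generates the same addition problem in three styles: the plain "A+B=C" format the abstract calls ineffective, a reversed-answer format (the answer written least-significant digit first, matching the order in which an autoregressive model must resolve carries), and a simple carry-by-carry scratchpad. The exact templates in the underlying work differ; these are illustrative.

```python
import random

def plain(a, b):
    return f"{a}+{b}={a + b}"

def reversed_answer(a, b):
    # Least-significant digit first: carries arrive before they are needed.
    return f"{a}+{b}={str(a + b)[::-1]}"

def chain_of_thought(a, b):
    da, db = str(a)[::-1], str(b)[::-1]
    steps, carry = [], 0
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        s = x + y + carry
        steps.append(f"{x}+{y}+{carry}={s}")   # one column of the addition
        carry = s // 10
    return f"{a}+{b}: " + " , ".join(steps) + f" -> {a + b}"

random.seed(0)
a, b = random.randrange(100, 1000), random.randrange(100, 1000)
for fmt in (plain, reversed_answer, chain_of_thought):
    print(fmt(a, b))
```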
Bio
Dimitris Papailiopoulos is the Jay & Cynthia Ihlenfeld Associate Professor of Electrical and Computer Engineering at the University of Wisconsin-Madison. His research interests span machine learning, information theory, and distributed systems, with a current focus on understanding the intricacies of large language models. Before coming to Madison, Dimitris was a postdoctoral researcher at UC Berkeley and a member of the AMPLab. He earned his Ph.D. in ECE from UT Austin under the supervision of Alex Dimakis, and his ECE Diploma and M.Sc. degree from the Technical University of Crete, in Greece. Dimitris is a recipient of the NSF CAREER Award (2019), three Sony Faculty Innovation Awards (2018, 2019, and 2020), a joint IEEE ComSoc/ITSoc Best Paper Award (2020), an IEEE Signal Processing Society Young Author Best Paper Award (2015), the Vilas Associate Award (2021), the Emil Steiger Distinguished Teaching Award (2021), and the Benjamin Smith Reynolds Award for Excellence in Teaching (2019). In 2018, he co-founded MLSys, a new conference that targets research at the intersection of machine learning and systems.
Stefano Soatto
University of California, Los Angeles
Title: Representation and Control of Meanings in Large Language Models and Multimodal Foundation Models
Time and Location: Day 1, 10:00 AM HKT, Rayson Huang Theatre
Abstract
Large Language Models and Multimodal Foundation Models, despite the simple predictive learning criterion and the absence of an explicit complexity bias, have shown the ability to capture the structure and “meaning” of data. I will introduce a notion of “meaning” for large language models as equivalence classes of sentences, and describe methods to establish a geometry and topology in the space of meanings, as well as an algebra, so that meanings can be composed and asymmetric relations such as entailment and implication can be quantified. Meanings as equivalence classes of sentences determined by the trained embeddings can be defined, computed, and quantified for pre-trained models, without the need for instruction tuning, reinforcement learning, or prompt engineering. Meanings, viewed as trajectories, can be shown to align with human assessment through manually annotated benchmarks and can, as the outputs of dynamical systems, be controlled. I will show illustrative examples using both text and imaging modalities.
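A toy sketch of the "meanings as equivalence classes" idea, under loudly stated assumptions: here two sentences are declared equivalent when their embeddings are close, and classes are read off as connected components (which restores transitivity that a raw similarity threshold lacks). The bag-of-words `embed` is a placeholder standing in for a trained encoder; the talk's construction uses the trained model itself, not a similarity threshold.

```python
import numpy as np
from itertools import combinations

SENTENCES = [
    "the cat sat on the mat",
    "on the mat the cat sat",
    "the cat is sitting on the mat",
    "stocks fell sharply on tuesday",
    "shares dropped steeply on tuesday",
]

# Placeholder encoder: normalized bag of words. A real sentence encoder
# would also merge the last two (synonymous) sentences.
VOCAB = sorted({w for s in SENTENCES for w in s.split()})
def embed(s):
    v = np.array([s.split().count(w) for w in VOCAB], dtype=float)
    return v / np.linalg.norm(v)

E = np.stack([embed(s) for s in SENTENCES])

# x ~ y iff cosine similarity exceeds a threshold; equivalence classes are
# the connected components of this relation's graph (union-find below).
parent = list(range(len(SENTENCES)))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

for i, j in combinations(range(len(SENTENCES)), 2):
    if E[i] @ E[j] > 0.8:
        parent[find(i)] = find(j)

classes = {}
for i, s in enumerate(SENTENCES):
    classes.setdefault(find(i), []).append(s)
for members in classes.values():
    print(members)
```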
Bio
Professor Soatto received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after being Assistant and then Associate Professor of Electrical and Biomedical Engineering at Washington University, and Research Associate in Applied Sciences at Harvard University. Between 1995 and 1998, he was also a Ricercatore (researcher) in the Department of Mathematics and Computer Science at the University of Udine, Italy. He received his D.Ing. degree (highest honors) from the University of Padova, Italy, in 1992. His general research interests are in computer vision and nonlinear estimation and control theory. In particular, he is interested in ways for computers to use sensory information (e.g., vision, sound, touch) to interact with humans and the environment. Dr. Soatto is the recipient of the David Marr Prize (with Y. Ma, J. Kosecka, and S. Sastry of U.C. Berkeley) for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion (with R. Brockett of Harvard). He received the National Science Foundation CAREER Award and the Okawa Foundation Grant. He is an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and a member of the editorial boards of the International Journal of Computer Vision (IJCV) and Foundations and Trends in Computer Graphics and Vision.
Jong Chul Ye
Korea Advanced Institute of Science and Technology (KAIST)
Title: Enlarging the Capability of Diffusion Inverse Solvers by Guidance
Time and Location: Day 3, 9:00 AM HKT, Lee Shau Kee Lecture Centre
Abstract
The recent advent of diffusion models has led to significant progress in solving inverse problems, leveraging these models as effective generative priors. Nonetheless, challenges stemming from the ill-posed nature of such problems remain, such as extending solvers to 3D and overcoming inherent ambiguities in the measurements. In this talk, we introduce strategies to address these issues. First, to enable 3D reconstruction using only 2D diffusion models, we propose a novel approach in which two perpendicular pre-trained 2D diffusion models jointly guide the solver for the 3D inverse problem. Specifically, by modeling the 3D data distribution as a product of 2D distributions sliced in different directions, our method effectively addresses the curse of dimensionality, with image guidance supplied from the perpendicular direction. Second, drawing inspiration from the human ability to resolve visual ambiguities through perceptual biases, we introduce a novel latent diffusion inverse solver that incorporates guidance from text prompts. Specifically, our method applies a textual description of the preconceived solution during the reverse sampling phase, a description that is dynamically reinforced through null-text optimization for adaptive negation. Our comprehensive experimental results show that this method successfully mitigates ambiguity in latent diffusion inverse solvers, enhancing their effectiveness and accuracy.
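As background for the guidance mechanism the abstract builds on, the sketch below runs a measurement-guided reverse diffusion in the spirit of diffusion-posterior-sampling solvers: each reverse step adds a gradient step on the data-fidelity term ‖y − A x̂₀‖², with x̂₀ from Tweedie's formula. To stay self-contained, an analytic Gaussian prior stands in for a trained score network; the noise schedule and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, T = 16, 8, 500
mu, s0 = rng.normal(size=d), 0.3               # Gaussian "image" prior N(mu, s0^2 I)
A = rng.normal(size=(m, d)) / np.sqrt(m)       # measurement operator
x_true = mu + s0 * rng.normal(size=d)
y = A @ x_true                                  # noiseless measurements

betas = np.linspace(1e-4, 0.02, T)
abar = np.cumprod(1.0 - betas)

x = rng.normal(size=d)                          # start from pure noise
for t in range(T - 1, -1, -1):
    a, ab = 1.0 - betas[t], abar[t]
    var = ab * s0**2 + (1.0 - ab)
    score = -(x - np.sqrt(ab) * mu) / var       # analytic stand-in for a score net
    x0_hat = (x + (1.0 - ab) * score) / np.sqrt(ab)   # Tweedie estimate of x0
    # Ancestral (DDPM) update driven by the score
    x = (x + betas[t] * score) / np.sqrt(a)
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(size=d)
    # Guidance: pull the iterate toward consistency with the measurements y
    c = (1.0 - (1.0 - ab) / var) / np.sqrt(ab)  # d x0_hat / d x (scalar here)
    x += 0.1 * c * A.T @ (y - A @ x0_hat)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```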
Bio
Jong Chul Ye is a Professor at the Kim Jaechul Graduate School of Artificial Intelligence (AI) at the Korea Advanced Institute of Science and Technology (KAIST), Korea. He received his B.Sc. and M.Sc. degrees from Seoul National University, Korea, and his Ph.D. from Purdue University. Before joining KAIST, he worked at Philips Research and GE Global Research in New York. He has served as an Associate Editor of the IEEE Transactions on Image Processing and as an editorial board member for Magnetic Resonance in Medicine. He is currently an Associate Editor for the IEEE Transactions on Medical Imaging and a Senior Editor of the IEEE Signal Processing Magazine. He is an IEEE Fellow, was Chair of the IEEE SPS Computational Imaging TC, and was an IEEE EMBS Distinguished Lecturer. He was a General Co-chair (with Mathews Jacob) of the IEEE Symposium on Biomedical Imaging (ISBI) 2020. His research interest is in machine learning for biomedical imaging and computer vision.