Date: July 18th, 2020. Time Zone: Eastern Daylight Time (GMT-4)
8:00 am
Opening Remarks
8:15 am
Invited Talk
Reinforcement learning has seen major success in games and other artificial environments, but its applications in industry and real life are still limited. This limited applicability is partly due to the large amounts of training data that must be collected through trial and error, as well as the difficulty of effectively dealing with multiple or many agents. Diversity and negative dependence offer a promising approach to resolving some of the major challenges in today’s reinforcement learning and have gained increasing attention in recent years. In this talk, we will briefly review some of the approaches to introducing diversity in reinforcement learning, with a focus on the use of determinantal point processes for effective multi-agent reinforcement learning.
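As a rough illustration of the determinantal diversity idea mentioned in the abstract (a sketch under assumed inputs, not the speaker's method), the snippet below scores a set of agent behaviour embeddings by the log-determinant of their Gram matrix; the score grows as the embeddings become closer to mutually orthogonal, i.e. more diverse.

```python
import numpy as np

def diversity_bonus(features, eps=1e-6):
    """Log-determinant diversity score for a set of agent feature vectors.

    features: (n_agents, d) array, one row per agent's behaviour embedding
    (a placeholder assumption; the real representation is task-specific).
    Returns log det(K + eps * I) with K the Gram matrix.
    """
    K = features @ features.T                        # Gram (kernel) matrix
    _, logdet = np.linalg.slogdet(K + eps * np.eye(len(features)))
    return logdet

# Toy check: near-identical agents score lower than diverse agents.
rng = np.random.default_rng(0)
similar = np.ones((3, 4)) + 0.01 * rng.standard_normal((3, 4))
diverse = np.eye(3, 4)
print(diversity_bonus(similar), diversity_bonus(diverse))
```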
8:50 am
Invited Talk
9:25 am
Contributed Talk
Joachim Schreurs, Michaël Fanuel, Johan Suykens
By using the framework of Determinantal Point Processes (DPPs), some theoretical results concerning the interplay between diversity and regularization can be obtained. In this paper we show that sampling subsets with k-DPPs results in implicit regularization in the context of ridgeless kernel regression. Furthermore, we leverage the common setup of state-of-the-art DPP algorithms to sample multiple small subsets and use them in an ensemble of ridgeless regressions. Our first empirical results indicate that an ensemble of ridgeless regressors can be advantageous for datasets containing redundant information.
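A minimal sketch of the ensemble construction described above, under assumptions not stated in the abstract: a Gaussian RBF kernel, pseudo-inverse solves for the ridgeless fits, and, for brevity, uniform subset sampling standing in for the k-DPP sampler used in the paper.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def ensemble_ridgeless(X, y, X_test, k=20, n_subsets=10, rng=None):
    """Average of ridgeless (pseudo-inverse) kernel interpolators,
    each fit on a small subset of the training data."""
    rng = np.random.default_rng(rng)
    preds = []
    for _ in range(n_subsets):
        # Placeholder sampler: uniform without replacement. The paper
        # instead draws these subsets from a k-DPP to encourage diversity.
        S = rng.choice(len(X), size=k, replace=False)
        K_SS = rbf(X[S], X[S])
        alpha = np.linalg.pinv(K_SS) @ y[S]          # ridgeless fit on the subset
        preds.append(rbf(X_test, X[S]) @ alpha)
    return np.mean(preds, axis=0)

# Toy usage on a 1-D regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
X_test = np.linspace(-3, 3, 50)[:, None]
print(ensemble_ridgeless(X, y, X_test, rng=1)[:5])
```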
9:40 am
Break
10:10 am
Invited Talk
DPP MAP inference, the problem of finding the highest probability set under the distribution defined by a DPP, is of practical importance for a variety of areas such as recommender systems, active learning, and data compression. Unfortunately, finding the exact MAP solution is NP-hard. Often though, the standard greedy submodular maximization algorithm works well in practice for approximating the solution. In this talk, we discuss ways to speed up this simple greedy algorithm, as well as slower, but more accurate alternatives to it. We also discuss how to scale greedy for customized DPPs, where we want to solve the MAP problem multiple times with different weightings of item features. We conclude with a brief note on the complexity of MAP for nonsymmetric DPPs, where we show that greedy scales fairly well if we assume a particular kernel decomposition.
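For reference, a naive version of the greedy algorithm discussed in the talk, applied to the DPP MAP objective log det(L_S). This is the textbook baseline rather than the accelerated variants the speaker covers, and the random kernel is a placeholder.

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Naive greedy approximation to DPP MAP: repeatedly add the item that
    most increases log det(L_S). Each step recomputes determinants from
    scratch; the talk discusses much faster incremental variants."""
    n = L.shape[0]
    S = []
    for _ in range(k):
        best_item, best_val = None, -np.inf
        for i in range(n):
            if i in S:
                continue
            idx = S + [i]
            sign, val = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and val > best_val:
                best_item, best_val = i, val
        if best_item is None:        # no item yields a positive-definite gain
            break
        S.append(best_item)
    return S

# Toy kernel: random low-rank PSD matrix plus jitter.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 30))
L = B.T @ B + 1e-3 * np.eye(30)
print(greedy_dpp_map(L, 5))
```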
10:45 am
Invited Talk
11:20 am
Contributed Talk
Ehsan Kazemi, Shervin Minaee, Moran Feldman, Amin Karbasi
In this paper, we propose scalable methods for maximizing a regularized submodular function $f = g - \ell$, expressed as the difference between a monotone submodular function $g$ and a modular function $\ell$. Indeed, submodularity is inherently related to the notions of diversity, coverage, and representativeness. In particular, finding the mode (i.e., the most likely configuration) of many popular probabilistic models of diversity, such as determinantal point processes, submodular probabilistic models, and strongly log-concave distributions, involves the maximization of (regularized) submodular functions. Since a regularized function $f$ can potentially take on negative values, the classic theory of submodular maximization, which heavily relies on the non-negativity assumption of submodular functions, may not be applicable. To circumvent this challenge, we develop the first streaming and distributed algorithms for maximizing regularized submodular functions. Furthermore, we show how to find the mode of a strongly log-concave (SLC) distribution via regularized submodular maximization.
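A hedged sketch of the regularized objective on a toy instance: plain greedy applied to f(S) = g(S) - cost(S), where g is a coverage function and cost is modular. This is only a baseline for intuition; it is not the streaming or distributed algorithms proposed in the paper.

```python
import random

def greedy_regularized(ground, g, cost, k):
    """Plain greedy for f(S) = g(S) - cost(S) under a cardinality constraint k.
    Illustrative baseline only; the paper develops streaming and distributed
    algorithms with guarantees for this regularized setting."""
    S = set()
    for _ in range(k):
        gains = {e: (g(S | {e}) - g(S)) - cost[e] for e in ground - S}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:         # f can be negative, so stop when no gain
            break
        S.add(best)
    return S

# Toy instance: g is a coverage function over a small universe, cost is modular.
random.seed(0)
universe = list(range(50))
sets = {i: set(random.sample(universe, 8)) for i in range(20)}
g = lambda S: len(set().union(*(sets[i] for i in S)))
cost = {i: 0.5 for i in range(20)}
print(greedy_regularized(set(sets), g, cost, k=10))
```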
11:35 am
Lunch Break
12:30 pm
Room 1: Submodular Maximization
Omid Sadeghi, Reza Eghbali, Maryam Fazel. Online Algorithms for Budget-Constrained DR-Submodular Maximization [paper] [poster slides]
Aytunc Sahin, Joachim Buhmann, Andreas Krause. Constrained Maximization of Lattice Submodular Functions [paper] [poster slides]
Ehsan Kazemi, Shervin Minaee, Moran Feldman, Amin Karbasi. Mode Finding for SLC Distributions via Regularized Submodular Maximization [paper] [poster slides]
Room 2: Determinantal Point Processes
Wei Chen, Faez Ahmed. MO-PaDGAN: Generating Diverse Designs with Multivariate Performance Enhancement [paper] [poster slides]
Honghua Zhang, Steven J Holtzen, Guy Van den Broeck. On the Relationship Between Probabilistic Circuits and Determinantal Point Processes [paper] [poster slides]
Joachim Schreurs, Michaël Fanuel, Johan Suykens. Ensemble Kernel Methods, Implicit Regularization and Determinantal Point Processes [paper] [poster slides]
Room 3: Inference and Bipartite Matching
Zhe Dong, Andriy Mnih, George Tucker. DisARM: An Antithetic Gradient Estimator for Binary Latent Variables [paper] [poster slides]
Saba Ahmadi, Faez Ahmed, John P Dickerson, Mark Fuge, Samir Khuller. On Diverse Bipartite b-Matching [paper] [poster slides]
Pierre-Alexandre Mattei, Jes Frellsen. Negative Dependence Tightens Variational Bounds [paper] [poster slides]
1:15 pm
Invited Talk
In this talk I’ll describe a novel approach that yields algorithms whose parallel running time is exponentially faster than any algorithm previously known for a broad range of machine learning applications. The algorithms are designed for submodular function maximization, which is the algorithmic engine behind applications such as clustering, network analysis, feature selection, Bayesian inference, ranking, speech and document summarization, recommendation systems, hyperparameter tuning, and many others. Since applications of submodular functions are ubiquitous across machine learning and datasets continue to grow, there is consistent demand for accelerating submodular optimization. The approach we describe yields simple algorithms whose parallel runtime is logarithmic in the size of the data rather than linear. I’ll introduce the frameworks we recently developed and present experimental results from various application domains.
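A toy sketch of the low-adaptivity idea behind such parallel algorithms, under assumed details: in each round, all marginal gains with respect to the current solution are computed independently (hence parallelizable), and a whole batch of high-gain elements is added at once. This is not the speaker's algorithm and carries no approximation guarantee as written.

```python
import numpy as np

def parallel_style_greedy(ground, f, k, eps=0.1):
    """Batch-per-round selection: every round evaluates the marginal gain of
    each remaining element against the same current set S (independent, hence
    parallelizable) and adds all elements above a geometrically decreasing
    threshold. Sketch only, for intuition about few-round selection."""
    S = []
    threshold = max(f([e]) for e in ground)
    while len(S) < k and threshold > eps:
        gains = {e: f(S + [e]) - f(S) for e in ground if e not in S}
        batch = [e for e, gval in gains.items() if gval >= threshold]
        S.extend(batch[: k - len(S)])                # add a batch in one round
        threshold *= (1 - eps)                       # lower the bar
    return S

# Toy objective: coverage over a universe of 100 items.
rng = np.random.default_rng(0)
cover = {i: set(rng.choice(100, size=10, replace=False)) for i in range(40)}
f = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0
print(parallel_style_greedy(list(cover), f, k=8))
```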
1:50 pm
Invited Talk
A central challenge in biotechnology is to be able to predict functional properties of a protein from its sequence, and thus (i) discover new proteins with specific functionality and (ii) better understand the functional effect of genomic mutations. Experimental breakthroughs in our ability to read and write DNA allow data on the relationship between sequence and function to be rapidly acquired. This data can be used to train and validate machine learning models that predict protein function from sequence. However, the cost and latency of wet-lab experiments require methods that find good sequences in few experimental rounds, where each round contains large batches of sequence designs. In this setting, model-based optimization allows us to take advantage of sample-inefficient methods to find diverse optimal sequence candidates to be tested in the wet lab. These requirements are illustrated by a collaboration that involves the design and experimental validation of AAV capsid protein variants that assemble integral capsids and package their genome, for use in gene therapy applications.
2:25 pm
Contributed Talk
Aytunc Sahin, Joachim Buhmann, Andreas Krause
Submodular optimization over the integer lattice has many applications in machine learning. Although the constrained maximization of submodular functions with coordinate-wise concavity (also called DR-submodular functions) is well studied, the maximization of general lattice submodular functions is considerably more challenging. In this work, we first show that we can optimize lattice submodular functions subject to a discrete (integer) polymatroid constraint using a recently proposed extension, called the Generalized Multilinear Extension. Then, we establish a bound on the rounding error for the discrete polymatroid constraint, which depends on the “distance” between the lattice submodular function and a DR-submodular function. Lastly, we demonstrate the effectiveness of our algorithm on a Bayesian experimental design problem with repetition and a concave cost.
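For intuition only, a coordinate-wise greedy baseline on the integer lattice under a total budget and per-coordinate caps; the objective is an assumed toy function, and this is not the Generalized Multilinear Extension approach from the paper.

```python
import numpy as np

def lattice_greedy(f, upper, budget):
    """Coordinate-wise greedy on the integer lattice: repeatedly increment the
    coordinate with the largest marginal gain, subject to x <= upper and
    sum(x) <= budget. Illustrative baseline only; the paper instead rounds a
    continuous solution of the Generalized Multilinear Extension under a
    polymatroid constraint."""
    x = np.zeros_like(upper)
    for _ in range(budget):
        gains = [f(x + np.eye(len(x), dtype=int)[i]) - f(x)
                 if x[i] < upper[i] else -np.inf
                 for i in range(len(x))]
        i = int(np.argmax(gains))
        if gains[i] <= 0:
            break
        x[i] += 1
    return x

# Assumed toy lattice objective: weighted concave gains per coordinate
# plus a concave function of the total.
w = np.array([3.0, 2.0, 1.0, 4.0])
f = lambda x: float(np.sum(w * np.sqrt(x)) + np.sqrt(np.sum(x)))
print(lattice_greedy(f, upper=np.array([5, 5, 5, 5]), budget=10))
```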
2:40 pm
Break
3:10 pm
Invited Talk
Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness, most notably random sampling and random projection methods, to develop improved algorithms for ubiquitous matrix problems, such as those that arise in scientific computing, data science, machine learning, etc. A seemingly different topic, but one which has a long history in pure and applied mathematics, is that of Determinantal Point Processes (DPPs), which are stochastic point processes, the probability distribution of which is characterized by sub-determinants of some matrix. Recent work has uncovered deep and fruitful connections between DPPs and RandNLA. For example, random sampling with a DPP leads to new kinds of unbiased estimators for classical RandNLA tasks, enabling more refined statistical and inferential understanding of RandNLA algorithms; a DPP is, in some sense, an optimal randomized method for many RandNLA problems; and a standard RandNLA technique, called leverage score sampling, can be derived as the marginal distribution of a DPP. This work will be reviewed, as will recent algorithmic developments, illustrating that, while not quite as efficient as simply applying a random projection, these DPP-based algorithms are only moderately more expensive.
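The leverage-score connection mentioned above can be checked numerically. The sketch below (illustrative, with an assumed Gaussian feature matrix and the lambda = 1 convention) verifies that the marginals K_ii of the L-ensemble DPP with L = X X^T equal the ridge leverage scores of X.

```python
import numpy as np

# Numerical check of one connection from the talk: for an L-ensemble DPP with
# kernel L = X X^T, the marginal inclusion probabilities K_ii, where
# K = L (L + I)^{-1}, coincide with the ridge leverage scores of X at
# regularization lambda = 1. (Sketch, not the speaker's code.)
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
L = X @ X.T

K = L @ np.linalg.inv(L + np.eye(30))            # DPP marginal kernel
dpp_marginals = np.diag(K)

# Ridge leverage scores: tau_i = x_i^T (X^T X + I)^{-1} x_i.
tau = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X + np.eye(5)), X)

print(np.allclose(dpp_marginals, tau))           # True
```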
3:45 pm
Contributed Talk
Honghua Zhang, Steven J Holtzen, Guy Van den Broeck
Scaling probabilistic models to large realistic problems and datasets is a key challenge in machine learning. Central to this effort is the development of tractable probabilistic models (TPMs): models whose structure guarantees efficient probabilistic inference algorithms. The current landscape of TPMs is fragmented: there exist various kinds of TPMs with different strengths and weaknesses. Two of the most prominent classes of TPMs are determinantal point processes (DPPs) and probabilistic circuits (PCs). This paper provides the first systematic study of their relationship. We propose a unified analysis and shared language for discussing DPPs and PCs. Then we establish theoretical barriers for the unification of these two families, and prove that there are cases where DPPs have no compact representation as a class of PCs. We close with a perspective on the central problem of unifying these models.
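As a reminder of the kind of exact inference that makes DPPs tractable probabilistic models, the sketch below computes an L-ensemble subset log-probability, P(S) = det(L_S) / det(L + I), on a random placeholder kernel; whether such distributions admit compact probabilistic-circuit representations is exactly the question the paper studies.

```python
import numpy as np

def dpp_log_prob(L, S):
    """Log-probability of subset S under the L-ensemble DPP:
    P(S) = det(L_S) / det(L + I). Exact likelihoods like this are the kind of
    tractable query that qualifies DPPs as TPMs."""
    n = L.shape[0]
    _, log_num = np.linalg.slogdet(L[np.ix_(S, S)]) if S else (1.0, 0.0)
    _, log_den = np.linalg.slogdet(L + np.eye(n))
    return log_num - log_den

# Random placeholder kernel: low-rank PSD matrix plus jitter.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 10))
L = B.T @ B + 1e-3 * np.eye(10)
print(dpp_log_prob(L, [0, 3, 7]))
```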
4:00 pm
Panel Discussion
4:45 pm
Closing Remarks