Playing with positive definite matrices – I: matrix monotony and convexity
In a series of a few blog posts, I will present classical and non-classical results on symmetric positive definite matrices. Beyond being mathematically exciting, they arise naturally a lot in machine...
View ArticlePlaying with positive definite matrices – II: entropy edition
Symmetric positive semi-definite (PSD) matrices come up in a variety of places in machine learning, statistics, and optimization, and more generally in most domains of applied mathematics. When...
View ArticleInformation theory with kernel methods
In last month blog post, I presented the von Neumann entropy. It is defined as a spectral function on positive semi-definite (PSD) matrices, and leads to a Bregman divergence called the von Neumann...
View ArticleRethinking SGD’s noise
It seemed a bit unfair to devote a blog to machine learning (ML) without talking about its current core algorithm: stochastic gradient descent (SGD). Indeed, SGD has become, year after year, the basic...
View ArticleRethinking SGD’s noise – II: Implicit Bias
In the previous post, we showed (or at least tried to!) how the inherent noise of the stochastic gradient descent algorithm (SGD), in the context of modern overparametrised architectures, is...
View ArticleSums-of-squares for dummies: a view from the Fourier domain
In these last two years, I have been studying intensively sum-of-squares relaxations for optimization, learning a lot from many great research papers [1, 2], review papers [3], books [4, 5, 6, 7, 8],...
View ArticleDiscrete, continuous and continuized accelerations
In optimization, acceleration is the art of modifying an algorithm in order to obtain faster convergence. Building accelerations and explaining their performance have been the subject of a countless...
View ArticleNon-convex quadratic optimization problems
Among continuous optimization problems, convex problems (with convex objectives and convex constraints) define a class that can be solved efficiently with a variety of algorithms and with arbitrary...
View ArticleRevisiting the classics: Jensen’s inequality
There are a few mathematical results that any researcher in applied mathematics uses on a daily basis. One of them is Jensen’s inequality, which allows bounding expectations of functions of random...
View ArticleUnraveling spectral properties of kernel matrices – I
Since my early PhD years, I have plotted and studied eigenvalues of kernel matrices. In the simplest setting, take independent and identically distributed (i.i.d.) data, such as in the cube below in 2...
View Article