April 9, 2018

Florent Capelli (INRIA - Université Lille 3)

It is well-known that one may faithfully approximate the
expected value of a random variable by taking the average of many
independant observations of the variables. The average quickly
converges to the expected value with good probability. This fact is
commonly used when computing statistics in data mining.

However, in this setting, the assumption that the observations are
independant is usually wrong and we have no more guarantees on the
convergence of the average toward the expected value. One way of
circumventing this problem would be to only use an independant subset
of the observations. Finding a large independant subset is however a
hard problem and this method tends to discard precious data.

An other method proposed by Wang and Ramon starts by assigning weights
to each observation. If the weights are solution to some linear
program, then we have the guarantee that the weighted average
converges toward the expected value.

In practice however, when the number of factor of dependencies is
large, the size of such a linear program can be huge. In this talk,
we will present a way of generating an equivalent compressed linear
program when the dependencies are succinctly represented.

This is joint work with Nicolas Crosetti, Jan Ramon and Joachim
Niehren.