Sample Greedy

The sample greedy algorithm is a simple approach that subsamples the full data set with a user-defined sampling probability and then runs an optimization on that subset. This subsampling can lead to obvious speed improvements because fewer elements as selected, but will generally find a lower quality subset because fewer elements are present. This approach is typically used a baseline for other approaches but can save a lot of time on massive data sets that are known to be highly redundant.

API Reference

The stochastic greedy algorithm for optimization.

This optimization approach is the stochastic greedy algorithm proposed by Mirzasoleiman et al. (https://las.inf.ethz.ch/files/mirzasoleiman15lazier.pdf). This approach is conceptually similar to the naive greedy algorithm except that it only evaluates a subset of examples at each iteration. Thus, it is easy to parallelize and amenable to acceleration using a GPU while maintaining nice theoretical guarantees.

Parameters

self.function : base.SubmodularSelection
A submodular function that implements the _calculate_gains and _select_next methods. This is the function that will be optimized.
epsilon : float, optional
The sampling probability to use when constructing the subset. A subset of size n * epsilon will be selected from.
random_state : int or RandomState or None, optional
The random seed to use for the random selection process.
self.verbose : bool
Whether to display a progress bar during the optimization process.

Attributes

self.function : base.SubmodularSelection
A submodular function that implements the _calculate_gains and _select_next methods. This is the function that will be optimized.
self.verbose : bool
Whether to display a progress bar during the optimization process.
self.gains_ : numpy.ndarray or None
The gain that each example would give the last time that it was evaluated.