Sample Greedy

The sample greedy algorithm is a simple approach that subsamples the full data set with a user-defined sampling probability and then runs an optimization on that subset. This subsampling can yield substantial speed improvements because fewer elements are evaluated, but it will generally find a lower quality subset because fewer elements are available to choose from. This approach is typically used as a baseline for other approaches, but it can save a lot of time on massive data sets that are known to be highly redundant.
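The two steps described above (subsample, then optimize on the subset) can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the set-coverage objective, the function name `sample_greedy`, and its arguments `sets`, `k`, and `seed` are assumptions made for the example, and a naive greedy loop stands in for whatever optimizer is run on the subset.

```python
import random


def sample_greedy(sets, k, epsilon, seed=None):
    """Hypothetical sketch: subsample candidates, then run naive greedy on them.

    sets    -- list of Python sets; the objective is the number of items covered
    k       -- maximum number of sets to select
    epsilon -- probability with which each candidate is kept in the subsample
    """
    rng = random.Random(seed)

    # Step 1: keep each candidate index independently with probability epsilon.
    candidates = [i for i in range(len(sets)) if rng.random() < epsilon]

    # Step 2: naive greedy restricted to the sampled candidates.
    selected, covered = [], set()
    for _ in range(min(k, len(candidates))):
        # Marginal gain of a candidate = number of newly covered items.
        best = max(candidates, key=lambda i: len(sets[i] - covered))
        selected.append(best)
        covered |= sets[best]
        candidates.remove(best)
    return selected


example_sets = [{1, 2, 3}, {3, 4}, {5}, {1, 2}]
full = sample_greedy(example_sets, k=2, epsilon=1.0, seed=0)
half = sample_greedy(example_sets, k=2, epsilon=0.5, seed=0)
```

With `epsilon=1.0` every candidate is kept, so the sketch reduces to naive greedy on the full data set. Smaller values shrink the candidate pool, which trades subset quality for speed.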

API Reference

The stochastic greedy algorithm for optimization.

This optimization approach is the stochastic greedy algorithm proposed by Mirzasoleiman et al. (https://las.inf.ethz.ch/files/mirzasoleiman15lazier.pdf). This approach is conceptually similar to the naive greedy algorithm except that it only evaluates a subset of examples at each iteration. Thus, it is easy to parallelize and amenable to acceleration using a GPU while maintaining nice theoretical guarantees.

Parameters

self.function : base.SubmodularSelection: A submodular function that implements the _calculate_gains and _select_next methods. This is the function that will be optimized.
epsilon : float, optional: The sampling probability to use when constructing the subset. The optimizer then selects from a subset of roughly n * epsilon examples, where n is the number of examples in the full data set.
random_state : int or RandomState or None, optional: The random seed to use for the random selection process.
self.verbose : bool: Whether to display a progress bar during the optimization process.
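
To make the roles of epsilon and random_state concrete, the following sketch draws a subsample mask with NumPy. It assumes that each example is kept independently with probability epsilon, which the text above does not confirm in detail; the helper names `as_random_state` and `sample_mask` are hypothetical and are not part of the library.

```python
import numpy as np


def as_random_state(random_state):
    """Hypothetical helper: accept an int, a RandomState, or None."""
    if isinstance(random_state, np.random.RandomState):
        return random_state
    # An int is used as the seed; None seeds from OS entropy.
    return np.random.RandomState(random_state)


def sample_mask(n, epsilon, random_state=None):
    """Boolean mask that keeps each of n examples with probability epsilon."""
    rng = as_random_state(random_state)
    return rng.rand(n) < epsilon


mask_a = sample_mask(10_000, 0.1, random_state=0)
mask_b = sample_mask(10_000, 0.1, random_state=0)
n_kept = int(mask_a.sum())  # close to n * epsilon = 1000, but not exact
```

Under this assumption the subset size is random with mean n * epsilon, and passing the same integer seed reproduces the same subset.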

Attributes

self.function : base.SubmodularSelection: A submodular function that implements the _calculate_gains and _select_next methods. This is the function that will be optimized.
self.verbose : bool: Whether to display a progress bar during the optimization process.
self.gains_ : numpy.ndarray or None: The gain that each example would give the last time that it was evaluated.