# Sample Greedy

The sample greedy algorithm is a simple approach that subsamples the full data set with a user-defined sampling probability and then runs an optimization on that subset. This subsampling can yield substantial speed improvements because fewer elements are evaluated, but it will generally find a lower quality subset because fewer elements are available to choose from. This approach is typically used as a baseline for other approaches, but it can save a lot of time on massive data sets that are known to be highly redundant.
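The subsample-then-optimize idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the library's implementation: the function names `greedy` and `sample_greedy`, the `seed` argument, and the toy coverage objective (count of distinct items covered) are all assumptions made for the example.

```python
import random


def greedy(candidates, sets, k):
    """Naive greedy: repeatedly add the candidate with the largest gain."""
    chosen, covered = [], set()
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # Marginal gain = number of new items the candidate would cover.
        best = max(remaining, key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return chosen


def sample_greedy(sets, k, epsilon, seed=None):
    """Subsample about n * epsilon candidates once, then run greedy on them."""
    rng = random.Random(seed)
    n = len(sets)
    subset_size = max(1, int(n * epsilon))
    # The candidate pool is drawn a single time, before any selection happens.
    candidates = sorted(rng.sample(range(n), subset_size))
    return candidates, greedy(candidates, sets, k)


# Toy data: each example "covers" a set of items.
example_sets = [{0, 1, 2}, {2, 3}, {4}, {0, 1, 2, 3}, {5, 6}, {6}]
pool, picked = sample_greedy(example_sets, k=2, epsilon=0.5, seed=0)
print("candidate pool:", pool)
print("selected:", picked)
```

With `epsilon=1.0` the pool is the whole data set and the sketch reduces to plain greedy selection; smaller values shrink the pool, which is where both the speedup and the loss in quality come from.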

## API Reference

The sample greedy algorithm for optimization.

This optimization approach draws a single random subset of about n * epsilon examples and then runs a greedy optimization restricted to that subset. It is conceptually related to the stochastic greedy algorithm proposed by Mirzasoleiman et al. (https://las.inf.ethz.ch/files/mirzasoleiman15lazier.pdf), which instead evaluates a fresh random subset of examples at each iteration. Both approaches evaluate fewer examples than the naive greedy algorithm.
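A minimal sketch of the stochastic greedy idea referenced above, in which a fresh random pool of examples is scored at each iteration, is given below. It is not the library's code: `stochastic_greedy`, `sample_size`, `seed`, and the toy coverage objective are hypothetical names chosen for illustration, and the per-iteration pool size is left as a plain argument rather than derived from the paper's analysis.

```python
import random


def stochastic_greedy(sets, k, sample_size, seed=None):
    """Greedy selection that scores only a fresh random sample each iteration."""
    rng = random.Random(seed)
    remaining = list(range(len(sets)))
    chosen, covered = [], set()
    for _ in range(min(k, len(remaining))):
        # Draw a new candidate pool from the examples not yet chosen.
        pool = rng.sample(remaining, min(sample_size, len(remaining)))
        # Marginal gain = number of new items the candidate would cover.
        best = max(pool, key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return chosen


# Toy data: each example "covers" a set of items.
example_sets = [{0, 1, 2}, {2, 3}, {4}, {0, 1, 2, 3}, {5, 6}, {6}]
print(stochastic_greedy(example_sets, k=3, sample_size=2, seed=1))
```

The difference from a one-time subsample is that an example missed in one iteration can still appear in a later pool, so every example remains eligible throughout the run.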

### Parameters

- function : base.SubmodularSelection
  - A submodular function that implements the _calculate_gains and _select_next methods. This is the function that will be optimized.
- epsilon : float, optional
  - The sampling probability to use when constructing the subset. The optimization selects from a random subset of size n * epsilon, where n is the number of examples in the data set.
- random_state : int or RandomState or None, optional
  - The random seed to use for the random selection process.
- verbose : bool
  - Whether to display a progress bar during the optimization process.

### Attributes

- self.function : base.SubmodularSelection
  - A submodular function that implements the _calculate_gains and _select_next methods. This is the function that will be optimized.
- self.verbose : bool
  - Whether to display a progress bar during the optimization process.
- self.gains_ : numpy.ndarray or None
  - The gain that each example would have provided the last time it was evaluated.