Lazy Greedy

The lazy (or accelerated) greedy algorithm is a fast alternative to the naive greedy algorithm that selects the same items. It exploits the diminishing returns property of submodular functions to avoid re-evaluating examples that are known to provide little gain. If an example has a small gain relative to other examples, it is unlikely to be the next selected example, because that gain can only shrink as more items are selected. Thus, the example only needs to be re-evaluated once the gains of the other examples have become smaller.

The key idea of the lazy greedy algorithm is to maintain a priority queue in which the examples are the elements and the priorities are the gains they had the last time they were evaluated. The algorithm has two steps. The first step is to calculate the gain that each example would have if selected first (the modular upper bound) and to populate the priority queue with these values. The second step, repeated until the desired number of examples has been selected, is to recalculate the gain of the example at the front of the priority queue and re-insert it. If the example remains at the front of the queue after re-insertion, it is selected, because no other example could have a larger gain once re-evaluated (due to the diminishing returns property).
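
To make the two steps concrete, below is a minimal sketch of the lazy greedy loop using Python's heapq module as the priority queue. The gain callable, candidate list, and budget k are hypothetical placeholders and the gain function is assumed to be submodular; this is an illustration of the idea, not the library's internal implementation.

    import heapq

    def lazy_greedy(candidates, gain, k):
        """Select k items, assuming gain(x, selected) has diminishing returns."""
        # Step 1: score every candidate against the empty set (the modular
        # upper bound) and build a max-heap. heapq is a min-heap, so gains are
        # negated; the index breaks ties without comparing the items themselves.
        pq = [(-gain(x, []), i, x) for i, x in enumerate(candidates)]
        heapq.heapify(pq)

        selected = []
        while pq and len(selected) < k:
            # Step 2: pop the front of the queue and recompute its gain
            # against the current selection.
            _, i, x = heapq.heappop(pq)
            updated = -gain(x, selected)
            if not pq or updated <= pq[0][0]:
                # The example stays at the front even with its refreshed gain,
                # so no stale gain in the queue can beat it: select it.
                selected.append(x)
            else:
                # Otherwise re-insert it with its updated priority and retry.
                heapq.heappush(pq, (updated, i, x))
        return selected

With a simple coverage-style gain function, this produces the same selection as re-scoring every candidate at each step, but typically with far fewer gain evaluations.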

While the worst-case time complexity of this algorithm is the same as that of the naive greedy algorithm, in practice it can be orders of magnitude faster. Empirically, it appears to accelerate graph-based functions much more than feature-based ones, and the speedup tends to be larger the more curved the function is.

API Reference

The lazy/accelerated greedy algorithm for optimization.

This optimization approach is the lazy/accelerated greedy algorithm. It returns the same subset as the naive greedy algorithm, but it uses a priority queue to store the examples, preventing the repeated evaluation of examples that are unlikely to be the next one selected. The benefit of this approach is that the priority queue can significantly improve the speed of optimization; the downsides are that maintaining the priority queue can be costly and that the approach is difficult to parallelize or to run on a GPU.
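
If this page documents the lazy greedy optimizer of the apricot library (the base.SubmodularSelection and utils.PriorityQueue paths suggest so, though the page does not name it), the optimizer is typically selected by name when constructing a selector rather than instantiated directly. A sketch under that assumption, with the class name, the 'lazy' optimizer string, and the fit_transform call all taken from apricot's public interface rather than from this page:

    import numpy
    from apricot import FeatureBasedSelection  # assumes this is the apricot library

    X = numpy.random.RandomState(0).exponential(size=(1000, 50))

    # Select 100 examples with a feature-based function, routing optimization
    # through the lazy greedy algorithm (assumed optimizer name 'lazy').
    selector = FeatureBasedSelection(100, 'sqrt', optimizer='lazy', verbose=False)
    X_subset = selector.fit_transform(X)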

Parameters

self.function : base.SubmodularSelection
A submodular function that implements the _calculate_gains and _select_next methods. This is the function that will be optimized.
self.verbose : bool
Whether to display a progress bar during the optimization process.

Attributes

self.function : base.SubmodularSelection
A submodular function that implements the _calculate_gains and _select_next methods. This is the function that will be optimized.
self.verbose : bool
Whether to display a progress bar during the optimization process.
self.pq : utils.PriorityQueue
The priority queue used to order examples for evaluation.
self.gains_ : numpy.ndarray or None
The gain that each example would provide, as of the last time it was evaluated.