Hi Greg, I think there is confusion between what delta means in the "delta iteration operator" of Flink and the "delta approximate implementation" of an algorithm, such as in PageRank.
Assuming that we have a graph with a set of vertices and an iterative fixpoint algorithm that updates the vertex values in each iteration, the bulk iteration operator will generate a new set of vertex values at iteration k, which will be the input of iteration k+1, etc.. Using the delta iteration operator, we can keep state inside the iteration and thus, exploit the asymmetrical convergence of some algorithms, i.e. only compute on the "active" vertices. An active vertex is defined as one that has changed its value since the last iteration. If the state update function is idempotent and weakly monotonic, e.g. min (like in connected components, sssp), then the delta iteration result is equivalent to the bulk iteration. If the state update function is linear and decomposable, e.g. sum (like in PageRank), then you can re-write the update function and propagate only _deltas_, instead of value updates. This way the result will again be equivalent to the bulk iteration result. Now, if your application can tolerate some level of inaccuracy, then you can define a threshold on what it means for a vertex value to "have changed its value". The higher the threshold, the less active vertices you'll have and probably faster convergence, but higher error as well. Thus, accuracy is not a property of the bulk or the delta iteration themselves, but rather depends on the algorithm and on the vertex activation criterion. I hope this clears things up! -Vasia. On 17 May 2016 at 14:39, Till Rohrmann <trohrm...@apache.org> wrote: > Hi Greg, > > as far as I know there has not been an exhaustive comparison to what extent > the delta iterations can achieve the same accuracy as bulk iterations or > how much accuracy you'll lose. I think it strongly depends on the problem. > For example, graph algorithms such as connected components shouldn't suffer > from it. In contrast, the PageRank implementation with the THRESHOLD value > should not produce the (most) accurate result. Of course this depends on > the threshold value. Do you want to make such a comparison? > > Cheers, > Till > > On Mon, May 16, 2016 at 3:10 PM, Greg Hogan <c...@greghogan.com> wrote: > > > Hi, > > > > This question has arisen with the HITS algorithm (Hubs and Authorities) > but > > the question is the same as with PageRank, for which Stephan published an > > excellent discussion and comparison of bulk and delta iterations [0]. > > > > Delta iterations are clearly faster. Has there been a comparison as to > > whether, when, or how delta iterations are more accurate? > > > > Greg > > > > [0] > > > > > http://data-artisans.com/data-analysis-with-flink-a-case-study-and-tutorial/ > > >