Thanks Vasia for the very clear explanation. saluti, Stefano
2016-05-17 14:53 GMT+02:00 Vasiliki Kalavri <vasilikikala...@gmail.com>: > Hi Greg, > > I think there is confusion between what delta means in the "delta iteration > operator" of Flink and the "delta approximate implementation" of an > algorithm, such as in PageRank. > > Assuming that we have a graph with a set of vertices and an iterative > fixpoint algorithm that updates the vertex values in each iteration, the > bulk iteration operator will generate a new set of vertex values at > iteration k, which will be the input of iteration k+1, etc.. > > Using the delta iteration operator, we can keep state inside the iteration > and thus, exploit the asymmetrical convergence of some algorithms, i.e. > only compute on the "active" vertices. An active vertex is defined as one > that has changed its value since the last iteration. > > If the state update function is idempotent and weakly monotonic, e.g. min > (like in connected components, sssp), then the delta iteration result is > equivalent to the bulk iteration. > If the state update function is linear and decomposable, e.g. sum (like in > PageRank), then you can re-write the update function and propagate only > _deltas_, instead of value updates. This way the result will again be > equivalent to the bulk iteration result. > > Now, if your application can tolerate some level of inaccuracy, then you > can define a threshold on what it means for a vertex value to "have changed > its value". The higher the threshold, the less active vertices you'll have > and probably faster convergence, but higher error as well. Thus, accuracy > is not a property of the bulk or the delta iteration themselves, but rather > depends on the algorithm and on the vertex activation criterion. > > I hope this clears things up! > -Vasia. > > On 17 May 2016 at 14:39, Till Rohrmann <trohrm...@apache.org> wrote: > > > Hi Greg, > > > > as far as I know there has not been an exhaustive comparison to what > extent > > the delta iterations can achieve the same accuracy as bulk iterations or > > how much accuracy you'll lose. I think it strongly depends on the > problem. > > For example, graph algorithms such as connected components shouldn't > suffer > > from it. In contrast, the PageRank implementation with the THRESHOLD > value > > should not produce the (most) accurate result. Of course this depends on > > the threshold value. Do you want to make such a comparison? > > > > Cheers, > > Till > > > > On Mon, May 16, 2016 at 3:10 PM, Greg Hogan <c...@greghogan.com> wrote: > > > > > Hi, > > > > > > This question has arisen with the HITS algorithm (Hubs and Authorities) > > but > > > the question is the same as with PageRank, for which Stephan published > an > > > excellent discussion and comparison of bulk and delta iterations [0]. > > > > > > Delta iterations are clearly faster. Has there been a comparison as to > > > whether, when, or how delta iterations are more accurate? > > > > > > Greg > > > > > > [0] > > > > > > > > > http://data-artisans.com/data-analysis-with-flink-a-case-study-and-tutorial/ > > > > > >