Hi Paris, just gave you the permissions (I hope). Let me know if something does not work.
Cheers, Fabian 2016-11-17 13:48 GMT+01:00 Paris Carbone <par...@kth.se>: > We do not have to schedule this for an early Flink release, just saying. > I would just like to get the changes out and you people can review it and > integrate it anytime at your own pace. > > Who is the admin of the wiki? It would be nice to get write access. > > > On 17 Nov 2016, at 13:45, Paris Carbone <par...@kth.se> wrote: > > > > Sounds like a plan! > > > > Can someone grant me access to write in the wiki please? > > My username is “senorcarbone”. > > > > Paris > > > >> On 16 Nov 2016, at 14:30, Gyula Fóra <gyula.f...@gmail.com> wrote: > >> > >> I am not completely sure whether we should deprecate the old API for > 1.2 or > >> remove it completely. Personally I am in favor of removing it, I don't > >> think it is a huge burden to move to the new one if it makes for a much > >> nicer user experience. > >> > >> I think you can go ahead add the FLIP to the wiki and open the PR so we > can > >> start the review if you have it ready anyways. > >> > >> Gyula > >> > >> Paris Carbone <par...@kth.se> ezt írta (időpont: 2016. nov. 16., Sze, > >> 11:55): > >> > >>> Thanks for reviewing, Gyula. > >>> > >>> One thing that is still up to discussion is whether we should remove > >>> completely the old iterations API or simply mark it as deprecated till > v2.0. > >>> Also, not sure what is the best process now. We have the changes ready. > >>> Should I copy the FLIP to the wiki and trigger the PRs or wait for a > few > >>> more days in case someone has objections? > >>> > >>> @Stephan, what is your take on our interpretation of the approach you > >>> suggested? Should we proceed or is there anything that you do not find > nice? > >>> > >>> Paris > >>> > >>>> On 15 Nov 2016, at 10:01, Gyula Fóra <gyf...@apache.org> wrote: > >>>> > >>>> Hi Paris, > >>>> > >>>> I like the proposed changes to the iteration API, this cleans up > things > >>> in > >>>> the Java API without any strict restriction I think (it was never a > >>> problem > >>>> in the Scala API). > >>>> > >>>> The termination algorithm based on the proposed scoped loops seems to > be > >>>> fairly simple and looks good :) > >>>> > >>>> Cheers, > >>>> Gyula > >>>> > >>>> Paris Carbone <par...@kth.se> ezt írta (időpont: 2016. nov. 14., H, > >>> 8:50): > >>>> > >>>>> That would be great Shi! Let's take that offline. > >>>>> > >>>>> Anyone else interested in the iteration changes? It would be nice to > >>>>> incorporate these to v1.2 if possible so I count on your review asap. > >>>>> > >>>>> cheers, > >>>>> Paris > >>>>> > >>>>> On Nov 14, 2016, at 2:59 AM, xiaogang.sxg < > xiaogang....@alibaba-inc.com > >>>>> <mailto:xiaogang....@alibaba-inc.com>> wrote: > >>>>> > >>>>> Hi Paris > >>>>> > >>>>> Unfortunately, the project is not public yet. > >>>>> But i can provide you a primitive implementation of the update > protocol > >>> in > >>>>> the paper. It’s implemented in Storm. Since the protocol assumes the > >>>>> communication channels between different tasks are dual, i think it’s > >>> not > >>>>> easy to adapt it to Flink. > >>>>> > >>>>> Regards > >>>>> Xiaogang > >>>>> > >>>>> > >>>>> 在 2016年11月12日,上午3:03,Paris Carbone <par...@kth.se<mailto:parisc@ > kth.se > >>>>> > >>>>> 写道: > >>>>> > >>>>> Hi Shi, > >>>>> > >>>>> Naiad/Timely Dataflow and other projects use global coordination > which > >>> is > >>>>> very convenient for asynchronous progress tracking in general but it > has > >>>>> some downsides in a production systems that count on in-flight > >>>>> transactional control mechanisms and rollback recovery guarantees. > This > >>> is > >>>>> why we generally prefer decentralized approaches (despite their our > >>>>> downsides). > >>>>> > >>>>> Regarding synchronous/structured iterations, this is a bit off topic > and > >>>>> they are a bit of a different story as you already know. > >>>>> We maintain a graph streaming (gelly-streams) library on Flink that > you > >>>>> might find interesting [1]. Vasia, another Flink committer is also > >>> working > >>>>> on that among others. > >>>>> You can keep an eye on it since we are planning to use this project > as a > >>>>> showcase for a new way of doing structured and fixpoint iterations on > >>>>> streams in the future. > >>>>> > >>>>> P.S. many thanks for sharing your publication, it was an interesting > >>> read. > >>>>> Do you happen to have your source code public? We could most > certainly > >>> use > >>>>> it in an benchmark soon. > >>>>> > >>>>> [1] https://github.com/vasia/gelly-streaming > >>>>> > >>>>> > >>>>> On 11 Nov 2016, at 19:18, SHI Xiaogang <shixiaoga...@gmail.com< > mailto: > >>>>> shixiaoga...@gmail.com><mailto:shixiaoga...@gmail.com>> wrote: > >>>>> > >>>>> Hi, Fouad > >>>>> > >>>>> Thank you for the explanation. Now the centralized method seems > correct > >>> to > >>>>> me. > >>>>> The passing of StatusUpdate events will lead to synchronous > iterations > >>> and > >>>>> we are using the information in each iterations to terminate the > >>>>> computation. > >>>>> > >>>>> Actually, i prefer the centralized method because in many > applications, > >>> the > >>>>> convergence may depend on some global statistics. > >>>>> For example, a PageRank program may terminate the computation when > 99% > >>>>> vertices are converged. > >>>>> I think those learning programs which cannot reach the fixed-point > >>>>> (oscillating around the fixed-point) can benefit a lot from such > >>> features. > >>>>> The decentralized method makes it hard to support such convergence > >>>>> conditions. > >>>>> > >>>>> > >>>>> Another concern is that Flink cannot produce periodical results in > the > >>>>> iteration over infinite data streams. > >>>>> Take a concrete example. Given an edge stream constructing a graph, > the > >>>>> user may need the PageRank weight of each vertex in the graphs > formed at > >>>>> certain instants. > >>>>> Currently Flink does not provide any input or iteration information > to > >>>>> users, making users hard to implement such real-time iterative > >>>>> applications. > >>>>> Such features are supported in both Naiad and Tornado. I think Flink > >>> should > >>>>> support it as well. > >>>>> > >>>>> What do you think? > >>>>> > >>>>> Regards > >>>>> Xiaogang > >>>>> > >>>>> > >>>>> 2016-11-11 19:27 GMT+08:00 Fouad ALi <fouad.alsay...@gmail.com< > mailto: > >>>>> fouad.alsay...@gmail.com><mailto:fouad.alsay...@gmail.com>>: > >>>>> > >>>>> Hi Shi, > >>>>> > >>>>> It seems that you are referring to the centralized algorithm which > is no > >>>>> longer the proposed version. > >>>>> In the decentralized version (check last doc) there is no master > node or > >>>>> global coordination involved. > >>>>> > >>>>> Let us keep this discussion to the decentralized one if possible. > >>>>> > >>>>> To answer your points on the previous approach, there is a catch in > your > >>>>> trace at t7. Here is what is happening : > >>>>> - Head,as well as RS, will receive a 'BroadcastStatusUpdate' from > >>>>> runtime (see 2.1 in the steps). > >>>>> - RS and Heads will broadcast StatusUpdate event and will not notify > >>> its > >>>>> status. > >>>>> - When StatusUpdate event gets back to the head it will notify its > >>>>> WORKING status. > >>>>> > >>>>> Hope that answers your concern. > >>>>> > >>>>> Best, > >>>>> Fouad > >>>>> > >>>>> On Nov 11, 2016, at 6:21 AM, SHI Xiaogang <shixiaoga...@gmail.com > >>> <mailto: > >>>>> shixiaoga...@gmail.com><mailto:shixiaoga...@gmail.com>> > >>>>> wrote: > >>>>> > >>>>> Hi Paris > >>>>> > >>>>> I have several concerns about the correctness of the termination > >>>>> protocol. > >>>>> I think the termination protocol put an end to the computation even > when > >>>>> the computation has not converged. > >>>>> > >>>>> Suppose there exists a loop context constructed by a OP operator, a > Head > >>>>> operator and a Tail operator (illustrated in Figure 2 in the first > >>>>> draft). > >>>>> The stream only contains one record. OP will pass the record to its > >>>>> downstream operators 10 times. In other words, the loop should > iterate > >>> 10 > >>>>> times. > >>>>> > >>>>> If I understood the protocol correctly, the following event sequence > may > >>>>> happen in the computation: > >>>>> t1: RS emits Record to OP. Since RS has reached the "end-of-stream", > >>> the > >>>>> system enters into Speculative Phase. > >>>>> t2: OP receives Record and emits it to TAIL. > >>>>> t3: HEAD receives the UpdateStatus event, and notifies with an IDLE > >>>>> state. > >>>>> t4. OP receives the UpdateStatus event from HEAD, and notifies with > an > >>>>> WORKING state. > >>>>> t5. TAIL receives Record and emits it to HEAD. > >>>>> t6. TAIL receives the UpdateStatus event from OP, and notifies with > an > >>>>> WORKING state. > >>>>> t7. The system starts a new attempt. HEAD receives the UpdateStatus > >>> event > >>>>> and notifies with an IDLE state. (Record is still in transition.) > >>>>> t8. OP receives the UpdateStatus event from HEAD and notifies with an > >>>>> IDLE > >>>>> state. > >>>>> t9. TAIL receives the UpdateStatus event from OP and notifies with an > >>>>> IDLE > >>>>> state. > >>>>> t10. HEAD receives Record from TAIL and emits it to OP. > >>>>> t11. System puts an end to the computation. > >>>>> > >>>>> Though the computation is expected to iterate 10 times, it ends > earlier. > >>>>> The cause is that the communication channels of MASTER=>HEAD and > >>>>> TAIL=>HEAD > >>>>> are not synchronized. > >>>>> > >>>>> I think the protocol follows the idea of the Chandy-Lamport > algorithm to > >>>>> determine a global state. > >>>>> But the information of whether a node has processed any record to > since > >>>>> the > >>>>> last request is not STABLE. > >>>>> Hence i doubt the correctness of the protocol. > >>>>> > >>>>> To determine the termination correctly, we need some information > that is > >>>>> stable. > >>>>> In timelyflow, Naiad collects the progress made in each iteration and > >>>>> terminates the loop when a little progress is made in an iteration > >>>>> (identified by the timestamp vector). > >>>>> The information is stable because the result of an iteration cannot > be > >>>>> changed by the execution of later iterations. > >>>>> > >>>>> A similar method is also adopted in Tornado. > >>>>> You may see my paper for more details about the termination of loops: > >>>>> http://net.pku.edu.cn/~cuibin/Papers/2016SIGMOD.pdf < > >>>>> http://net.pku.edu.cn/~cuibin/Papers/2016SIGMOD.pdf> > >>>>> > >>>>> Regards > >>>>> Xiaogang > >>>>> > >>>>> 2016-11-11 3:19 GMT+08:00 Paris Carbone <par...@kth.se<mailto: > >>>>> par...@kth.se><mailto:par...@kth.se> <mailto: > >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>>>: > >>>>> > >>>>> Hi again Flink folks, > >>>>> > >>>>> Here is our new proposal that addresses Job Termination - the loop > fault > >>>>> tolerance proposal will follow shortly. > >>>>> As Stephan hinted, we need operators to be aware of their scope > level. > >>>>> > >>>>> Thus, it is time we make loops great again! :) > >>>>> > >>>>> Part of this FLIP basically introduces a new functional, > compositional > >>>>> API > >>>>> for defining asynchronous loops for DataStreams. > >>>>> This is coupled with a decentralized algorithm for job termination > with > >>>>> loops - along the lines of what Stephan described. > >>>>> We are already working on the actual prototypes as you can observe in > >>>>> the > >>>>> links of the doc. > >>>>> > >>>>> Please let us know if you like (or don't like) it and why, in this > mail > >>>>> discussion. > >>>>> > >>>>> https://docs.google.com/document/d/1nzTlae0AFimPCTIV1LB3Z2y- > >>>>> PfTHtq3173EhsAkpBoQ > >>>>> > >>>>> cheers > >>>>> Paris and Fouad > >>>>> > >>>>> On 31 Oct 2016, at 12:53, Paris Carbone <par...@kth.se<mailto: > >>>>> par...@kth.se> <mailto: > >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto: > >>> parisc@ > >>>>> kth.se<http://kth.se/><http://kth.se<http://kth.se/>> < > http://kth.se/ > >>>>>> > >>>>> wrote: > >>>>> > >>>>> Hey Stephan, > >>>>> > >>>>> Thanks for looking into it! > >>>>> > >>>>> +1 for breaking this up, will do that. > >>>>> > >>>>> I can see your point and maybe it makes sense to introduce part of > >>>>> scoping > >>>>> to incorporate support for nested loops (otherwise it can’t work). > >>>>> Let us think about this a bit. We will share another draft for a more > >>>>> detail description of the approach you are suggesting asap. > >>>>> > >>>>> > >>>>> On 27 Oct 2016, at 10:55, Stephan Ewen <se...@apache.org<mailto: > >>>>> se...@apache.org><mailto:se...@apache.org> <mailto: > >>>>> se...@apache.org<mailto:se...@apache.org><mailto:se...@apache.org > >>>>>>> <mailto:sewen > >>>>> @apache.org<http://apache.org/>>> wrote: > >>>>> > >>>>> How about we break this up into two FLIPs? There are after all two > >>>>> orthogonal problems (termination, fault tolerance) with quite > different > >>>>> discussion states. > >>>>> > >>>>> Concerning fault tolerance, I like the ideas. > >>>>> For the termination proposal, I would like to iterate a bit more. > >>>>> > >>>>> *Termination algorithm:* > >>>>> > >>>>> My main concern here is the introduction of a termination coordinator > >>>>> and > >>>>> any involvement of RPC messages when deciding termination. > >>>>> That would be such a fundamental break with the current runtime > >>>>> architecture, and it would make the currently very elegant and simple > >>>>> model > >>>>> much more complicated and harder to maintain. Given that Flink's > >>>>> runtime is > >>>>> complex enough, I would really like to avoid that. > >>>>> > >>>>> The current runtime paradigm coordinates between operators strictly > via > >>>>> in-band events. RPC calls happen between operators and the master for > >>>>> triggering and acknowledging execution and checkpoints. > >>>>> > >>>>> I was wondering whether we can keep following that paradigm and still > >>>>> get > >>>>> most of what you are proposing here. In some sense, all we need to > do is > >>>>> replace RPC calls with in-band events, and "decentralize" the > >>>>> coordinator > >>>>> such that every operator can make its own termination decision by > >>>>> itself. > >>>>> > >>>>> This is only a rough sketch, you probably need to flesh it out more. > >>>>> > >>>>> - I assume that the OP in the diagram knows that it is in a loop and > >>>>> that > >>>>> it is the one connected to the head and tail > >>>>> > >>>>> - When OP receives and EndOfStream Event from the regular source > (RS), > >>>>> it > >>>>> emits an "AttemptTermination" event downstream to the operators > >>>>> involved in > >>>>> the loop. It attaches an attempt sequence number and memorizes that > >>>>> - Tail and Head forward these events > >>>>> - When OP receives the event back with the same attempt sequence > number, > >>>>> and no records came in the meantime, it shuts down and emits > EndOfStream > >>>>> downstream > >>>>> - When other records came back between emitting the > AttemptTermination > >>>>> event and receiving it back, then it emits a new AttemptTermination > >>>>> event > >>>>> with the next sequence number. > >>>>> - This should terminate as soon as the loop is empty. > >>>>> > >>>>> Might this model even generalize to nested loops, where the > >>>>> "AttemptTermination" event is scoped by the loop's nesting level? > >>>>> > >>>>> Let me know what you think! > >>>>> > >>>>> > >>>>> Best, > >>>>> Stephan > >>>>> > >>>>> > >>>>> On Thu, Oct 27, 2016 at 10:19 AM, Stephan Ewen <se...@apache.org > >>> <mailto: > >>>>> se...@apache.org><mailto:se...@apache.org> > >>>>> <mailto:se...@apache.org><mailto: > >>>>> se...@apache.org<mailto:se...@apache.org><mailto:se...@apache.org> > >>>>> <mailto:se...@apache.org>>> wrote: > >>>>> > >>>>> Hi! > >>>>> > >>>>> I am still scanning it and compiling some comments. Give me a bit ;-) > >>>>> > >>>>> Stephan > >>>>> > >>>>> > >>>>> On Wed, Oct 26, 2016 at 6:13 PM, Paris Carbone <par...@kth.se > <mailto: > >>>>> par...@kth.se><mailto:par...@kth.se> <mailto: > >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto: > >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se> <mailto: > >>>>> par...@kth.se>>> wrote: > >>>>> > >>>>> Hey all, > >>>>> > >>>>> Now that many of you have already scanned the document (judging from > the > >>>>> views) maybe it is time to give back some feedback! > >>>>> Did you like it? Would you suggest an improvement? > >>>>> > >>>>> I would suggest not to leave this in the void. It has to do with > >>>>> important properties that the system promises to provide. > >>>>> Me and Fouad will do our best to answer your questions and discuss > this > >>>>> further. > >>>>> > >>>>> cheers > >>>>> Paris > >>>>> > >>>>> On 21 Oct 2016, at 08:54, Paris Carbone <par...@kth.se<mailto: > >>>>> par...@kth.se><mailto:par...@kth.se> <mailto: > >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto: > >>> parisc@ > >>>>> kth.se<http://kth.se/><http://kth.se<http://kth.se/>> < > http://kth.se/ > >>>>>>> <mailto:parisc@k > >>>>> th.se<http://th.se/><http://th.se<http://th.se/>> <http://th.se/>< > >>>>> http://th.se<http://th.se/> <http://th.se/>>>> wrote: > >>>>> > >>>>> Hello everyone, > >>>>> > >>>>> Loops in Apache Flink have a good potential to become a much more > >>>>> powerful thing in future version of Apache Flink. > >>>>> There is generally high demand to make them usable and first of all > >>>>> production-ready for upcoming releases. > >>>>> > >>>>> As a first commitment we would like to propose FLIP-13 for consistent > >>>>> processing with Loops. > >>>>> We are also working on scoped loops for Q1 2017 which we can share if > >>>>> there is enough interest. > >>>>> > >>>>> For now, that is an improvement proposal that solves two pending > major > >>>>> issues: > >>>>> > >>>>> 1) The (not so trivial) problem of correct termination of jobs with > >>>>> iterations > >>>>> 2) The applicability of the checkpointing algorithm to iterative > >>>>> dataflow > >>>>> graphs. > >>>>> > >>>>> We would really appreciate it if you go through the linked draft > >>>>> (motivation and proposed changes) for FLIP-13 and point out comments, > >>>>> preferably publicly in this devlist discussion before we go ahead and > >>>>> update the wiki. > >>>>> > >>>>> https://docs.google.com/document/d/1M6ERj-TzlykMLHzPSwW5L9b0 > >>>>> BhDbtoYucmByBjRBISs/edit?usp=sharing > >>>>> > >>>>> cheers > >>>>> > >>>>> Paris and Fouad > >>>>> > >>>>> > >>> > >>> > > > >