I am simply thinking about the best way to send data to different subtasks of the same operator.
Can we go back to the original question? :D

Stephan Ewen <se...@apache.org> wrote (on Wed, Jun 3, 2015, 23:45):

> I think that it may be a bit premature to invest heavily into the parallel delta-policy windows just yet.
> We have not even answered all questions on the key-local delta windows yet:
>
> - How does it behave with non-monotonic changes? What does the delta refer to: the max interval in the window, the interval to the earliest element, or the max difference between two consecutive elements?
>
> - What about the order of records? Are deltas even interesting when records come in arbitrary order? What about the predictability of recovery runs?
>
> I would assume that a consistent version of the key-local delta windows will get us a long way, use-case wise.
>
> Let's learn more about how users use these policies in the "simple" case. Because that will impact the protocol for global coordination (for example concerning order, and relative to which element the deltas are computed: the first or the min). Otherwise we invest a lot of effort into something where we do not yet have a clear understanding of how exactly we want it to behave.
>
> What do you think?
>
> On Wed, Jun 3, 2015 at 2:14 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> > I am talking of course about global delta windows. On the full stream, not on a partition. Delta windows per partition happen currently as well, as you said.
> >
> > On Wednesday, June 3, 2015, Aljoscha Krettek <aljos...@apache.org> wrote:
> >
> > > Yes, this is obvious, but if we simply partition the data on the attribute that we use for the delta policy, this can be done purely on one machine. No need for complex communication/synchronization.
> > >
> > > On Wed, Jun 3, 2015 at 1:32 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > Yes, we define a delta function from the first element to the last element in a window. Now let's discretize the stream using these semantics in parallel.
> > > >
> > > > Aljoscha Krettek <aljos...@apache.org> wrote (on Wed, Jun 3, 2015, 12:20):
> > > >
> > > >> Ah ok. And by distributed you mean that the element that starts the window can be processed on a different machine than the element that finishes the window?
> > > >>
> > > >> On Wed, Jun 3, 2015 at 12:11 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >> > This is not connected to the current implementation. So let's not talk about that.
> > > >> >
> > > >> > This is about theoretical limits to supporting distributed delta policies, which has far-reaching implications for the windowing policies one can implement in a parallel way.
> > > >> >
> > > >> > But you are welcome to throw in any constructive ideas :)
> > > >> >
> > > >> > On Wed, Jun 3, 2015 at 11:49 AM Aljoscha Krettek <aljos...@apache.org> wrote:
> > > >> >
> > > >> >> Part of the reason for my question is this: https://issues.apache.org/jira/browse/FLINK-1967. Especially my latest comment there. If we want this, I think we have to overhaul the windowing system anyway, and then it doesn't make sense to explore complicated workarounds for the current system.
> > > >> >>
> > > >> >> On Wed, Jun 3, 2015 at 11:07 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >> >> > There are simple ways of implementing it in a non-distributed or inconsistent fashion.
> > > >> >> >
> > > >> >> > On Wed, Jun 3, 2015 at 8:55 AM Aljoscha Krettek <aljos...@apache.org> wrote:
> > > >> >> >
> > > >> >> >> This already sounds awfully complicated. Is there no other way to implement the delta windows?
> > > >> >> >>
> > > >> >> >> On Wed, Jun 3, 2015 at 7:52 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >> >> >> > Hi Ufuk,
> > > >> >> >> >
> > > >> >> >> > In the concrete use case I have in mind, I only want to send events to another subtask of the same task vertex.
> > > >> >> >> >
> > > >> >> >> > Specifically: if we want to do distributed delta-based windows, we need to send, after every trigger, the element that has triggered the current window. So practically I want to broadcast some event regularly to all subtasks of the same operator.
> > > >> >> >> >
> > > >> >> >> > In this case the operators would wait until they receive this event, so we need to make sure that sending this event is not blocked by the actual records.
> > > >> >> >> >
> > > >> >> >> > Gyula
> > > >> >> >> >
> > > >> >> >> > On Tuesday, June 2, 2015, Ufuk Celebi <u...@apache.org> wrote:
> > > >> >> >> >
> > > >> >> >> >> On 02 Jun 2015, at 22:45, Gyula Fóra <gyf...@apache.org> wrote:
> > > >> >> >> >> > I am wondering, what is the suggested way to send some events directly to another parallel instance in a Flink job? For example from one mapper to another mapper (of the same operator).
> > > >> >> >> >> >
> > > >> >> >> >> > Do we have any internal support for this? The first thing that we thought of is iterations, but that is clearly overkill.
> > > >> >> >> >>
> > > >> >> >> >> There is no support for this at the moment. Any parallel instance? Or a subtask instance of the same task?
> > > >> >> >> >>
> > > >> >> >> >> Can you provide more input on the use case? It is certainly possible to add support for this.
> > > >> >> >> >>
> > > >> >> >> >> If the events don't need to be inline with the records, we can easily set up the TaskEventDispatcher as a separate actor (or extend the task manager) to process both backwards-flowing events and, in general, any events that don't need to be inline with the records. The task deployment descriptors need to be extended with the extra parallel instance information.
> > > >> >> >> >>
> > > >> >> >> >> – Ufuk
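
For readers following the thread, here is a minimal single-machine sketch of the delta-window semantics under discussion. It is not Flink code and not the project's implementation: the function name `delta_windows` and its parameters are hypothetical, and it assumes that the delta is computed from the first element of the current window and that the triggering element becomes the first element of the next window. It mainly illustrates why the order of records matters for this policy.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def delta_windows(
    stream: Iterable[T],
    delta: Callable[[T, T], float],
    threshold: float,
) -> List[List[T]]:
    """Split a stream into windows using a first-element delta policy.

    Assumption (not confirmed by the thread): a window closes when
    delta(first element of the window, incoming element) > threshold,
    and the incoming element then starts the next window.
    """
    windows: List[List[T]] = []
    current: List[T] = []
    for element in stream:
        if current and delta(current[0], element) > threshold:
            # The incoming element triggers: emit the window, start a new one.
            windows.append(current)
            current = [element]
        else:
            current.append(element)
    if current:
        # Emit the trailing, not-yet-triggered window as well.
        windows.append(current)
    return windows


def abs_diff(a: float, b: float) -> float:
    return abs(a - b)


in_order = delta_windows([1, 2, 4, 5, 9, 10], abs_diff, threshold=2)
reordered = delta_windows([4, 1, 2, 5, 9, 10], abs_diff, threshold=2)
print(in_order)   # [[1, 2], [4, 5], [9, 10]]
print(reordered)  # [[4], [1, 2], [5], [9, 10]]
```

The same elements arriving in a different order produce different windows, which is the concern raised above about arbitrary record order and about the predictability of recovery runs. In a parallel setting, every subtask would additionally need to learn the triggering element (`current[0]` of the new window) before it can evaluate further records, which is where the broadcast event comes in.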