There is no "lateral communication" right now. Typical pattern is to break it up in two operators that communicate in an all-to-all fashion.
On Thu, Jun 4, 2015 at 11:52 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> I am simply thinking about the best way to send data to different
> subtasks of the same operator.
>
> Can we go back to the original question? :D
>
> Stephan Ewen <se...@apache.org> wrote (on Wed, 3 Jun 2015, 23:45):
>
> > I think that it may be a bit premature to invest heavily in the
> > parallel delta-policy windows just yet. We have not even answered all
> > questions on the key-local delta windows yet:
> >
> > - How does it behave with non-monotonic changes? What does the delta
> >   refer to: the max interval in the window, the interval to the
> >   earliest element, or the max difference between two consecutive
> >   elements?
> >
> > - What about the order of records? Are deltas even interesting when
> >   records come in arbitrary order? What about the predictability of
> >   recovery runs?
> >
> > I would assume that a consistent version of the key-local delta
> > windows will get us a long way, use-case wise.
> >
> > Let's learn more about how users use these policies in the "simple"
> > case, because that will impact the protocol for global coordination
> > (for example, concerning order, and relative to what element the
> > deltas are computed, the first or the min). Otherwise we invest a lot
> > of effort into something where we do not yet have a clear
> > understanding of how we actually want it to behave, exactly.
> >
> > What do you think?
> >
> > On Wed, Jun 3, 2015 at 2:14 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > > I am talking, of course, about global delta windows, on the full
> > > stream, not on a partition. Delta windows per partition happen, as
> > > you said, currently as well.
> > >
> > > On Wednesday, June 3, 2015, Aljoscha Krettek <aljos...@apache.org> wrote:
> > >
> > > > Yes, this is obvious, but if we simply partition the data on the
> > > > attribute that we use for the delta policy this can be done purely
> > > > on one machine. No need for complex communication/synchronization.
> > > >
> > > > On Wed, Jun 3, 2015 at 1:32 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >
> > > > > Yes, we define a delta function from the first element to the
> > > > > last element in a window. Now let's discretize the stream using
> > > > > this semantics in parallel.
> > > > >
> > > > > Aljoscha Krettek <aljos...@apache.org> wrote (on Wed, 3 Jun 2015, 12:20):
> > > > >
> > > > > > Ah, ok. And by distributed you mean that the element that
> > > > > > starts the window can be processed on a different machine than
> > > > > > the element that finishes the window?
> > > > > >
> > > > > > On Wed, Jun 3, 2015 at 12:11 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > >
> > > > > > > This is not connected to the current implementation, so
> > > > > > > let's not talk about that.
> > > > > > >
> > > > > > > This is about the theoretical limits of supporting
> > > > > > > distributed delta policies, which have far-reaching
> > > > > > > implications for the windowing policies one can implement in
> > > > > > > a parallel way.
> > > > > > >
> > > > > > > But you are welcome to throw in any constructive ideas :)
> > > > > > >
> > > > > > > On Wed, Jun 3, 2015 at 11:49 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> > > > > > >
> > > > > > > > Part of the reason for my question is this:
> > > > > > > > https://issues.apache.org/jira/browse/FLINK-1967,
> > > > > > > > especially my latest comment there. If we want this, I
> > > > > > > > think we have to overhaul the windowing system anyway, and
> > > > > > > > then it doesn't make sense to explore complicated
> > > > > > > > workarounds for the current system.
> > > > > > > >
> > > > > > > > On Wed, Jun 3, 2015 at 11:07 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > There are simple ways of implementing it in a
> > > > > > > > > non-distributed or inconsistent fashion.
> > > > > > > > >
> > > > > > > > > On Wed, Jun 3, 2015 at 8:55 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > This already sounds awfully complicated. Is there no
> > > > > > > > > > other way to implement the delta windows?
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 3, 2015 at 7:52 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ufuk,
> > > > > > > > > > >
> > > > > > > > > > > In the concrete use case I have in mind I only want
> > > > > > > > > > > to send events to another subtask of the same task
> > > > > > > > > > > vertex.
> > > > > > > > > > >
> > > > > > > > > > > Specifically: if we want to do distributed
> > > > > > > > > > > delta-based windows, we need to send, after every
> > > > > > > > > > > trigger, the element that has triggered the current
> > > > > > > > > > > window. So practically I want to broadcast some
> > > > > > > > > > > event regularly to all subtasks of the same
> > > > > > > > > > > operator.
> > > > > > > > > > >
> > > > > > > > > > > In this case the operators would wait until they
> > > > > > > > > > > receive this event, so we need to make sure that
> > > > > > > > > > > this event sending is not blocked by the actual
> > > > > > > > > > > records.
> > > > > > > > > > >
> > > > > > > > > > > Gyula
> > > > > > > > > > >
> > > > > > > > > > > On Tuesday, June 2, 2015, Ufuk Celebi <u...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On 02 Jun 2015, at 22:45, Gyula Fóra <gyf...@apache.org> wrote:
> > > > > > > > > > > > > I am wondering, what is the suggested way to
> > > > > > > > > > > > > send some events directly to another parallel
> > > > > > > > > > > > > instance in a Flink job? For example, from one
> > > > > > > > > > > > > mapper to another mapper (of the same operator).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Do we have any internal support for this? The
> > > > > > > > > > > > > first thing that we thought of is iterations,
> > > > > > > > > > > > > but that is clearly overkill.
> > > > > > > > > > > >
> > > > > > > > > > > > There is no support for this at the moment. Any
> > > > > > > > > > > > parallel instance? Or a subtask instance of the
> > > > > > > > > > > > same task?
> > > > > > > > > > > >
> > > > > > > > > > > > Can you provide more input on the use case? It is
> > > > > > > > > > > > certainly possible to add support for this.
> > > > > > > > > > > >
> > > > > > > > > > > > If the events don't need to be inline with the
> > > > > > > > > > > > records, we can easily set up the
> > > > > > > > > > > > TaskEventDispatcher as a separate actor (or extend
> > > > > > > > > > > > the task manager) to process both
> > > > > > > > > > > > backwards-flowing events and, in general, any
> > > > > > > > > > > > events that don't need to be inline with the
> > > > > > > > > > > > records. The task deployment descriptors need to
> > > > > > > > > > > > be extended with the extra parallel instance
> > > > > > > > > > > > information.
> > > > > > > > > > > >
> > > > > > > > > > > > – Ufuk
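As a point of reference for the key-local ("simple") delta windows discussed above, here is a rough sketch using the DeltaTrigger API that ships with later Flink releases, not the 0.9-era policy API this thread was written against. The sensor-reading tuples, the 5.0 threshold, and the class name are made up for illustration; keying the stream keeps each delta computation on a single subtask, so no lateral communication is needed:

import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.delta.DeltaFunction;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.triggers.DeltaTrigger;

public class KeyLocalDeltaWindowSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (sensorId, value) pairs; keying by sensorId means the delta for each
        // key is evaluated entirely on one subtask.
        DataStream<Tuple2<String, Double>> readings = env.fromElements(
                Tuple2.of("a", 1.0), Tuple2.of("a", 2.5), Tuple2.of("a", 7.0),
                Tuple2.of("b", 3.0), Tuple2.of("b", 9.5));

        readings
                .keyBy(new KeySelector<Tuple2<String, Double>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Double> reading) {
                        return reading.f0;
                    }
                })
                .window(GlobalWindows.create())
                // Fire when the value drifts more than 5.0 from the last element
                // that fired (per key); a real job would also add an evictor to
                // bound the window state, as in Flink's TopSpeedWindowing example.
                .trigger(DeltaTrigger.of(
                        5.0,
                        new DeltaFunction<Tuple2<String, Double>>() {
                            @Override
                            public double getDelta(Tuple2<String, Double> oldPoint,
                                                   Tuple2<String, Double> newPoint) {
                                return Math.abs(newPoint.f1 - oldPoint.f1);
                            }
                        },
                        TypeInformation.of(new TypeHint<Tuple2<String, Double>>() {})
                                .createSerializer(env.getConfig())))
                .maxBy(1)
                .print();

        env.execute("key-local-delta-window-sketch");
    }
}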