+1 for writing this down
> On 25 Jun 2015, at 18:11, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> +1 go ahead
>
> On Thu, 25 Jun 2015 at 18:02 Stephan Ewen <se...@apache.org> wrote:
>
>> Hey!
>>
>> This thread covers many different topics. Let's break this up into separate
>> discussions.
>>
>> - Operator State is already driven by Gyula and Paris and happening on the
>> above mentioned pull request and the followup discussions.
>>
>> - For windowing, this discussion has brought some results that we should
>> sum up and clearly write down.
>> I would like to chime in to do that based on what I learned from the
>> document and this discussion. I also got some input from Marton about what
>> he learned from mapping the Cloud DataFlow constructs to Flink.
>> I'll draft a Wiki page (with the help of Aljoscha, Marton) that sums
>> this up and documents it for users (if we decide to adopt this).
>> Then we run this by Gyula, Matthias Sax and Kostas for feedback.
>>
>> - API style discussions should be yet another thread. This will probably
>> be opened as people start to address that.
>>
>>
>> I'll try to get a draft of the wiki version out tomorrow noon and send the
>> link around.
>>
>> Greetings,
>> Stephan
>>
>>
>>
>> On Thu, Jun 25, 2015 at 3:51 PM, Matthias J. Sax <
>> mj...@informatik.hu-berlin.de> wrote:
>>
>>> Sure. I picked this up. Using the current model for "occurrence time
>>> semantics" does not work.
>>>
>>> I elaborated on this in the past many times (but nobody cared). It is
>>> important to make it clear to the user what semantics are supported.
>>> Claiming to support "sliding windows" doesn't mean anything; there are
>>> too many different semantics out there. :)
>>>
>>>
>>> On 06/25/2015 03:35 PM, Aljoscha Krettek wrote:
>>>> Yes, I am aware of this requirement and it would also be supported in
>> my
>>>> proposed model.
>>>>
>>>> The problem is that the "custom timestamp" feature gives the
>> impression
>>>> that the elements would be windowed according to a user-timestamp. The
>>>> results, however, are wrong because of the assumption about elements
>>>> arriving in order. (This is what I was trying to show with my fancy
>> ASCII
>>>> art and result output.)
>>>>
>>>> On Thu, 25 Jun 2015 at 15:26 Matthias J. Sax <
>>> mj...@informatik.hu-berlin.de>
>>>> wrote:
>>>>
>>>>> Hi Aljoscha,
>>>>>
>>>>> I like that you are pushing in this direction. However, IMHO you
>>>>> misinterpret the current approach. It does not assume that tuples
>>>>> arrive in-order; the current approach has no notion about a
>>>>> pre-defined-order (for example, the order in which the events are
>>>>> created). There is only the notion of "arrival-order" at the operator.
>>>>> From this "arrival-order" perspective, the results are correct(!).
>>>>>
>>>>> Windowing in the current approach means for example, "sum up an
>>>>> attribute of all events you *received* in the last 5 seconds". That
>> is a
>>>>> different meaning than "sum up an attribute of all events that
>> *occurred*
>>>>> in the last 5 seconds". Both queries are valid and Flink should
>> support
>>>>> both IMHO.
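The distinction Matthias draws can be illustrated with a small standalone sketch (hypothetical Java, not Flink API code): the same three events are grouped into 5-unit tumbling windows once by arrival time ("received in the last 5 seconds") and once by occurrence time ("occurred in the last 5 seconds").

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not Flink code): the same events summed into
// 5-unit tumbling windows by arrival time vs. by occurrence time.
public class WindowSemantics {
    // Each event is {occurrenceTime, arrivalTime, value}.
    static Map<Long, Long> sumPerWindow(long[][] events, int timeIndex) {
        Map<Long, Long> windows = new TreeMap<>();
        for (long[] e : events) {
            long windowStart = (e[timeIndex] / 5) * 5; // 5-unit tumbling window
            windows.merge(windowStart, e[2], Long::sum);
        }
        return windows;
    }

    public static void main(String[] args) {
        // One event occurred at t=4 but arrived late, at t=6.
        long[][] events = {{1, 1, 10}, {4, 6, 20}, {7, 7, 30}};
        // "received in the last 5 seconds": group by arrival time.
        System.out.println(sumPerWindow(events, 1)); // {0=10, 5=50}
        // "occurred in the last 5 seconds": group by occurrence time.
        System.out.println(sumPerWindow(events, 0)); // {0=30, 5=30}
    }
}
```

The late event lands in different windows under the two semantics, which is exactly why both queries are distinct and both are worth supporting.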
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>>
>>>>> On 06/25/2015 03:03 PM, Aljoscha Krettek wrote:
>>>>>> Yes, now this also processes about 3 mio Elements (Window Size 5 sec,
>>>>> Slide
>>>>>> 1 sec) but it still fluctuates a lot between 1 mio. and 5 mio.
>>>>>>
>>>>>> Performance is not my main concern, however. My concern is that the
>>>>> current
>>>>>> model assumes elements to arrive in order, which is simply not true.
>>>>>>
>>>>>> In your code you have these lines for specifying the window:
>>>>>> .window(Time.of(1l, TimeUnit.SECONDS))
>>>>>> .every(Time.of(1l, TimeUnit.SECONDS))
>>>>>>
>>>>>> Although this semantically specifies a tumbling window of size 1 sec
>>> I'm
>>>>>> afraid it uses the sliding window logic internally (because of the
>>>>>> .every()).
>>>>>>
>>>>>> In my tests I only have the first line.
>>>>>>
>>>>>>
>>>>>> On Thu, 25 Jun 2015 at 14:32 Gábor Gévay <gga...@gmail.com> wrote:
>>>>>>
>>>>>>> I'm very sorry, I had a bug in the InversePreReducer. It should be
>>>>>>> fixed now. Can you please run it again?
>>>>>>>
>>>>>>> I also tried to reproduce some of your performance numbers, but I'm
>>>>>>> getting only less than 1/10th of yours. For example, in the Tumbling
>>>>>>> case, Current/Reduce produces only ~100000 for me. Do you have any
>>>>>>> idea what I could be doing wrong? My code:
>>>>>>> http://pastebin.com/zbEjmGhk
>>>>>>> I am running it on a 2 GHz Core i7.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Gabor
>>>>>>>
>>>>>>>
>>>>>>> 2015-06-25 12:31 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>>>>>>> Hi,
>>>>>>>> I also ran the tests on top of PR 856 (inverse reducer) now. The
>>>>> results
>>>>>>>> seem incorrect. When I insert a Thread.sleep(1) in the tuple
>> source,
>>>>> all
>>>>>>>> the previous tests reported around 3600 tuples (Size 5 sec, Slide 1
>>>>> sec)
>>>>>>>> (Theoretically there would be 5000 tuples in 5 seconds; the shortfall is
>>>>>>>> due to overhead). These are the results for the inverse reduce
>> optimisation:
>>>>>>>> (Tuple 0,38)
>>>>>>>> (Tuple 0,829)
>>>>>>>> (Tuple 0,1625)
>>>>>>>> (Tuple 0,2424)
>>>>>>>> (Tuple 0,3190)
>>>>>>>> (Tuple 0,3198)
>>>>>>>> (Tuple 0,-339368)
>>>>>>>> (Tuple 0,-1315725)
>>>>>>>> (Tuple 0,-2932932)
>>>>>>>> (Tuple 0,-5082735)
>>>>>>>> (Tuple 0,-7743256)
>>>>>>>> (Tuple 0,75701046)
>>>>>>>> (Tuple 0,642829470)
>>>>>>>> (Tuple 0,2242018381)
>>>>>>>> (Tuple 0,5190708618)
>>>>>>>> (Tuple 0,10060360311)
>>>>>>>> (Tuple 0,-94254951)
>>>>>>>> (Tuple 0,-219806321293)
>>>>>>>> (Tuple 0,-1258895232699)
>>>>>>>> (Tuple 0,-4074432596329)
>>>>>>>>
>>>>>>>> One line is one emitted window count. This is what happens when I
>>>>> remove
>>>>>>>> the Thread.sleep(1):
>>>>>>>> (Tuple 0,660676)
>>>>>>>> (Tuple 0,2553733)
>>>>>>>> (Tuple 0,3542696)
>>>>>>>> (Tuple 0,1)
>>>>>>>> (Tuple 0,1107035)
>>>>>>>> (Tuple 0,2549491)
>>>>>>>> (Tuple 0,4100387)
>>>>>>>> (Tuple 0,-8406583360092)
>>>>>>>> (Tuple 0,-8406582150743)
>>>>>>>> (Tuple 0,-8406580427190)
>>>>>>>> (Tuple 0,-8406580427190)
>>>>>>>> (Tuple 0,-8406580427190)
>>>>>>>> (Tuple 0,6847279255682044995)
>>>>>>>> (Tuple 0,6847279255682044995)
>>>>>>>> (Tuple 0,-5390528042713628318)
>>>>>>>> (Tuple 0,-5390528042711551780)
>>>>>>>> (Tuple 0,-5390528042711551780)
>>>>>>>>
>>>>>>>> So at some point the pre-reducer seems to go haywire and does not
>>>>> recover
>>>>>>>> from it. The good thing is that it does produce results now, where
>>> the
>>>>>>>> previous Current/Reduce would simply hang and not produce any
>> output.
>>>>>>>>
>>>>>>>> On Thu, 25 Jun 2015 at 12:02 Gábor Gévay <gga...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Aljoscha, can you please try the performance test of
>> Current/Reduce
>>>>>>>>> with the InversePreReducer in PR 856? (If you just call sum, it
>> will
>>>>>>>>> use an InversePreReducer.) It would be an interesting test,
>> because
>>>>>>>>> the inverse function optimization really depends on the stream
>> being
>>>>>>>>> ordered, and I think it has the potential of being faster than
>>>>>>>>> Next/Reduce. Especially if the window size is much larger than the
>>>>>>>>> slide size.
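For illustration, the inverse-function idea for a sliding sum can be sketched roughly like this (a hypothetical standalone class, not the actual PR 856 code): keep one running aggregate, add arriving elements, and apply the inverse (subtraction) when elements fall out of the window. Correctness hinges on the stream being ordered, since eviction must remove exactly what was added.

```java
import java.util.ArrayDeque;

// Hypothetical sketch of an inverse pre-reducer for a sliding sum
// (not the PR 856 implementation). The running sum is updated in
// O(1) per element: add on arrival, subtract (the inverse) on eviction.
public class InverseSumPreReducer {
    private final ArrayDeque<long[]> buffer = new ArrayDeque<>(); // {timestamp, value}
    private final long windowSize;
    private long sum = 0;

    InverseSumPreReducer(long windowSize) { this.windowSize = windowSize; }

    // Assumes elements arrive in timestamp order; out-of-order input
    // breaks the eviction logic and corrupts the running sum.
    long add(long timestamp, long value) {
        buffer.addLast(new long[]{timestamp, value});
        sum += value;
        // Evict elements that fell out of the window, applying the inverse.
        while (!buffer.isEmpty() && buffer.peekFirst()[0] <= timestamp - windowSize) {
            sum -= buffer.pollFirst()[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        InverseSumPreReducer r = new InverseSumPreReducer(3);
        for (long t = 0; t < 4; t++) {
            System.out.println(r.add(t, (t + 1) * 10)); // prints 10, 30, 60, 90
        }
    }
}
```

Unlike a pre-reducer that recomputes each window, the cost here is independent of the window-size/slide ratio, which is why the win grows when the window is much larger than the slide.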
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Gabor
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-06-25 11:36 GMT+02:00 Aljoscha Krettek <aljos...@apache.org
>>> :
>>>>>>>>>> I think I'll have to elaborate a bit so I created a
>>> proof-of-concept
>>>>>>>>>> implementation of my ideas and ran some throughput measurements
>> to
>>>>>>>>>> alleviate concerns about performance.
>>>>>>>>>>
>>>>>>>>>> First, though, I want to highlight again why the current approach
>>>>> does
>>>>>>>>> not
>>>>>>>>>> work with out-of-order elements (which, again, occur constantly
>> due
>>>>> to
>>>>>>>>> the
>>>>>>>>>> distributed nature of the system). This is the example I posted
>>>>>>> earlier:
>>>>>>>>>> https://gist.github.com/aljoscha/a367012646ab98208d27. The plan
>>>>> looks
>>>>>>>>> like
>>>>>>>>>> this:
>>>>>>>>>>
>>>>>>>>>>    +--+
>>>>>>>>>>    |  | Source
>>>>>>>>>>    +--+
>>>>>>>>>>      |
>>>>>>>>>>    +-----+
>>>>>>>>>>    |     |
>>>>>>>>>>    |   +--+
>>>>>>>>>>    |   |  | Identity Map
>>>>>>>>>>    |   +--+
>>>>>>>>>>    |     |
>>>>>>>>>>    +-----+
>>>>>>>>>>      |
>>>>>>>>>>    +--+
>>>>>>>>>>    |  | Window
>>>>>>>>>>    +--+
>>>>>>>>>>      |
>>>>>>>>>>      |
>>>>>>>>>>    +--+
>>>>>>>>>>    |  | Sink
>>>>>>>>>>    +--+
>>>>>>>>>>
>>>>>>>>>> So all it does is pass the elements through an identity map and
>>> then
>>>>>>>>> merge
>>>>>>>>>> them again before the window operator. The source emits ascending
>>>>>>>>> integers
>>>>>>>>>> and the window operator has a custom timestamp extractor that
>> uses
>>>>> the
>>>>>>>>>> integer itself as the timestamp and should create windows of
>> size 4
>>>>>>> (that
>>>>>>>>>> is elements with timestamp 0-3 are one window, the next are the
>>>>>>> elements
>>>>>>>>>> with timestamp 4-7, and so on). Since the topology basically
>>> doubles
>>>>>>> the
>>>>>>>>>> elements from the source I would expect to get these windows:
>>>>>>>>>> Window: 0, 0, 1, 1, 2, 2, 3, 3
>>>>>>>>>> Window: 4, 4, 5, 5, 6, 6, 7, 7
>>>>>>>>>>
>>>>>>>>>> The output is this, however:
>>>>>>>>>> Window: 0, 1, 2, 3,
>>>>>>>>>> Window: 4, 0, 1, 2, 3, 4, 5, 6, 7,
>>>>>>>>>> Window: 8, 9, 10, 11,
>>>>>>>>>> Window: 12, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
>>>>>>>>>> Window: 16, 17, 18, 19,
>>>>>>>>>> Window: 20, 21, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
>>>>>>>>>> Window: 24, 25, 26, 27,
>>>>>>>>>> Window: 28, 29, 30, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
>>>>>>>>>>
>>>>>>>>>> The reason is that the elements simply arrive out-of-order.
>> Imagine
>>>>>>> what
>>>>>>>>>> would happen if the elements actually arrived with some delay
>> from
>>>>>>>>>> different operations.
>>>>>>>>>>
>>>>>>>>>> Now, on to the performance numbers. The proof-of-concept I
>> created
>>> is
>>>>>>>>>> available here:
>>>>>>>>>> https://github.com/aljoscha/flink/tree/event-time-window-fn-mock
>> .
>>>>> The
>>>>>>>>> basic
>>>>>>>>>> idea is that sources assign the current timestamp when emitting
>>>>>>> elements.
>>>>>>>>>> They also periodically emit watermarks that tell us that no
>>> elements
>>>>>>> with
>>>>>>>>>> an earlier timestamp will be emitted. The watermarks propagate
>>>>> through
>>>>>>>>> the
>>>>>>>>>> operators. The window operator looks at the timestamp of an
>> element
>>>>>>> and
>>>>>>>>>> puts it into the buffer that corresponds to that window. When the
>>>>>>> window
>>>>>>>>>> operator receives a watermark it will look at the in-flight
>> windows
>>>>>>>>>> (basically the buffers) and emit those windows where the
>> window-end
>>>>> is
>>>>>>>>>> before the watermark.
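That mechanism can be sketched in a few lines (hypothetical standalone Java, not the code in the branch): elements are buffered by window, and a watermark flushes every in-flight window whose end is at or before it, so out-of-order elements are still grouped correctly as long as they arrive before the watermark passes them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of watermark-driven tumbling windows.
public class WatermarkWindows {
    private final SortedMap<Long, List<Long>> inFlight = new TreeMap<>();
    private final long size;

    WatermarkWindows(long size) { this.size = size; }

    void element(long timestamp) {
        long windowStart = (timestamp / size) * size;
        inFlight.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(timestamp);
    }

    // Emit (return) the contents of every window whose end <= watermark.
    List<List<Long>> watermark(long watermark) {
        List<List<Long>> emitted = new ArrayList<>();
        Iterator<Map.Entry<Long, List<Long>>> it = inFlight.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, List<Long>> w = it.next();
            if (w.getKey() + size <= watermark) {
                emitted.add(w.getValue());
                it.remove();
            } else {
                break; // TreeMap is sorted, so later windows end later
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        WatermarkWindows op = new WatermarkWindows(4);
        // Out-of-order arrival: 0, 2, 1, 3 then 5, 4.
        for (long t : new long[]{0, 2, 1, 3, 5, 4}) op.element(t);
        System.out.println(op.watermark(4)); // [[0, 2, 1, 3]]
    }
}
```

Note how the out-of-order elements 2, 1 and 5, 4 still end up in the right windows, which is the property the current arrival-order model cannot give.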
>>>>>>>>>>
>>>>>>>>>> For measuring throughput I did the following: The source emits
>>> tuples
>>>>>>> of
>>>>>>>>>> the form ("tuple", 1) in an infinite loop. The window operator
>> sums
>>>>> up
>>>>>>>>> the
>>>>>>>>>> tuples, thereby counting how many tuples the window operator can
>>>>>>> handle
>>>>>>>>> in
>>>>>>>>>> a given time window. There are two different implementations for
>>> the
>>>>>>>>>> summation: 1) simply summing up the values in a mapWindow(),
>> there
>>>>> you
>>>>>>>>> get
>>>>>>>>>> a List of all tuples and simply iterate over it. 2) using sum(1),
>>>>>>> which
>>>>>>>>> is
>>>>>>>>>> implemented as a reduce() (that uses the pre-reducer
>>> optimisations).
>>>>>>>>>>
>>>>>>>>>> These are the performance numbers (Current is the current
>>>>>>> implementation,
>>>>>>>>>> Next is my proof-of-concept):
>>>>>>>>>>
>>>>>>>>>> Tumbling (1 sec):
>>>>>>>>>> - Current/Map: 1.6 mio
>>>>>>>>>> - Current/Reduce: 2 mio
>>>>>>>>>> - Next/Map: 2.2 mio
>>>>>>>>>> - Next/Reduce: 4 mio
>>>>>>>>>>
>>>>>>>>>> Sliding (5 sec, slide 1 sec):
>>>>>>>>>> - Current/Map: ca 3 mio (fluctuates a lot)
>>>>>>>>>> - Current/Reduce: No output
>>>>>>>>>> - Next/Map: ca 4 mio (fluctuates)
>>>>>>>>>> - Next/Reduce: 10 mio
>>>>>>>>>>
>>>>>>>>>> The Next/Reduce variant can basically scale indefinitely with
>>> window
>>>>>>> size
>>>>>>>>>> because the internal state does not rely on the number of
>> elements
>>>>>>> (it is
>>>>>>>>>> just the current sum). The pre-reducer for sliding elements
>> cannot
>>>>>>> handle
>>>>>>>>>> the amount of tuples; it produces no output. For the two Map
>>> variants
>>>>>>> the
>>>>>>>>>> performance fluctuates because they always keep all the elements
>> in
>>>>> an
>>>>>>>>>> internal buffer before emission; this seems to tax the garbage
>>>>>>> collector
>>>>>>>>> a
>>>>>>>>>> bit and leads to random pauses.
>>>>>>>>>>
>>>>>>>>>> One thing that should be noted is that I had to disable the
>>>>>>> fake-element
>>>>>>>>>> emission thread, otherwise the Current versions would deadlock.
>>>>>>>>>>
>>>>>>>>>> So, I started working on this because I thought that out-of-order
>>>>>>>>>> processing would be necessary for correctness. And it certainly is.
>>>>>>>>>> But the
>>>>>>>>>> proof-of-concept also shows that performance can be greatly
>>> improved.
>>>>>>>>>>
>>>>>>>>>> On Wed, 24 Jun 2015 at 09:46 Gyula Fóra <gyula.f...@gmail.com>
>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I agree lets separate these topics from each other so we can get
>>>>>>> faster
>>>>>>>>>>> resolution.
>>>>>>>>>>>
>>>>>>>>>>> There is already a state discussion in the thread we started
>> with
>>>>>>> Paris.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 24, 2015 at 9:24 AM Kostas Tzoumas <
>>> ktzou...@apache.org
>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I agree with supporting out-of-order out of the box :-), even
>> if
>>>>>>> this
>>>>>>>>>> means
>>>>>>>>>>>> a major refactoring. This is the right time to refactor the
>>>>>>> streaming
>>>>>>>>>> API
>>>>>>>>>>>> before we pull it out of beta. I think that this is more
>>> important
>>>>>>>>> than
>>>>>>>>>> new
>>>>>>>>>>>> features in the streaming API, which can be prioritized once
>> the
>>>>>>> API
>>>>>>>>> is
>>>>>>>>>> out
>>>>>>>>>>>> of beta (meaning, that IMO this is the right time to stall PRs
>>>>>>> until
>>>>>>>>> we
>>>>>>>>>>>> agree on the design).
>>>>>>>>>>>>
>>>>>>>>>>>> There are three sections in the document: windowing, state, and
>>>>>>> API.
>>>>>>>>> How
>>>>>>>>>>>> convoluted are those with each other? Can we separate the
>>>>>>> discussion
>>>>>>>>> or
>>>>>>>>>> do
>>>>>>>>>>>> we need to discuss those all together? I think part of the
>>>>>>> difficulty
>>>>>>>>> is
>>>>>>>>>>>> that we are discussing three design choices at once.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 23, 2015 at 7:43 PM, Ted Dunning <
>>>>>>> ted.dunn...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Out of order is ubiquitous in the real-world. Typically, what
>>>>>>>>>> happens is
>>>>>>>>>>>>> that businesses will declare a maximum allowable delay for
>>>>>>> delayed
>>>>>>>>>>>>> transactions and will commit to results when that delay is
>>>>>>> reached.
>>>>>>>>>>>>> Transactions that arrive later than this cutoff are collected
>>>>>>>>>> specially
>>>>>>>>>>>> as
>>>>>>>>>>>>> corrections which are reported/used when possible.
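The cutoff-plus-corrections pattern Ted describes might look roughly like this (a hypothetical sketch, not any particular system's API): results are committed once the maximum allowed delay has passed, and transactions arriving after that cutoff are routed to a separate corrections channel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a late-data cutoff: accept transactions that
// are within the allowed delay, collect the rest as corrections.
public class LateDataCutoff {
    private final long allowedDelay;
    private long committedUpTo = Long.MIN_VALUE; // commit frontier
    final List<Long> accepted = new ArrayList<>();
    final List<Long> corrections = new ArrayList<>();

    LateDataCutoff(long allowedDelay) { this.allowedDelay = allowedDelay; }

    void transaction(long eventTime, long processingTime) {
        // Everything older than (processingTime - allowedDelay) is committed.
        committedUpTo = Math.max(committedUpTo, processingTime - allowedDelay);
        if (eventTime >= committedUpTo) {
            accepted.add(eventTime);    // still within the allowed delay
        } else {
            corrections.add(eventTime); // too late: report as a correction
        }
    }

    public static void main(String[] args) {
        LateDataCutoff d = new LateDataCutoff(5);
        d.transaction(10, 12); // frontier moves to 7; 10 is accepted
        d.transaction(3, 12);  // 3 < 7: collected as a correction
        System.out.println(d.accepted + " " + d.corrections); // [10] [3]
    }
}
```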
>>>>>>>>>>>>>
>>>>>>>>>>>>> Clearly, ordering can also be violated during processing, but
>> if
>>>>>>> the
>>>>>>>>>> data
>>>>>>>>>>>>> is originally out of order the situation can't be repaired by
>>> any
>>>>>>>>>>>> protocol
>>>>>>>>>>>>> fixes that prevent transactions from becoming disordered but has
>>>>>>>>>>>>> to be handled
>>>>>>>>>>>> handled
>>>>>>>>>>>>> at the data level.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 23, 2015 at 9:29 AM, Aljoscha Krettek <
>>>>>>>>> aljos...@apache.org
>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also don't like big changes but sometimes they are
>> necessary.
>>>>>>>>> The
>>>>>>>>>>>>> reason
>>>>>>>>>>>>>> why I'm so adamant about out-of-order processing is that
>>>>>>>>>> out-of-order
>>>>>>>>>>>>>> elements are not some exception that occurs once in a while;
>>>>>>> they
>>>>>>>>>> occur
>>>>>>>>>>>>>> constantly in a distributed system. For example, in this:
>>>>>>>>>>>>>> https://gist.github.com/aljoscha/a367012646ab98208d27 the
>>>>>>>>> resulting
>>>>>>>>>>>>>> windows
>>>>>>>>>>>>>> are completely bogus because the current windowing system
>>>>>>> assumes
>>>>>>>>>>>>> elements
>>>>>>>>>>>>>> to globally arrive in order, which is simply not true. (The
>>>>>>>>> example
>>>>>>>>>>>> has a
>>>>>>>>>>>>>> source that generates increasing integers. Then these pass
>>>>>>>>> through a
>>>>>>>>>>>> map
>>>>>>>>>>>>>> and are unioned with the original DataStream before a window
>>>>>>>>>> operator.)
>>>>>>>>>>>>>> This simulates elements arriving from different operators at
>> a
>>>>>>>>>>>> windowing
>>>>>>>>>>>>>> operator. The example is also DOP=1, I imagine this to get
>>>>>>> worse
>>>>>>>>>> with
>>>>>>>>>>>>>> higher DOP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you mean by costly? As I said, I have a
>>>>>>> proof-of-concept
>>>>>>>>>>>>> windowing
>>>>>>>>>>>>>> operator that can handle out-or-order elements. This is an
>>>>>>> example
>>>>>>>>>>>> using
>>>>>>>>>>>>>> the current Flink API:
>>>>>>>>>>>>>> https://gist.github.com/aljoscha/f8dce0691732e344bbe8.
>>>>>>>>>>>>>> (It is an infinite source of tuples and a 5 second window
>>>>>>> operator
>>>>>>>>>> that
>>>>>>>>>>>>>> counts the tuples.) The first problem is that this code
>>>>>>> deadlocks
>>>>>>>>>>>> because
>>>>>>>>>>>>>> of the thread that emits fake elements. If I disable the fake
>>>>>>>>>> element
>>>>>>>>>>>>> code
>>>>>>>>>>>>>> it works, but the throughput using my mockup is 4 times higher.
>>>>>>>>>>>>>> The gap
>>>>>>>>>>>>>> widens dramatically if the window size increases.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, it actually increases performance (unless I'm making a
>>>>>>> mistake
>>>>>>>>>> in
>>>>>>>>>>>> my
>>>>>>>>>>>>>> explorations) and can handle elements that arrive
>> out-of-order
>>>>>>>>>> (which
>>>>>>>>>>>>>> happens basically always in real-world windowing
>> use-cases).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 23 Jun 2015 at 12:51 Stephan Ewen <se...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I like a lot about Aljoscha's proposed design is that
>> we
>>>>>>>>>> need no
>>>>>>>>>>>>>>> different code for "system time" vs. "event time". It only
>>>>>>>>>> differs in
>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>> the timestamps are assigned.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The OOP approach also gives you the semantics of total
>>>>>>> ordering
>>>>>>>>>>>> without
>>>>>>>>>>>>>>> imposing merges on the streams.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 23, 2015 at 10:43 AM, Matthias J. Sax <
>>>>>>>>>>>>>>> mj...@informatik.hu-berlin.de> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I agree that there should be multiple alternatives the
>>>>>>> user(!)
>>>>>>>>>> can
>>>>>>>>>>>>>>>> choose from. Partial out-of-order processing works for
>>>>>>>>> many/most
>>>>>>>>>>>>>>>> aggregates. However, if you consider
>>>>>>> Event-Pattern-Matching,
>>>>>>>>>> global
>>>>>>>>>>>>>>>> ordering is necessary (even if the performance penalty
>>>>>>> might
>>>>>>>>> be
>>>>>>>>>>>>> high).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would also keep "system-time windows" as an alternative
>>>>>>> to
>>>>>>>>>>>> "source
>>>>>>>>>>>>>>>> assigned ts-windows".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It might also be interesting to consider the following
>>>>>>> paper
>>>>>>>>> for
>>>>>>>>>>>>>>>> overlapping windows: "Resource sharing in continuous
>>>>>>>>>> sliding-window
>>>>>>>>>>>>>>>> aggregates"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://dl.acm.org/citation.cfm?id=1316720
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 06/23/2015 10:37 AM, Gyula Fóra wrote:
>>>>>>>>>>>>>>>>> Hey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think we should not block PRs unnecessarily if your
>>>>>>>>>> suggested
>>>>>>>>>>>>>> changes
>>>>>>>>>>>>>>>>> might touch them at some point.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also I still think we should not put everything in the
>>>>>>>>>> Datastream
>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>> it will be a huge mess.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also we need to agree on the out of order processing,
>>>>>>>>> whether
>>>>>>>>>> we
>>>>>>>>>>>>> want
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> the way you proposed it (which is quite costly). Another
>>>>>>>>>>>> alternative
>>>>>>>>>>>>>>>>> approach there which fits in the current windowing is to filter
>>>>>>>>>>>>>>>>> out out-of-order events and apply a special handling operator on
>>>>>>> them.
>>>>>>>>>> This
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> fairly lightweight.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My point is that we need to consider some alternative
>>>>>>>>>> solutions.
>>>>>>>>>>>>> And
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> should not block contributions along the way.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>> Gyula
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jun 23, 2015 at 9:55 AM Aljoscha Krettek <
>>>>>>>>>>>>>> aljos...@apache.org>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The reason I posted this now is that we need to think
>>>>>>> about
>>>>>>>>>> the
>>>>>>>>>>>>> API
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> windowing before proceeding with the PRs of Gabor
>>>>>>> (inverse
>>>>>>>>>>>> reduce)
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> Gyula (removal of "aggregate" functions on DataStream).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For the windowing, I think that the current model does
>>>>>>> not
>>>>>>>>>> work
>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> out-of-order processing. Therefore, the whole windowing
>>>>>>>>>>>>>> infrastructure
>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>> basically have to be redone. Meaning also that any work
>>>>>>> on
>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> pre-aggregators or optimizations that we do now becomes
>>>>>>>>>> useless.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For the API, I proposed to restructure the interactions
>>>>>>>>>> between
>>>>>>>>>>>>> all
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> different *DataStream classes and grouping/windowing.
>>>>>>> (See
>>>>>>>>>> API
>>>>>>>>>>>>>> section
>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>> the doc I posted.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, 22 Jun 2015 at 21:56 Gyula Fóra <
>>>>>>>>> gyula.f...@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the nice summary, this is a very good
>>>>>>>>> initiative.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I added some comments to the respective sections
>>>>>>> (where I
>>>>>>>>>> didn't
>>>>>>>>>>>>>> fully
>>>>>>>>>>>>>>>>>> agree
>>>>>>>>>>>>>>>>>>> :).).
>>>>>>>>>>>>>>>>>>> At some point I think it would be good to have a public
>>>>>>>>>> hangout
>>>>>>>>>>>>>>> session
>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>> this, which could make a more dynamic discussion.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Gyula
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <aljos...@apache.org> ezt írta
>>>>>>> (időpont:
>>>>>>>>>>>> 2015.
>>>>>>>>>>>>>> jún.
>>>>>>>>>>>>>>>>>> 22.,
>>>>>>>>>>>>>>>>>>> H, 21:34):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>> with people proposing changes to the streaming part I
>>>>>>>>> also
>>>>>>>>>>>>> wanted
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> throw
>>>>>>>>>>>>>>>>>>>> my hat into the ring. :D
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> During the last few months, while I was getting
>>>>>>>>> acquainted
>>>>>>>>>>>> with
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> streaming system, I wrote down some thoughts I had
>>>>>>> about
>>>>>>>>>> how
>>>>>>>>>>>>>> things
>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>> be improved. Hopefully, they are in somewhat coherent
>>>>>>>>> shape
>>>>>>>>>>>> now,
>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>>>>> please
>>>>>>>>>>>>>>>>>>>> have a look if you are interested in this:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>> https://docs.google.com/document/d/1rSoHyhUhm2IE30o5tkR8GEetjFvMRMNxvsCfoPsW6_4/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This mostly covers:
>>>>>>>>>>>>>>>>>>>> - Timestamps assigned at sources
>>>>>>>>>>>>>>>>>>>> - Out-of-order processing of elements in window
>>>>>>>>> operators
>>>>>>>>>>>>>>>>>>>> - API design
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please let me know what you think. Comment in the
>>>>>>>>> document
>>>>>>>>>> or
>>>>>>>>>>>>> here
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> mailing list.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have a PR in the makings that would introduce source
>>>>>>>>>>>>> timestamps
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> watermarks for keeping track of them. I also hacked a
>>>>>>>>>>>>>>> proof-of-concept
>>>>>>>>>>>>>>>>>>> of a
>>>>>>>>>>>>>>>>>>>> windowing system that is able to process out-of-order
>>>>>>>>>> elements
>>>>>>>>>>>>>>> using a
>>>>>>>>>>>>>>>>>>>> FlatMap operator. (It uses panes to perform efficient
>>>>>>>>>>>>>>>>>> pre-aggregations.)
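For readers unfamiliar with panes, the idea can be sketched as follows (hypothetical standalone code, not the proof-of-concept itself): a sliding window whose size is a multiple of the slide is split into slide-sized panes; each element is aggregated into its pane once, and each emission combines only the pane aggregates instead of the raw elements.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of pane-based pre-aggregation for a sliding sum:
// window size = panesPerWindow * paneSize, slide = paneSize.
public class PaneSlidingSum {
    private final SortedMap<Long, Long> paneSums = new TreeMap<>();
    private final long paneSize;
    private final int panesPerWindow;

    PaneSlidingSum(long paneSize, int panesPerWindow) {
        this.paneSize = paneSize;
        this.panesPerWindow = panesPerWindow;
    }

    void element(long timestamp, long value) {
        long pane = (timestamp / paneSize) * paneSize;
        paneSums.merge(pane, value, Long::sum); // each element aggregated once
    }

    // Sum for the window ending at windowEnd (exclusive): combine the last
    // `panesPerWindow` pane aggregates instead of re-scanning raw elements.
    long windowSum(long windowEnd) {
        long windowStart = windowEnd - panesPerWindow * paneSize;
        long sum = 0;
        for (Map.Entry<Long, Long> p : paneSums.subMap(windowStart, windowEnd).entrySet()) {
            sum += p.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        PaneSlidingSum s = new PaneSlidingSum(1, 5); // slide 1, window 5
        for (long t = 0; t < 7; t++) s.element(t, 1);
        System.out.println(s.windowSum(5)); // 5
        System.out.println(s.windowSum(7)); // 5
    }
}
```

The benefit over a per-window buffer is that overlapping windows share the pane aggregates, so work per element stays constant as the window/slide ratio grows.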
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> Aljoscha
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>