I agree that there should be multiple alternatives the user(!) can
choose from. Partial out-of-order processing works for many/most
aggregates. However, if you consider Event-Pattern-Matching, global
ordering in necessary (even if the performance penalty might be high).

I would also keep "system-time windows" as an alternative to "source
assigned ts-windows".

It might also be interesting to consider the following paper for
overlapping windows: "Resource sharing in continuous sliding-window
aggregates"

> https://dl.acm.org/citation.cfm?id=1316720

-Matthias

On 06/23/2015 10:37 AM, Gyula Fóra wrote:
> Hey
> 
> I think we should not block PRs unnecessarily if your suggested changes
> might touch them at some point.
> 
> Also I still think we should not put everything in the Datastream because
> it will be a huge mess.
> 
> Also we need to agree on the out of order processing, whether we want it
> the way you proposed it(which is quite costly). Another alternative
> approach there which fits in the current windowing is to filter out if
> order events and apply a special handling operator on them. This would be
> fairly lightweight.
> 
> My point is that we need to consider some alternative solutions. And we
> should not block contributions along the way.
> 
> Cheers
> Gyula
> 
> On Tue, Jun 23, 2015 at 9:55 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
> 
>> The reason I posted this now is that we need to think about the API and
>> windowing before proceeding with the PRs of Gabor (inverse reduce) and
>> Gyula (removal of "aggregate" functions on DataStream).
>>
>> For the windowing, I think that the current model does not work for
>> out-of-order processing. Therefore, the whole windowing infrastructure will
>> basically have to be redone. Meaning also that any work on the
>> pre-aggregators or optimizations that we do now becomes useless.
>>
>> For the API, I proposed to restructure the interactions between all the
>> different *DataStream classes and grouping/windowing. (See API section of
>> the doc I posted.)
>>
>> On Mon, 22 Jun 2015 at 21:56 Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hi Aljoscha,
>>>
>>> Thanks for the nice summary, this is a very good initiative.
>>>
>>> I added some comments to the respective sections (where I didnt fully
>> agree
>>> :).).
>>> At some point I think it would be good to have a public hangout session
>> on
>>> this, which could make a more dynamic discussion.
>>>
>>> Cheers,
>>> Gyula
>>>
>>> Aljoscha Krettek <aljos...@apache.org> ezt írta (időpont: 2015. jún.
>> 22.,
>>> H, 21:34):
>>>
>>>> Hi,
>>>> with people proposing changes to the streaming part I also wanted to
>>> throw
>>>> my hat into the ring. :D
>>>>
>>>> During the last few months, while I was getting acquainted with the
>>>> streaming system, I wrote down some thoughts I had about how things
>> could
>>>> be improved. Hopefully, they are in somewhat coherent shape now, so
>>> please
>>>> have a look if you are interested in this:
>>>>
>>>>
>>>
>> https://docs.google.com/document/d/1rSoHyhUhm2IE30o5tkR8GEetjFvMRMNxvsCfoPsW6_4/edit?usp=sharing
>>>>
>>>> This mostly covers:
>>>>  - Timestamps assigned at sources
>>>>  - Out-of-order processing of elements in window operators
>>>>  - API design
>>>>
>>>> Please let me know what you think. Comment in the document or here in
>> the
>>>> mailing list.
>>>>
>>>> I have a PR in the makings that would introduce source timestamps and
>>>> watermarks for keeping track of them. I also hacked a proof-of-concept
>>> of a
>>>> windowing system that is able to process out-of-order elements using a
>>>> FlatMap operator. (It uses panes to perform efficient
>> pre-aggregations.)
>>>>
>>>> Cheers,
>>>> Aljoscha
>>>>
>>>
>>
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to