Re: Design documents for consolidated DataStream API

2015-07-14 Thread Márton Balassi
Ok, thanks for the clarification. Let us then try to document it in a way that reflects those thoughts. Discretization will not happen upfront; we can wait with that. On Tue, Jul 14, 2015 at 12:49 PM, Stephan Ewen wrote: > There is no inconsistency between the Batch and Streaming API. They h

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Stephan Ewen
There is no inconsistency between the Batch and Streaming API. They have different semantics - the batch API is implicitly always windowed. There is a naming difference between the two APIs. There is a strong inconsistency within the Streaming API right now. Grouping and aggregating without windo
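To illustrate the point that the batch API is "implicitly always windowed", here is a minimal plain-Java sketch (not Flink code; the class ImplicitWindowSketch and its methods are hypothetical names). On a bounded input, a per-key aggregate is computed once over everything, which behaves like a single window spanning the whole data set. On an unbounded stream with no window, an operator can only emit an updated running value per element.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Flink API.
public class ImplicitWindowSketch {

    // A keyed record: (key, value).
    public static final class Event {
        public final String key;
        public final int value;

        public Event(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Batch: the input is bounded, so the aggregate is computed once over all of it,
    // which behaves like one implicit window spanning the whole data set.
    public static Map<String, Integer> batchSum(List<Event> input) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Event e : input) {
            result.merge(e.key, e.value, Integer::sum);
        }
        return result;
    }

    // Streaming without a window: the input never ends, so the operator can only
    // emit an updated running aggregate for every incoming element.
    public static List<String> rollingSum(List<Event> input) {
        Map<String, Integer> running = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (Event e : input) {
            int updated = running.merge(e.key, e.value, Integer::sum);
            emitted.add(e.key + "=" + updated);
        }
        return emitted;
    }
}
```

For the input (a,1), (b,2), (a,3), batchSum yields one final result per key, while rollingSum yields a=1, b=2, a=4: the same data, but a continuously updated answer instead of a final one.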

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Kostas Tzoumas
I think the thought was to explicitly not use the same terminology as the batch API, so as not to confuse people. But this is a minor naming issue IMO. On Tue, Jul 14, 2015 at 12:40 PM, Gyula Fóra wrote: > I see your point, reduceByKey is much clearer. > > The question is whether we want to introduce

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Gyula Fóra
I see your point, reduceByKey is much clearer. The question is whether we want to introduce this inconsistency across the two APIs or stick with what we have. On Tue, Jul 14, 2015 at 10:57 AM Aljoscha Krettek wrote: > I agree, the groupBy, in the batch API is misleading, since a > ds.groupBy().

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Aljoscha Krettek
I agree, the groupBy() in the batch API is misleading, since ds.groupBy().reduce() does not really build any groups; it is really ds.keyBy().reduceByKey(). In the streaming API we can still fix this, IMHO. On Tue, 14 Jul 2015 at 10:56 Stephan Ewen wrote: > It is not a bit different than the b
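A plain-Java sketch of why a reduce never needs materialized groups (hypothetical class and method names, not Flink code). groupThenReduce literally builds a list per key and then reduces it; reduceByKey keeps only one running accumulator per key. For an associative reduce function both return the same result, which is the sense in which groupBy().reduce() is "really" a reduceByKey().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical illustration only; not Flink API.
public class ReduceByKeySketch {

    public static final class Pair {
        public final String key;
        public final int value;

        public Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // "Grouping" taken literally: materialize every group, then reduce each one.
    public static Map<String, Integer> groupThenReduce(List<Pair> input, BinaryOperator<Integer> f) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Pair p : input) {
            groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            out.put(g.getKey(), g.getValue().stream().reduce(f).orElseThrow());
        }
        return out;
    }

    // reduceByKey: fold each value into the key's accumulator; no group list is ever built.
    public static Map<String, Integer> reduceByKey(List<Pair> input, BinaryOperator<Integer> f) {
        Map<String, Integer> acc = new LinkedHashMap<>();
        for (Pair p : input) {
            acc.merge(p.key, p.value, f);
        }
        return acc;
    }
}
```

The second variant needs memory proportional to the number of keys rather than the number of records, which is also why it is the only one of the two that makes sense on an unbounded stream.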

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Stephan Ewen
It is not a bit different than the batch API, because streaming semantics are a bit different ;-) One good thing is that we can make things better that were sub-optimal in the Batch API. On Tue, Jul 14, 2015 at 10:55 AM, Stephan Ewen wrote: > keyBy() does not do any grouping. Grouping in stream

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Stephan Ewen
keyBy() does not do any grouping. Grouping in streams is not defined without windows. On Tue, Jul 14, 2015 at 10:48 AM, Gyula Fóra wrote: > If we only want to have either keyBy or groupBy, why not keep groupBy? That > would be more consistent with the batch api. > On Tue, Jul 14, 2015 at 10:35 A
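To make "keyBy() does not do any grouping" concrete: keyBy() only decides which parallel instance (channel) each record goes to, so that all records with the same key meet at the same instance. The sketch below is plain Java with hypothetical names, and it assumes a simple hashCode-modulo-parallelism rule; Flink's actual partitioner may hash keys differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the routing rule is an assumption, not Flink's hash.
public class KeyByPartitionSketch {

    // Same key -> same channel. floorMod keeps the result non-negative for negative hash codes.
    public static int channelFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Forward each record to its channel as it arrives. The per-channel lists only
    // record where records went; no per-key group is collected or buffered.
    public static List<List<String>> route(List<String> keys, int parallelism) {
        List<List<String>> channels = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            channels.add(new ArrayList<>());
        }
        for (String k : keys) {
            channels.get(channelFor(k, parallelism)).add(k);
        }
        return channels;
    }
}
```

Routing "a", "b", "a", "c", "b" over 3 channels places both "a" records on the same channel, but nothing waits for "all the a's"; only a window (or keyed state) downstream gives the co-located records any grouped meaning.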

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Gyula Fóra
If we only want to have either keyBy or groupBy, why not keep groupBy? That would be more consistent with the batch API. On Tue, Jul 14, 2015 at 10:35 AM Stephan Ewen wrote: > Concerning your comments: > > 1) In the new design, there is no grouping without windowing. The > KeyedDataStream subsume

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Stephan Ewen
Concerning your comments: 1) In the new design, there is no grouping without windowing. The KeyedDataStream subsumes the grouping and key-ing for partitioned state. keyBy() + window() makes a parallel grouped window; keyBy() alone allows access to partitioned state. My thought was
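The "keyBy() alone allows access to partitioned state" case can be sketched in plain Java as follows (hypothetical names, not Flink's state API). Each parallel operator instance holds state only for the keys routed to it, and before each element the runtime (simulated here by setCurrentKey) sets the key context, so the operator only ever reads and writes the state of the current key. With no window involved, every element produces output immediately.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of key-scoped ("partitioned") state; not Flink API.
public class PartitionedStateSketch {

    // State held by one parallel instance, one entry per key routed to it.
    private final Map<String, Long> countState = new HashMap<>();
    private String currentKey;

    // Stand-in for the runtime setting the key context before each element.
    void setCurrentKey(String key) {
        this.currentKey = key;
    }

    // Reads and updates only the current key's state.
    long incrementAndGet() {
        long next = countState.getOrDefault(currentKey, 0L) + 1;
        countState.put(currentKey, next);
        return next;
    }

    // No window: each element yields an output as soon as it is processed.
    public List<String> process(List<String> keys) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            setCurrentKey(key);
            out.add(key + ":" + incrementAndGet());
        }
        return out;
    }
}
```

Processing "a", "b", "a" yields a:1, b:1, a:2. Adding window() on top would instead buffer per key and emit only when a window fires.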

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Gyula Fóra
I think Marton has some good points here. 1) Is KeyedDataStream a better name if this is only a renaming? 2) The discretize semantics are unclear indeed. Are we operating on a single dataset or on a sequence of datasets? If the latter, why not call it something else (dstream)? How are joins and other binary ope

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Márton Balassi
Generally I agree with the new design. Two concerns: 1) Does KeyedDataStream replace GroupedDataStream, or is the latter a special case of the former? The KeyedDataStream as described in the design document is a bit unclear to me. It lists the following usages: a) It is the first step in bui

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Paris Carbone
+1 No further concerns from my side either > On 13 Jul 2015, at 18:30, Gyula Fóra wrote: > > +1 > On Mon, Jul 13, 2015 at 6:23 PM Stephan Ewen wrote: > >> If naming is the only concern, then we should go ahead, because we can >> change names easily (before the release). >> >> In fact, I don'

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Gyula Fóra
+1 On Mon, Jul 13, 2015 at 6:23 PM Stephan Ewen wrote: > If naming is the only concern, then we should go ahead, because we can > change names easily (before the release). > > In fact, I don't think it leaves a bad impression. Global windows are > non-parallel windows. There are also parallel win

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Stephan Ewen
If naming is the only concern, then we should go ahead, because we can change names easily (before the release). In fact, I don't think it leaves a bad impression. Global windows are non-parallel windows. There are also parallel windows. Pick what you need and what works. On Mon, Jul 13, 2015 at

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Gyula Fóra
I think we agree on everything; it's more of a naming issue :) I thought it might be misleading that global time windows are "non-parallel" windows. We don't want to give a bad impression. (Also, we don't want them to think that every global window is parallel, but that's not a problem here.) Gyula On Mo

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Stephan Ewen
Okay, what is missing about the windowing in your opinion? The core points of the document are: - The parallel windows are per group only. - The implementation of the parallel windows holds window data in the group buffers. - The global windows are non-parallel. May have parallel pre-aggr
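The two kinds of windows listed above can be contrasted in a small plain-Java sketch (hypothetical names, not Flink code; a count-based window stands in for any window type). In the parallel case each key has its own buffer, so keys can be processed independently on different instances. In the global case there is one logical window over the whole stream; each parallel instance pre-aggregates its share (following the truncated note about parallel pre-aggregation), but the final merge is a single non-parallel step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Flink API.
public class WindowKindsSketch {

    public static final class Event {
        public final String key;
        public final int value;

        public Event(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Parallel (per-group) count window: window data lives in a per-key buffer,
    // and a key's window fires once its own buffer holds `size` elements.
    public static List<String> keyedCountWindows(List<Event> input, int size) {
        Map<String, List<Integer>> buffers = new HashMap<>();
        List<String> fired = new ArrayList<>();
        for (Event e : input) {
            List<Integer> buf = buffers.computeIfAbsent(e.key, k -> new ArrayList<>());
            buf.add(e.value);
            if (buf.size() == size) {
                int sum = 0;
                for (int v : buf) {
                    sum += v;
                }
                fired.add(e.key + "=" + sum);
                buf.clear();
            }
        }
        return fired;
    }

    // Global window: each inner list is one parallel instance's share of the stream.
    public static int globalSumWithPreAggregation(List<List<Integer>> perInstanceInput) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> share : perInstanceInput) { // parallel phase (simulated sequentially)
            int partial = 0;
            for (int v : share) {
                partial += v;
            }
            partials.add(partial);
        }
        int total = 0; // single, non-parallel merge of the partial results
        for (int p : partials) {
            total += p;
        }
        return total;
    }
}
```

With input (a,1), (b,2), (a,3), (b,4), (a,5) and size 2, the keyed windows fire a=4 and then b=6, while a's third element stays buffered. The global variant scales its pre-aggregation with parallelism, but its final step is inherently a single instance.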

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Gyula Fóra
In general I like it, although the main difference between the current and the new design is the windowing, and that is still not very clear. Where do we have the full-stream time windows, for instance (which are parallel but not keyed)? On Mon, Jul 13, 2015 at 4:28 PM Aljoscha Krettek wrote: > +1 I li

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Aljoscha Krettek
+1 I like it as well. On Mon, 13 Jul 2015 at 16:17 Kostas Tzoumas wrote: > +1 from my side > > On Mon, Jul 13, 2015 at 4:15 PM, Stephan Ewen wrote: > > > Do we have consensus on these designs? > > > > If we have, we should get to implementing this soon, because basically > all > > streaming pat

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Kostas Tzoumas
+1 from my side On Mon, Jul 13, 2015 at 4:15 PM, Stephan Ewen wrote: > Do we have consensus on these designs? > > If we have, we should get to implementing this soon, because basically all > streaming patches will have to be revisited in light of this... > > On Tue, Jul 7, 2015 at 3:41 PM, Gyula

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Stephan Ewen
Do we have consensus on these designs? If we have, we should get to implementing this soon, because basically all streaming patches will have to be revisited in light of this... On Tue, Jul 7, 2015 at 3:41 PM, Gyula Fóra wrote: > You are right thats an important issue. > > And I think we should

Re: Design documents for consolidated DataStream API

2015-07-07 Thread Gyula Fóra
You are right, that's an important issue. And I think we should also do some renaming of the "iterations", because they are not really iterations like in the batch case and it might confuse some users. Maybe we can call them loops or cycles and rename the API calls to make it more intuitive what ha

Re: Design documents for consolidated DataStream API

2015-07-07 Thread Aljoscha Krettek
Hi, I just noticed that we don't have anything about how iterations and timestamps/watermarks should interact. Cheers, Aljoscha On Mon, 6 Jul 2015 at 23:56 Stephan Ewen wrote: > Hi all! > > As many of you know, there are a ongoing efforts to consolidate the > streaming API for the next release,

Design documents for consolidated DataStream API

2015-07-06 Thread Stephan Ewen
Hi all! As many of you know, there are ongoing efforts to consolidate the streaming API for the next release, and then graduate it (from beta status). In the process of this consolidation, we want to achieve the following goals. - Make the code more robust and simplify it in parts - Clearly