Re: Rework of streaming iteration API

2015-07-07 Thread Paris Carbone
Good points. If we want structured loops on streaming, we will need to inject iteration counters. The question is whether we really need structured iterations on plain data streams. Window iterations, on the other hand, are a must-have... Paris > On 07 Jul 2015, at 16:43, Kostas Tzoumas wrote: > > I

Re: Rework of streaming iteration API

2015-07-07 Thread Kostas Tzoumas
I see. Perhaps more important IMO is defining the semantics of stream loops with event time. The reason I asked about nested is that Naiad and other designs used a multidimensional timestamp to capture loops: (outer loop counter, inner loop counter, timestamp). I assume that currently making sense
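To make the multidimensional timestamp idea concrete, here is a minimal sketch of a timestamp that carries one counter per enclosing loop next to the ordinary event time. The class name LoopTimestamp and its methods are hypothetical, chosen only for illustration; they are not part of Flink or Naiad. The sketch stores the event time first and the counters outermost-first, and it deliberately leaves out any ordering between timestamps.

```java
import java.util.Arrays;

// Hypothetical sketch of a loop-aware timestamp; not Flink or Naiad API.
final class LoopTimestamp {
    final long eventTime;   // the ordinary stream timestamp
    final int[] counters;   // one counter per enclosing loop, outermost first

    LoopTimestamp(long eventTime, int... counters) {
        this.eventTime = eventTime;
        this.counters = counters.clone();
    }

    // Entering a loop: append a fresh counter that starts at 0.
    LoopTimestamp enterLoop() {
        return new LoopTimestamp(eventTime, Arrays.copyOf(counters, counters.length + 1));
    }

    // Taking the feedback edge: increment the innermost counter.
    LoopTimestamp feedback() {
        if (counters.length == 0) {
            throw new IllegalStateException("not inside a loop");
        }
        int[] next = counters.clone();
        next[next.length - 1]++;
        return new LoopTimestamp(eventTime, next);
    }

    // Leaving a loop: drop the innermost counter.
    LoopTimestamp leaveLoop() {
        if (counters.length == 0) {
            throw new IllegalStateException("not inside a loop");
        }
        return new LoopTimestamp(eventTime, Arrays.copyOf(counters, counters.length - 1));
    }

    @Override
    public String toString() {
        return "(" + eventTime + ", " + Arrays.toString(counters) + ")";
    }
}
```

With this shape, a record that enters an outer loop and then an inner loop, and goes around the inner loop twice, carries the counters [0, 2] while its event time stays unchanged.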

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
Okay, I am fine with this approach as well; I see the advantages. Then we just need to find a suitable name for marking a "FeedbackPoint" :) Stephan Ewen wrote (on Tue, 7 Jul 2015, 16:28): > In Aljoscha's approach, we would need a special mutable stream. We could do > it like this: > >

Re: Rework of streaming iteration API

2015-07-07 Thread Stephan Ewen
In Aljoscha's approach, we would need a special mutable stream. We could do it like this: DataStream source = ...; FeedbackPoint pt = source.createFeedbackPoint(); DataStream mapper = pt.map(noOpMapper); DataStream feedback = mapper.filter(...); pt.addFeedback(feedback); It is basically like the
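The semantics of such a feedback point can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: ToyFeedbackLoop and ToyFeedbackLoopDemo are hypothetical names, nothing here is Flink API, and a single queue stands in for the stream, so it says nothing about parallelism, ordering guarantees, or termination in a real runtime.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy model of a feedback point (hypothetical, not Flink API).
final class ToyFeedbackLoop<T> {
    private final Function<T, T> step;     // loop body, the "mapper" in the snippet above
    private final Predicate<T> feedback;   // elements matching this re-enter the loop

    ToyFeedbackLoop(Function<T, T> step, Predicate<T> feedback) {
        this.step = step;
        this.feedback = feedback;
    }

    // Pushes every element through the loop body until nothing is fed back.
    // Does not terminate if the feedback predicate keeps matching forever.
    List<T> run(List<T> source) {
        Deque<T> pending = new ArrayDeque<>(source); // the feedback point: source plus fed-back elements
        List<T> output = new ArrayList<>();
        while (!pending.isEmpty()) {
            T next = step.apply(pending.poll());
            if (feedback.test(next)) {
                pending.add(next);   // route back to the feedback point
            } else {
                output.add(next);    // leaves the loop
            }
        }
        return output;
    }
}

final class ToyFeedbackLoopDemo {
    public static void main(String[] args) {
        // Decrement each number; feed it back while it is still positive.
        ToyFeedbackLoop<Integer> loop = new ToyFeedbackLoop<>(x -> x - 1, x -> x > 0);
        System.out.println(loop.run(Arrays.asList(3, 1, 2)));
    }
}
```

The point of the model is that the feedback point is the only place where the cycle is closed: everything downstream of it is an ordinary chain of transformations, and the fed-back stream is merged with the source at that one spot.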

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
@Aljoscha: Yes, that's basically my point as well. This is what happens now too, but we give this mutable datastream a special name: IterativeDataStream. This can be handled in very different ways through the API; the goal would be to make something easy to use. I am fine with what we have now becau

Re: Rework of streaming iteration API

2015-07-07 Thread Aljoscha Krettek
I think it could work if we allowed a DataStream to be unioned after creation. For example: DataStream source = ...; DataStream mapper = source.map(noOpMapper); DataStream feedback = mapper.filter(...); source.union(feedback); This would basically mean that a DataStream is mutable and can be extended

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
@Kostas: I believe this new API is equivalent in expressivity to the current one. We can define nested loops now as well. I also don't see nested loops as generally much worse than simple loops. Gyula Fóra wrote (on Tue, 7 Jul 2015, 16:14): > Sorry Stephan I meant it slightly differ

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
Sorry Stephan, I meant it slightly differently; I see your point: DataStream source = ...; SingleInputOperator mapper = source.map(...); mapper.addInput(); So addInput would be a method of the operator, not the stream. Aljoscha Krettek wrote (on Tue, 7 Jul 2015, 16:12): > I think thi

Re: Rework of streaming iteration API

2015-07-07 Thread Kostas Tzoumas
+1 for rethinking the iterations in DataStream. However, wouldn't this proposal allow the definition of arbitrary loops (e.g., nested loops) that are not well behaved, afaik? On Tue, Jul 7, 2015 at 4:12 PM, Stephan Ewen wrote: > I see that the newly proposed API makes some things easier to define

Re: Rework of streaming iteration API

2015-07-07 Thread Aljoscha Krettek
I think this would be good, yes. I was just about to open an issue for changing the Streaming Iteration API. :D Then we should also make the implementation very straightforward and simple; right now, the implementation of the iterations is all over the place. On Tue, 7 Jul 2015 at 15:57 Gyula Fóra

Re: Rework of streaming iteration API

2015-07-07 Thread Stephan Ewen
I see that the newly proposed API makes some things easier to define. There is one source of confusion, though, in my opinion: The new API suggests that the data stream actually refers to the operator that created it. The "DataStream mapper = source.map(noOpMapper)" here refers to the map operato

Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
Hey, along with the suggested changes to the streaming API structure, I think we should also rework the "iteration" API. Currently the iteration API tries to mimic the syntax of the batch API, while the runtime behaviour is quite different. What we create instead of iterations is really just cyclic