Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-23 Thread Xiaowei Jiang
Very nice discussion! The deadlock issue caused by the back-pressure mechanism is temporary; it is going to be fixed once Stephan changes it to a credit-based approach. So we probably should not base our proposal on that temporary limitation. Once we have that issue fixed, the operator can choose to

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-21 Thread wenlong.lwl
Hi Aljoscha, thanks for the analysis. I also agree with the separate window handling. I would be glad to contribute too. Is there any issue that has not been picked up yet? Feel free to count me in. We have removed the restriction that a connected stream cannot be keyed on one side and unkeyed on the other to sup

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-21 Thread Aljoscha Krettek
Alright! I created an umbrella Jira issue: https://issues.apache.org/jira/browse/FLINK-6131 which has three sub issues: - https://issues.apache.org/jira/browse/FLINK-4940: Add support for broadcast state - https://issues.apache.org/jira/browse/FLINK-6135: Allowing adding additional inputs to S

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-17 Thread Ventura Del Monte
I agree with your analysis. I think we now have almost everything we need to start, and I would also be interested in helping you. Please feel free to count me in. Besides, I have a few real use cases that require side inputs and could help in benchmarking the final implementation. Best, Ventura This me

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-17 Thread Gábor Hermann
Thanks for demonstrating the windowed side-input case. I completely agree that handling windowed side inputs separately would simply complicate the implementation. The triggering mechanism for the upstream window could define when the windowed input is ready. I would gladly contribute to a

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-17 Thread Aljoscha Krettek
Yes, I agree! The implementation stuff we talked about so far is only visible at the operator level. A user function that uses the (future) side API would not be aware of whether buffering or blocking is used. It would simply know that it is invoked and that side input is ready. I'll also quickly

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-16 Thread Gábor Hermann
Regarding the CoFlatMap workaround, - For keyed streams, do you suggest that having a per-key buffer stored as keyed state would have a large memory overhead? That must be true, although a workaround could be partitioning the data and using a non-keyed stream. Of course that seems hacky, as we

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-15 Thread Aljoscha Krettek
Hi, thanks for your input! :-) Regarding 1) I don't see the benefit of integrating windowing into the side-input logic. Windowing can happen upstream, and whenever that emits new data the operator will notice because there is new input. Having windowing inside the side-input of an operator as well

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-15 Thread wenlong.lwl
Hi Aljoscha, I just went through your prototype. I like the design of the SideInputReader, which makes it flexible to determine when we can get the side input. I agree that side inputs are API sugar on top of the three components (n-ary inputs, broadcast state and input buffering), following i

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-13 Thread Aljoscha Krettek
Ha! This is turning out to be quite the discussion. :-) Also, thanks Kenn, for chiming in with the Beam perspective! I'll try and address some stuff. It seems we have some consensus on using an N-ary operator to implement side inputs. I see two ways forward there: - Have a "pure" N-ary operator tha

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-10 Thread Kenneth Knowles
Hi all, I thought I would briefly join this thread to mention some side input lessons from Apache Beam. My knowledge of Flink is not deep enough, technically or philosophically, to make any specific recommendations. And I might just be repeating things that the docs and threads cover, but I hope i

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-10 Thread Gábor Hermann
Hi all, Thanks Aljoscha for going forward with the side inputs and for the nice proposal! I'm also in favor of the implementation with N-ary input (3.) for the reasons Ventura explained. I'm strongly against managing side inputs at StreamTask level (2.), as it would create another abstractio

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-09 Thread Aljoscha Krettek
Hi Jamie, actually the approach where the .withSideInput() comes before the user function is only required for implementation proposal #1, which I like the least. For the other two it can be after the user function, which is also what I prefer. Regarding semantics: yes, we simply wait for anything

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-09 Thread Jamie Grier
Hi, I think the proposal looks good. The only thing I wasn't clear on was which API is actually being proposed. The one where .withSideInput() comes before the user function or after. I would definitely prefer it come after since that's the normal pattern in the Flink API. I understood that mak

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-09 Thread Aljoscha Krettek
Hi, these are all valuable suggestions and I think that we should implement them when the time is right. However, I would like to first get a minimal viable version of this feature into Flink and then expand on it. I think the last few tries of tackling this problem fizzled out because we got to de

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-07 Thread wenlong.lwl
Hi Aljoscha, thank you for the proposal; it is great to hear about the progress on side inputs. Here is my point of view: 1. I think there may be an option to block the processing of the main input instead of buffering the data in state, because in production, the throughput of the main input is

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-07 Thread Ventura Del Monte
Hi Aljoscha, Thank you for the proposal and for bringing up this discussion again. Regarding the implementation aspect, I would say the first way could be easier/faster to implement, but it could add some overhead when dealing with multiple side inputs through the current 2-streams union transform.

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-06 Thread Gyula Fóra
Hi Aljoscha, Thank you for the nice proposal! I think it would make sense to allow users to affect the readiness of the side input. I think making it ready when the first element arrives is only slightly better than making it always ready from a usability perspective. For instance if I am joining

[DISCUSS] FLIP-17 Side Inputs

2017-03-06 Thread Aljoscha Krettek
Hi Folks, I would like to finally agree on a plan for implementing side inputs in Flink. There has already been an attempt to come to consensus [1], which resulted in two design documents. I tried to consolidate those two and also added a section about implementation plans. This is the resulting F