Very nice discussion! The deadlock issue due to back pressure mechanism is
temporary, which is going to be fixed once Stephan change it to a credit based
approach. So we probably should not base our proposal on that temporary
limitation. Once we have that issue fixed, the operator can choose to
Hi, Aljoscha, Thanks for the analysis. I also agree with the separated
window handling. I am also grad to contribute too. Is there any issue which
is not picked yet? Feel free to count me in. We have removed the
restriction that connected stream cannot be one side keyed and the other
unkeyed to sup
Alright! I created an umbrella Jira issue:
https://issues.apache.org/jira/browse/FLINK-6131 which has three sub issues:
- https://issues.apache.org/jira/browse/FLINK-4940: Add support for broadcast
state
- https://issues.apache.org/jira/browse/FLINK-6135: Allowing adding additional
inputs to S
I agree with your analysis, I think we now have almost everything to start,
and I also would be interested in helping you.
Please feel free to count me in. Besides, I have few real use cases which
require side input and could help in benchmarking the final implementation.
Best,
Ventura
This me
Thanks for demonstrating the windowed side-input case. I completely
agree that handling windowed side-input separately would just simply
complicate the implementation. The triggering mechanism for the upstream
window could define when the windowed input is ready.
I would gladly contribute to a
Yes, I agree! The implementation stuff we talked about so far is only
visible at the operator level. A user function that uses the (future)
side API would not be aware of whether buffering or blocking is used. It
would simply know that it is invoked and that side input is ready.
I'll also quickly
Regarding the CoFlatMap workaround,
- For keyed streams, do you suggest that having a per-key buffer stored
as keyed state would have a large memory overhead? That must be true,
although a workaround could be partitioning the data and using a
non-keyed stream. Of course that seems hacky, as we
Hi,
thanks for you input! :-)
Regarding 1)
I don't see the benefit of integrating windowing into the side-input
logic. Windowing can happen upstream and whenever that emits new data
then operator will notice because there is new input. Having windowing
inside the side-input of an operator as well
Hi, Aljoscha, I just go through your prototype. I like the design of the
SideInputReader which can make it flexible to determine when we can get the
side input.
I agree that side inputs are API sugar on the top of the three components(n-ary
inputs, broadcast state and input buffering), following i
Ha! this is turning out to be quite the discussion. :-) Also, thanks
Kenn, for chiming in with the Beam perspective!
I'll try and address some stuff.
It seems we have some consensus on using N-ary operator to implement
side inputs. I see two ways forward there:
- Have a "pure" N-ary operator tha
Hi all,
I thought I would briefly join this thread to mention some side input
lessons from Apache Beam. My knowledge of Flink is not deep enough,
technically or philosophically, to make any specific recommendations. And I
might just be repeating things that the docs and threads cover, but I hope
i
Hi all,
Thanks Aljoscha for going forward with the side inputs and for the nice
proposal!
I'm also in favor of the implementation with N-ary input (3.) for the
reasons Ventura explained. I'm strongly against managing side inputs at
StreamTask level (2.), as it would create another abstractio
Hi Jamie,
actually the approach where the .withSideInput() comes before the user
function is only required for implementation proposal #1, which I like
the least. For the other two it can be after the user function, which is
also what I prefer.
Regarding semantics: yes, we simply wait for anything
Hi, I think the proposal looks good. The only thing I wasn't clear on was
which API is actually being proposed. The one where .withSideInput() comes
before the user function or after. I would definitely prefer it come after
since that's the normal pattern in the Flink API. I understood that mak
Hi,
these are all valuable suggestions and I think that we should implement
them when the time is right. However, I would like to first get a
minimal viable version of this feature into Flink and then expand on it.
I think the last few tries of tackling this problem fizzled out because
we got to de
Hi Aljoscha, thank you for the proposal, it is great to hear about the
progress in side input.
Following is my point of view:
1. I think there may be an option to block the processing of the main input
instead of buffer the data in state because in production, the through put
of the main input is
Hi Aljoscha,
Thank you for the proposal and for bringing up again this discussion.
Regarding the implementation aspect,I would say the first way could
be easier/faster to implement but it could add some overhead when
dealing with multiple side inputs through the current 2-streams union
transform.
Hi Aljoscha,
Thank you for the nice proposal!
I think it would make sense to allow user's to affect the readiness of the
side input. I think making it ready when the first element arrives is only
slightly better then making it always ready from usability perspective. For
instance if I am joining
Hi Folks,
I would like to finally agree on a plan for implementing side inputs in
Flink. There has already been an attempt to come to consensus [1], which
resulted in two design documents. I tried to consolidate those two and
also added a section about implementation plans. This is the resulting
F
19 matches
Mail list logo