Hi Weijie,

Thanks for raising discussions about the new DataStream API. I have a
few questions about the content of the FLIP.

1. Will we provide any API to support choosing which input to consume
between the two inputs of TwoInputStreamProcessFunction? It would be
helpful in online machine learning cases, where a process function
needs to receive the first machine learning model before it can start
predictions on input data. Similar requirements might also exist in
Flink CEP, where a rule set needs to be consumed by the process
function first before it can start matching the event stream against
CEP patterns.

2. A typo might exist in the current FLIP describing the API to
generate a global stream, as I can see either global() or coalesce()
in different places of the FLIP. These two methods might need to be
unified into one method.

3. The order of parameters in the current ProcessFunction is (record,
context, output), while this FLIP proposes to change the order into
(record, output, context). Is there any reason to make this change?

4. Why does this FLIP propose to use connectAndProcess() instead of
connect() (+ keyBy()) + process()? The latter looks simpler to me.

Looking forward to discussing these questions with you.

Best regards,
Yunfeng Zhou

On Tue, Dec 26, 2023 at 2:44 PM weijie guo <guoweijieres...@gmail.com> wrote:
>
> Hi devs,
>
>
> I'd like to start a discussion about FLIP-409: DataStream V2 Building
> Blocks: DataStream, Partitioning and ProcessFunction [1].
>
>
> As the first sub-FLIP for DataStream API V2, we'd like to discuss and
> try to answer some of the most fundamental questions in stream
> processing:
>
>    1. What kinds of data streams do we have?
>    2. How to partition data over the streams?
>    3. How to define a processing on the data stream?
>
> The answer to these questions involve three core concepts: DataStream,
> Partitioning and ProcessFunction. In this FLIP, we will discuss the
> definitions and related API primitives of these concepts in detail.
>
>
> You can find more details in FLIP-409 [1]. This sub-FLIP is at the
> heart of the entire DataStream API V2, and its relationship with other
> sub-FLIPs can be found in the umbrella FLIP [2].
>
>
> Looking forward to hearing from you, thanks!
>
>
> Best regards,
>
> Weijie
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2

Reply via email to