[ https://issues.apache.org/jira/browse/FLINK-29167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641771#comment-17641771 ]
Zhu Zhu commented on FLINK-29167:
---------------------------------

Record order is retained for records that come from the same source and are sent to the same downstream task. Even without a {{union}}, records are never guaranteed to be sorted by time after a shuffle from different subtasks of a source.

> Time out-of-order optimization for merging multiple data streams into one
> data stream
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-29167
>                 URL: https://issues.apache.org/jira/browse/FLINK-29167
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>    Affects Versions: 1.14.2
>            Reporter: zhangyang
>            Priority: Major
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Problem Description:
>         In many of my scenarios I need to combine more than two data
> streams (DataStreams) into one data stream. The business logic behind the
> stream processing depends on the time sequence of events, so I use Flink's
> union operator to merge the streams, but the merged data does not keep its
> original event-time order.
> {code:java}
> dataStream0 = dataStream0.union(dataStreamArray); {code}
> Design suggestion:
>         When designing the source code, the streams could be merged in the
> order of the array in dataStreamArray instead of in random order.
>
> Solution suggestion:
>         At present, I use windowAll to sort the merged data in
> chronological order, which makes the overall scenario work, but the
> parallelism of windowAll can only be 1, which hurts the performance of the
> entire directed acyclic graph. In addition, there are two merge-and-sort
> scenarios for which I have not thought of a good remedy, so my only idea is
> that union itself should preserve the sequence, which would save a lot of
> unnecessary trouble for event-time stream merging.
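
To make the ordering point concrete, here is a minimal sketch in plain Java (no Flink dependency) of the buffer-and-sort idea the reporter describes: records that arrive out of order are held in a priority queue and released in timestamp order only once a watermark says no earlier record is still expected. The names EventTimeSorter, Event, add and advanceWatermark are hypothetical and are not part of any Flink API; this only illustrates the technique and is not how Flink implements union.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not a Flink class: re-sorts out-of-order records by event time.
class EventTimeSorter {

    /** Hypothetical record type: an event timestamp plus a payload. */
    public static final class Event {
        public final long timestamp;
        public final String payload;

        public Event(long timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    // Min-heap ordered by timestamp. Ties are released in no particular order.
    private final PriorityQueue<Event> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestamp));

    /** Buffers a record that may have arrived out of order. */
    public void add(Event event) {
        buffer.add(event);
    }

    /**
     * Releases, in ascending timestamp order, every buffered record whose
     * timestamp is less than or equal to the given watermark. Records with
     * later timestamps stay buffered.
     */
    public List<Event> advanceWatermark(long watermark) {
        List<Event> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }
}
```

The output is sorted only if the watermark is trustworthy, that is, if no record with a timestamp at or below the watermark arrives after the watermark has advanced. Running one such buffer per key rather than one for the whole stream is one way to avoid the parallelism-1 bottleneck mentioned for windowAll, at the cost of ordering only within each key.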
--
This message was sent by Atlassian Jira
(v8.20.10#820010)