Re: Query Regarding Customizing Apache Beam for Sequence-Based Workload Processing

2024-10-02 Thread Kenneth Knowles
Ah. That makes sense, since in batch all you have to do is sort by timestamp when you shuffle (which Dataflow has always done anyhow, to optimize windowing) whereas in streaming you need an OrderedListState-like slack buffer and there's latency of approximately the full allowed lateness. It does s
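To make the streaming half of that comparison concrete, here is a minimal plain-Python sketch of a "slack buffer". It is not Beam code and does not use OrderedListState; the class name `SlackBuffer`, its methods, and the integer timestamps are all hypothetical, chosen only for illustration. It assumes an element may be released once the watermark has reached its timestamp plus the allowed lateness, which is also why the added latency is roughly the full allowed lateness.

```python
import heapq
from typing import Any, List, Tuple

Event = Tuple[int, Any]  # (event timestamp, payload)


class SlackBuffer:
    """Hypothetical buffer that re-orders out-of-order elements by timestamp.

    An element is held until the watermark reaches its timestamp plus the
    allowed lateness, so nothing earlier can still arrive ahead of it.
    """

    def __init__(self, allowed_lateness: int) -> None:
        self.allowed_lateness = allowed_lateness
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = 0  # tie-breaker so payloads are never compared

    def add(self, timestamp: int, payload: Any) -> None:
        # Arrival order is arbitrary; the heap keeps the smallest timestamp on top.
        heapq.heappush(self._heap, (timestamp, self._seq, payload))
        self._seq += 1

    def advance_watermark(self, watermark: int) -> List[Event]:
        # Release, in timestamp order, every element old enough to be safe.
        ready: List[Event] = []
        while self._heap and self._heap[0][0] + self.allowed_lateness <= watermark:
            timestamp, _, payload = heapq.heappop(self._heap)
            ready.append((timestamp, payload))
        return ready


buf = SlackBuffer(allowed_lateness=5)
buf.add(10, "b")
buf.add(7, "a")
buf.add(12, "c")
first = buf.advance_watermark(14)   # only timestamp 7 is old enough
buf.add(9, "late")                  # late, but still within the allowed lateness
second = buf.advance_watermark(20)  # everything else, in timestamp order
```

In batch none of this is needed, because the whole input is available and can simply be sorted by timestamp during the shuffle.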

Re: Query Regarding Customizing Apache Beam for Sequence-Based Workload Processing

2024-10-01 Thread Jan Lukavský
Hi Kenn, unfortunately the support for this annotation is not as good as it could be. AFAIK it is currently supported only on Java Direct, Flink, Spark and Dataflow batch runners. Dataflow streaming does not support this. There was some discussion that the expansion could be implemented by a

Re: Query Regarding Customizing Apache Beam for Sequence-Based Workload Processing

2024-10-01 Thread Kenneth Knowles
Also worth calling out RequiresTimeSortedInput ( https://beam.apache.org/releases/javadoc/2.59.0/index.html?org/apache/beam/sdk/transforms/DoFn.RequiresTimeSortedInput.html ). It only operates at the level of a single stateful ParDo but this ordering will persist until the next shuffle on most run
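As a rough illustration of what time-sorted input buys a single stateful step, the following plain-Python sketch groups records by key, sorts each group by timestamp, and then runs a stateful function over them in order. It is not the Beam API and says nothing about how any runner implements the annotation; `process_time_sorted`, `delta_step`, and the `(key, timestamp, value)` record shape are hypothetical names introduced only for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, int, int]  # (key, event timestamp, value)
Step = Callable[[Optional[int], int, int], Tuple[int, int]]


def process_time_sorted(records: Iterable[Record], step: Step) -> Dict[str, List[int]]:
    """Run a per-key stateful step over records in timestamp order."""
    by_key: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for key, timestamp, value in records:
        by_key[key].append((timestamp, value))

    outputs: Dict[str, List[int]] = {}
    for key, items in by_key.items():
        items.sort(key=lambda item: item[0])  # the "sort by timestamp" step
        state: Optional[int] = None           # per-key state, as in a stateful step
        emitted: List[int] = []
        for timestamp, value in items:
            state, result = step(state, timestamp, value)
            emitted.append(result)
        outputs[key] = emitted
    return outputs


def delta_step(state: Optional[int], timestamp: int, value: int) -> Tuple[int, int]:
    # Emit the change since the previous value; only meaningful on sorted input.
    previous = value if state is None else state
    return value, value - previous


records = [("a", 3, 30), ("b", 1, 5), ("a", 1, 10), ("a", 2, 15)]
deltas = process_time_sorted(records, delta_step)
```

The step function is order-sensitive: fed the same records in arrival order, key "a" would produce different deltas, which is the problem the sorting guarantee is meant to remove.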

Re: Query Regarding Customizing Apache Beam for Sequence-Based Workload Processing

2024-09-30 Thread Danny McCormick via dev
I'm not sure if I fully understand the use case. When you require ordering, do you need a set of transforms completed on all data before moving to the next set of transforms? Or do you need transforms to complete on a subset of the data before moving to the next subset of the data for the same tran

Re: Query Regarding Customizing Apache Beam for Sequence-Based Workload Processing

2024-09-28 Thread XQ Hu via dev
Not exactly sure what your use case is. This year, at our Beam Summit, Shunping talked about Beam State and OrderedListStates: https://beamsummit.org/sessions/2024/introducing-ordered-list-states/. This might be helpful for you.

Query Regarding Customizing Apache Beam for Sequence-Based Workload Processing

2024-09-28 Thread Settara Pramod
Hi Apache Beam Dev Team, First of all, thank you for developing such an amazing project and making it open-source. I have a use case where I encountered some limitations in using Apache Beam to solve my problem. I am working with workloads that are tied to specific sequences. My goal is to proces