Re: [I] Add StreamingWindowExec to DataFusion physical plan to support aggregations over unbounded data [datafusion]

via GitHub Wed, 10 Jul 2024 18:44:25 -0700


ameyc commented on issue #11366:
URL: https://github.com/apache/datafusion/issues/11366#issuecomment-2221829306


   @mustafasrepo & @ozankabak thanks for the feedback. the target usecases we 
were going for are flink style workloads, with data read from kafka that is 
generally not be ordered and thus needs to be watermarked. we tried the vanilla 
aggregates and ran into PipelineBreaking panics.
   
   An example workload we're trying to compute is of the nature, lmk if this 
can already be expressed with current operators as is  --
   
   ```
       let windowed_df = df
           .clone()
           .streaming_window(
               vec![],
               vec![
                   max(col("imu_measurement").field("gps").field("speed")),
                   min(col("imu_measurement").field("gps").field("altitude")),
                   count(col("imu_measurement")).alias("count"),
               ],
               Duration::from_millis(5_000), // 5 second window
               Some(Duration::from_millis(1_000)), // 1 second slide
           )
           .unwrap();
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Add StreamingWindowExec to DataFusion physical plan to support aggregations over unbounded data [datafusion]

Reply via email to