akurmustafa commented on issue #23197:
URL: https://github.com/apache/datafusion/issues/23197#issuecomment-4836070068

   Yes, as @alamb and @2010YOUY01 said, the original idea was to support a window operator that, given input already ordered by the PARTITION BY clause, the ORDER BY clause, or both, does not buffer all of the batches at its input. This saves memory and enables streaming.
   
   For a window function with the clause `PARTITION BY <expr1> ORDER BY <expr2>`, the operator supports input with the following orderings:
   - case1: `<expr1>`
   - case2: `<expr2>`
   - case3: `<expr1>` + `<expr2>` (sorted by `<expr1>`, then by `<expr2>`)
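   To make the three cases concrete, here is a minimal sketch that classifies an input ordering against a window clause. It assumes expressions can be compared as plain strings and that the input ordering is a lexicographic list of sort keys, most significant first. `InputCase` and `classify` are hypothetical names for illustration; this is not DataFusion's API.

```rust
/// Which of the supported input orderings applies to
/// `PARTITION BY <expr1> ORDER BY <expr2>` (hypothetical model).
#[derive(Debug, PartialEq, Eq)]
pub enum InputCase {
    /// case1: input sorted by `<expr1>` only.
    PartitionOnly,
    /// case2: input sorted by `<expr2>` only.
    OrderOnly,
    /// case3: input sorted by `<expr1>`, then `<expr2>`.
    PartitionThenOrder,
    /// None of the above; the operator would have to sort or buffer.
    Unsupported,
}

/// `input_ordering` lists the input's sort keys, most significant first.
/// case3 is checked first because an ordering of `<expr1>, <expr2>`
/// also satisfies case1.
pub fn classify(input_ordering: &[&str], partition_by: &str, order_by: &str) -> InputCase {
    match input_ordering {
        [p, o, ..] if *p == partition_by && *o == order_by => InputCase::PartitionThenOrder,
        [p, ..] if *p == partition_by => InputCase::PartitionOnly,
        [o, ..] if *o == order_by => InputCase::OrderOnly,
        _ => InputCase::Unsupported,
    }
}
```

   For example, `classify(&["a", "b"], "a", "b")` yields `PartitionThenOrder`, while `classify(&["b", "a"], "a", "b")` yields `OrderOnly`.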
   
   I think most of the complexity in the implementation comes from supporting these different cases. However, cases 1 and 2 are mostly useful for streaming, and I don't see a benefit in keeping them for non-streaming cases. If we assume the input ordering always matches case 3 (both the PARTITION BY and ORDER BY expressions), I think the implementation can be simplified, and DataFusion can end up with a single window operator that doesn't expect all of its input to be buffered.
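   Under the case 3 assumption, a window operator only needs to keep state for the current partition, because a change in the partition key means the previous partition is complete. A minimal sketch of that idea for a running sum, `SUM(v) OVER (PARTITION BY k ORDER BY ts ROWS UNBOUNDED PRECEDING)`, follows. `RunningSum` and `process_batch` are hypothetical names, rows are modeled as `(k, v)` tuples already sorted by `(k, ts)`, and this is not how DataFusion's operators are actually implemented.

```rust
/// Hypothetical streaming evaluator for a per-partition running sum.
/// Assumes input rows arrive sorted by (partition key, order key), i.e. case 3.
/// Only the current partition's key and sum survive between batches.
#[derive(Default)]
pub struct RunningSum {
    current_key: Option<i64>,
    sum: i64,
}

impl RunningSum {
    /// `batch` holds (partition key, value) rows in sorted order.
    /// Returns one cumulative sum per input row, so output can be emitted
    /// batch by batch without waiting for the rest of the input.
    pub fn process_batch(&mut self, batch: &[(i64, i64)]) -> Vec<i64> {
        let mut out = Vec::with_capacity(batch.len());
        for &(key, value) in batch {
            if self.current_key != Some(key) {
                // Sorted input guarantees the previous partition is finished.
                self.current_key = Some(key);
                self.sum = 0;
            }
            self.sum += value;
            out.push(self.sum);
        }
        out
    }
}
```

   A partition may span several batches: feeding `[(1, 10), (1, 5)]` and then `[(1, 1), (2, 7)]` produces `[10, 15]` followed by `[16, 7]`, since the running state for partition `1` carries over.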
   
   As far as I remember, `WindowAggExec` assumes case 3 always holds and buffers all of the data at its input.
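   For contrast, a buffer-everything strategy can be sketched as follows. It is a hypothetical illustration of the buffering behavior described above, not `WindowAggExec`'s actual code: it computes the same per-partition running sum over rows assumed sorted by `(k, ts)`, but collects every batch before emitting anything, so peak memory grows with the whole input rather than with one partition.

```rust
/// Hypothetical buffer-everything evaluator for a per-partition running sum.
/// Assumes rows are (partition key, value) pairs sorted by (k, ts) across batches.
pub fn buffered_running_sum(batches: &[Vec<(i64, i64)>]) -> Vec<i64> {
    // Phase 1: buffer every row from every batch before doing any work.
    let all_rows: Vec<(i64, i64)> = batches.iter().flatten().copied().collect();

    // Phase 2: evaluate the window function over the fully buffered input.
    let mut out = Vec::with_capacity(all_rows.len());
    let mut prev_key: Option<i64> = None;
    let mut sum: i64 = 0;
    for (key, value) in all_rows {
        if prev_key != Some(key) {
            // New partition starts: reset the running sum.
            prev_key = Some(key);
            sum = 0;
        }
        sum += value;
        out.push(sum);
    }
    out
}
```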
   
   I agree with @alamb and @2010YOUY01 that focusing on improving `WindowAggExec` is the better course. In the future, the planner could perhaps generate only plans that contain `WindowAggExec`, and then we could discontinue `BoundedWindowAggExec`.