Dandandan commented on PR #23026:
URL: https://github.com/apache/datafusion/pull/23026#issuecomment-4847491222

   > In terms of evaluating window functions with more memory efficiency, there 
is this paper which is the new hotness that describes segment trees
   > 
   > * https://www.vldb.org/pvldb/vol8/p1058-leis.pdf
   
   I had a look at segment trees but I think they might not be _that_ great if 
for performance optimization of existing functionality:
   
   * Segment Tree is actually slower algorithmically and in practice than 
_removable cumulative_ (which DataFusion uses/supports, i.e. `retract_batch`)  
on some functions (e.g. SUM), see Table 1 and Figure 10:
   
   <img width="962" height="176" alt="image" 
src="https://github.com/user-attachments/assets/3888d868-28de-41de-8f1e-729cb1c5decc";
 />
   
   <img width="514" height="364" alt="image" 
src="https://github.com/user-attachments/assets/7cf6b5ba-f345-453a-9328-c63b4499adc3";
 />
   
   * The paper also agrees segment trees has overhead and other approaches 
should be taken in other cases
   
   * They are mainly better for _variable frame bounds_ (e.g. `between column_1 
and column_2 preceding` ), which is something DataFusion does not support yet 
(although arguably adding it might require segment trees or something similar 
to add that).
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to