ozankabak commented on issue #11404:
URL: https://github.com/apache/datafusion/issues/11404#issuecomment-2222100150

   The consensus when we proposed #4285 was to add general-purpose 
functionality upstream and keep stream-processing-focused features downstream. 
To that end, we added things like
   
   - Sort-based optimizations
   - Equivalence/order tracking
   - Interval arithmetic
   - Various join functionality
   
   (and many other things I forget now) upstream. Per this consensus, 
checkpointing and watermarking (especially watermarking that discards data 
based on processing time) did not get the same treatment, as they are quite 
specific to stream processing.
   
   Having added many features to upstream DF over a long period, and having 
gone through the experience of implementing specific functionality like 
checkpointing, watermarking and others, I think the consensus reached at the 
time of #4285 has proved to be quite reasonable.
   
   > A pluggable state backend support would be ideal, this also may be useful 
for operators that spill to disk.
   
   This is an interesting idea. If there is sufficient interest in generalizing 
the spill-to-disk code to go through a backend, then we should add this 
upstream. That would be a win-win: we would be helping general-purpose use 
cases and also simplifying downstream code for people like you and us.
   
   > Happy to take a stab at putting together a gentle introduction for future 
developers from our learnings.
   
   This would be very nice and will certainly be helpful to others who want to 
build streaming systems on top of DataFusion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

