tustvold commented on issue #14078:
URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2585169856

   Spilling the row format makes some sense to me, although I suspect IPC will 
outperform it, presuming a fast enough disk.
   
   I feel I ought to point out, though, that for it to be sound to read 
a file without validation, DF needs to be sure nobody else could have 
written or modified it. This may be possible on Unix OSes using some shenanigans 
with unlinked file descriptors, but I suspect it isn't generally possible.
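
   As a rough illustration of the unlinked file descriptor idea, here is a minimal sketch using only the Rust standard library. The helper names (`create_unlinked_spill_file`, `spill_roundtrip`) and the temp-file naming scheme are hypothetical and not part of DataFusion. It assumes a Unix-like OS, where a file stays usable through an open handle after its name is removed. It only narrows the window in which another process can reach the file by path; it does not by itself prove that nobody else could have modified it.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: create a spill file and immediately remove its name,
/// so that it is only reachable through the returned handle.
/// Assumes Unix semantics (data persists until the last handle is closed).
fn create_unlinked_spill_file() -> io::Result<File> {
    // Illustrative naming scheme only; a real implementation would likely
    // use a dedicated temp-file facility.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let path = std::env::temp_dir()
        .join(format!("spill-sketch-{}-{}", std::process::id(), nanos));

    // `create_new` fails if the path already exists, so we never open a
    // file that somebody else created first.
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create_new(true)
        .open(&path)?;

    // Remove the directory entry. Note the short window between `open` and
    // `remove_file` during which the path is still visible.
    fs::remove_file(&path)?;
    Ok(file)
}

/// Hypothetical helper: write a payload to the unlinked file and read it back.
fn spill_roundtrip(payload: &[u8]) -> io::Result<Vec<u8>> {
    let mut file = create_unlinked_spill_file()?;
    file.write_all(payload)?;
    file.seek(SeekFrom::Start(0))?;
    let mut out = Vec::new();
    file.read_to_end(&mut out)?;
    Ok(out)
}
```

   Whether this is enough to justify skipping validation is a separate question, and it has no portable equivalent, which is the "isn't generally possible" concern.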
   
   I feel I also ought to point out that the mmap use-case is slightly different, as 
the data is effectively in memory already. The performance benefit of skipping 
validation may be lessened when there are other overheads, e.g. reading the data from disk.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

