tustvold commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2585169856
Spilling the row format makes some sense to me, although I suspect IPC will outperform it, assuming a fast enough disk.

I feel I ought to point out, though, that for it to be sound to read a file without validation, DF needs to be sure nobody else could have written or modified it. This may be possible on Unix OSes using some shenanigans with unlinked file descriptors, but I suspect it isn't generally possible.

I should also point out that the mmap use case is slightly different, as the data is effectively in memory already. The performance benefit of skipping validation may be lessened when there are other overheads, e.g. reading the data from disk.
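For illustration, the "unlinked file descriptor" idea mentioned above could look roughly like the following. This is a minimal sketch using only the Rust standard library, not DataFusion's spill implementation; the function name `spill_roundtrip_unlinked` and the file naming scheme are hypothetical. It assumes Unix semantics, where a file's contents stay reachable through an open handle after its directory entry is removed.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical helper: writes `payload` to a spill file whose directory
/// entry is removed right after creation, then reads it back through the
/// same handle. Assumes a Unix-like OS; on other platforms removing an
/// open file may fail.
fn spill_roundtrip_unlinked(dir: &Path, payload: &[u8]) -> io::Result<Vec<u8>> {
    // Illustrative naming scheme only; real code would want a unique,
    // unpredictable name.
    let path = dir.join(format!("spill-sketch-{}.tmp", std::process::id()));

    // `create_new` fails if the path already exists, so we never adopt a
    // file that somebody else created first.
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create_new(true)
        .open(&path)?;

    // Remove the directory entry. The open handle keeps the data alive,
    // but the file can no longer be opened by path.
    fs::remove_file(&path)?;

    // Spill the data, then rewind and read it back via the same handle.
    file.write_all(payload)?;
    file.seek(SeekFrom::Start(0))?;
    let mut out = Vec::with_capacity(payload.len());
    file.read_to_end(&mut out)?;
    Ok(out)
}
```

Calling `spill_roundtrip_unlinked(&std::env::temp_dir(), b"...")` should return the same bytes that were written. Note that this narrows the window for outside modification rather than closing it: there is still a brief interval between creating and unlinking the file, and other processes with sufficient privileges may be able to reach the file by other means (for example through `/proc` on Linux). That is consistent with the caveat above that this isn't generally possible.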