alamb commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993532737
> How does it ensure that this extra index can be safely ignored by other readers? If another parquet reader implementation decides to do a sequential whole file scan, will it read into the extra custom data?

I agree with what @zhuqi-lucas says too.

The way I think about this is that the parquet file's footer contains pointers (offsets) to the actual data in the file. There is no requirement that the footer point to all bytes within the file, so a reader that follows the footer never visits bytes that the footer does not reference.

There are other interesting things that can be done with this setup too. For example, you can concatenate parquet files together without re-encoding the data: just copy the bytes around and rewrite the footer.
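To make the footer-as-pointers idea concrete, here is a minimal sketch in Python using a toy layout. It is not the real Parquet format: it only mimics the trailer (footer, then a 4-byte little-endian footer length, then the `PAR1` magic), and the footer encoding (a flat list of offset/length pairs) and all function names are assumptions for illustration. Real Parquet stores Thrift-encoded metadata in the footer.

```python
import struct

MAGIC = b"PAR1"  # real Parquet files also start and end with these 4 bytes


def write_toy_file(chunks, extra=b""):
    """Layout: MAGIC | chunks... | extra | footer | footer_len (u32 LE) | MAGIC.

    The footer is a made-up list of (offset, length) pairs, one per chunk.
    """
    body = bytearray(MAGIC)
    entries = []
    for chunk in chunks:
        entries.append((len(body), len(chunk)))
        body += chunk
    body += extra  # bytes that no footer entry points to
    footer = b"".join(struct.pack("<QQ", off, ln) for off, ln in entries)
    return bytes(body) + footer + struct.pack("<I", len(footer)) + MAGIC


def footer_entries(buf):
    """Return the (offset, length) pairs and the offset where the footer starts."""
    assert buf[:4] == MAGIC and buf[-4:] == MAGIC
    (footer_len,) = struct.unpack("<I", buf[-8:-4])
    footer_start = len(buf) - 8 - footer_len
    entries = [
        struct.unpack("<QQ", buf[i:i + 16])
        for i in range(footer_start, footer_start + footer_len, 16)
    ]
    return entries, footer_start


def read_toy_file(buf):
    """Read only the byte ranges the footer points to."""
    entries, _ = footer_entries(buf)
    return [buf[off:off + ln] for off, ln in entries]


def concat_toy_files(a, b):
    """Copy the data bytes of both files verbatim and write one new footer."""
    a_entries, a_end = footer_entries(a)
    b_entries, b_end = footer_entries(b)
    body = a[:a_end] + b[4:b_end]  # skip b's leading magic
    shift = a_end - 4  # how far b's bytes moved in the combined file
    entries = list(a_entries) + [(off + shift, ln) for off, ln in b_entries]
    footer = b"".join(struct.pack("<QQ", off, ln) for off, ln in entries)
    return body + footer + struct.pack("<I", len(footer)) + MAGIC


first = write_toy_file([b"col-a", b"col-b"], extra=b"custom-index")
second = write_toy_file([b"col-c"])
merged = concat_toy_files(first, second)
print(read_toy_file(first))
print(read_toy_file(merged))
```

The reader starts from the end of the file and follows offsets, so the `custom-index` bytes are present in the file but never returned. The concatenation step only shifts offsets and writes a new footer; in real Parquet, merging also involves combining row group metadata for files with compatible schemas, which this sketch does not model.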