Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

via GitHub Sat, 21 Jun 2025 05:46:50 -0700


alamb commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993532737


   > How does it ensure that this extra index can be safely ignored by other 
readers? If another parquet reader implementation decides to do a sequential 
whole file scan, will it read into the extra custom data?
   
   I agree with what @zhuqi-lucas says too
   
   The way I think about this is that the parquet file's footer contains 
pointers (offsets) to the actual data in the file. There is no requirement that 
the footer points to all bytes within the file, so a reader that follows those 
offsets never reads the bytes the footer does not reference.
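
A minimal sketch of that idea, assuming a toy layout rather than real Parquet: the file starts and ends with the `PAR1` magic and stores a 4-byte little-endian footer length before the trailing magic (as Parquet does), but the footer here is JSON instead of Thrift metadata. The names `write_toy_file` and `read_toy_file` are hypothetical helpers for illustration only.

```python
import json
import struct

MAGIC = b"PAR1"  # real Parquet files also start and end with these 4 bytes


def write_toy_file(chunks, extra=b""):
    """Toy layout: MAGIC | chunks | extra | footer (JSON) | footer_len (u32 LE) | MAGIC."""
    body = bytearray(MAGIC)
    entries = []
    for name, payload in chunks:
        # Record where each chunk lives; this is the only way readers find it.
        entries.append({"name": name, "offset": len(body), "length": len(payload)})
        body += payload
    body += extra  # bytes the footer never points to, e.g. a custom index
    footer = json.dumps({"chunks": entries}).encode()
    body += footer + struct.pack("<I", len(footer)) + MAGIC
    return bytes(body)


def read_toy_file(buf):
    """Read chunks by following footer offsets; unreferenced bytes are skipped."""
    assert buf[:4] == MAGIC and buf[-4:] == MAGIC
    (footer_len,) = struct.unpack("<I", buf[-8:-4])
    footer = json.loads(buf[-8 - footer_len:-8])
    return {
        c["name"]: buf[c["offset"]:c["offset"] + c["length"]]
        for c in footer["chunks"]
    }


data = write_toy_file(
    [("a", b"col-a-bytes"), ("b", b"col-b-bytes")],
    extra=b"CUSTOM-INDEX",
)
columns = read_toy_file(data)
```

The reader never looks at `CUSTOM-INDEX` because no footer entry covers that byte range; a writer that knows about the index would need to record its location somewhere it can find again, which this sketch does not model.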
   
   There are other interesting things that can be done with this setup too. For 
example, parquet files can be concatenated without having to re-encode the 
data: you can just copy the bytes around and rewrite the footer.
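
The concatenation idea can be sketched in the same spirit. This is a toy format, not real Parquet: the footer is a JSON list of `{offset, length}` entries, and `build`, `footer_entries`, `read_chunks`, and `concat` are hypothetical helpers. A real implementation would have to rewrite the Thrift-encoded footer metadata with a Parquet library and require compatible schemas.

```python
import json
import struct

MAGIC = b"PAR1"


def finish(body, entries):
    """Append the footer, its 4-byte little-endian length, and the trailing magic."""
    footer = json.dumps(entries).encode()
    return bytes(body) + footer + struct.pack("<I", len(footer)) + MAGIC


def build(payloads):
    """Write a toy file whose footer lists the offset and length of each chunk."""
    body, entries = bytearray(MAGIC), []
    for payload in payloads:
        entries.append({"offset": len(body), "length": len(payload)})
        body += payload
    return finish(body, entries)


def footer_entries(buf):
    (footer_len,) = struct.unpack("<I", buf[-8:-4])
    return json.loads(buf[-8 - footer_len:-8])


def read_chunks(buf):
    return [buf[e["offset"]:e["offset"] + e["length"]] for e in footer_entries(buf)]


def concat(*files):
    """Copy chunk bytes verbatim and write one new footer with shifted offsets."""
    body, entries = bytearray(MAGIC), []
    for buf in files:
        for e in footer_entries(buf):
            entries.append({"offset": len(body), "length": e["length"]})
            body += buf[e["offset"]:e["offset"] + e["length"]]  # no re-encoding
    return finish(body, entries)


first = build([b"rows-0", b"rows-1"])
second = build([b"rows-2"])
merged = concat(first, second)
```

Only the offsets change: the chunk payloads in `merged` are byte-for-byte copies of the originals, which is why no decoding or re-encoding step is needed.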


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

