Arpit-Bandejiya commented on issue #14816:
URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2677631277

   Thanks for the response, @alamb! A couple of follow-up questions:
   
   >There is a PR we are currently working on related to metadata columns 
(which could provide row ids perhaps)
   https://github.com/apache/datafusion/pull/14057
   
   Is there any way to get the row_id data for Parquet? Any suggestions on how 
to build it? @alamb  @chenkovsky
   
   >Fetching only relevant documents from parquet: the current reader is 
efficiently set up to fetch large contiguous blocks of values 
([RowSelection](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html)).
 @XiangpengHao has been thinking about a bitset representation for selected 
rows recently so perhaps you can help contribute to making that happen in the 
parquet reader
   
   I'd be happy to collaborate on it. @XiangpengHao, do you have an initial 
plan or a POC for it?
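
To make the trade-off in the quoted point concrete, here is a minimal, self-contained sketch of how a per-row mask collapses into skip/select runs. It does not use the parquet crate: `Run` and `mask_to_runs` are hypothetical names invented for illustration, with `Run` only loosely modeled on the `row_count`/`skip` shape of the crate's `RowSelector`. It is not the reader's actual code, and it says nothing about how the proposed bitset representation would be implemented.

```rust
/// Hypothetical stand-in for a run-length selector:
/// one run of consecutive rows that are either all skipped or all read.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Collapse a per-row mask (`true` = row selected) into alternating runs.
/// Adjacent rows with the same state are merged into a single run.
pub fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        // Extend the previous run when the state is unchanged.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        // Otherwise start a new run of length one.
        runs.push(Run { row_count: 1, skip });
    }
    runs
}
```

Under these assumptions, a mask of one long contiguous block yields only a couple of runs, while a mask that alternates selected and unselected rows yields one run per row. That is the case where a one-bit-per-row bitset may be the more compact representation.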


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

