chenkovsky commented on issue #14816:
URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2678756916

   > Thanks for the response [@alamb](https://github.com/alamb) ! Couple of 
follow up questions:
   > 
   > > There is a PR we are currently working on related to metadata columns 
(which could provide row ids perhaps)
   > > [#14057](https://github.com/apache/datafusion/pull/14057)
   > 
   > Is there any way to get the row_id data for Parquet? Any suggestion to 
build it? [@alamb](https://github.com/alamb) 
[@chenkovsky](https://github.com/chenkovsky)
   > 
   > > Fetching only relevant documents from parquet: the current reader is 
efficiently set up to fetch large contiguous blocks of values 
([RowSelection](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html)).
 [@XiangpengHao](https://github.com/XiangpengHao) has been thinking about a 
bitset representation for selected rows recently so perhaps you can help 
contribute to making that happen in the parquet reader
   > 
   > Will be happy to collaborate on it. 
[@XiangpengHao](https://github.com/XiangpengHao) any initial plan or POC you 
have done for it?
   
   @Arpit-Bandejiya 
   
   I created an example of getting row_id for Parquet, based on PR #14057: 
https://github.com/chenkovsky/datafusion/pull/3/files


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


