chenkovsky commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2678756916
> Thanks for the response [@alamb](https://github.com/alamb)! A couple of follow-up questions:
>
> > There is a PR we are currently working on related to metadata columns (which could provide row ids perhaps)
> > [#14057](https://github.com/apache/datafusion/pull/14057)
>
> Is there any way to get the row_id data for Parquet? Any suggestion to build it? [@alamb](https://github.com/alamb) [@chenkovsky](https://github.com/chenkovsky)
>
> > Fetching only relevant documents from parquet: the current reader is efficiently set up to fetch large contiguous blocks of values ([RowSelection](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html)). [@XiangpengHao](https://github.com/XiangpengHao) has been thinking about a bitset representation for selected rows recently so perhaps you can help contribute to making that happen in the parquet reader
>
> Will be happy to collaborate on it. [@XiangpengHao](https://github.com/XiangpengHao) any initial plan or POC you have done for it?

@Arpit-Bandejiya I created an example for getting row_id for parquet based on PR #14057.
https://github.com/chenkovsky/datafusion/pull/3/files

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org