Arpit-Bandejiya commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2707312770
Thanks @XiangpengHao for the response.

> If you want DataFusion to produce a bitmask for other systems -- I'm not aware of an easy way to do this. But this sounds like a join use case, have you considered adding a row_id column to the parquet files? so that you can select the row_id as the output and join with other systems.

I'm trying to do it the same way, using a row_id. The problem arises when the engines' results differ in sparsity. For example, if one engine's iterator is sparse while DataFusion returns almost every row, the join becomes quite inefficient, because it essentially ends up loading all of the data from DataFusion. The problem is aggravated further because we now also have to fetch one more column, the row_id, from the file. Some query engines, such as Lucene, support an advance (seek) operation, though I'm not sure whether that is possible with DataFusion or with Parquet files in general.
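A minimal sketch of why seek support matters for this kind of row_id join, assuming both engines emit sorted row ids as plain `u64` slices. The function names `intersect_by_scan` and `intersect_by_advance` are hypothetical, and this is not a DataFusion, Parquet, or Lucene API. The standard-library `partition_point` stands in for a Lucene-style `advance(target)`, which jumps to the first row id greater than or equal to `target`.

```rust
/// Intersect by reading every row id the dense side produces (no seek support).
/// Returns the matching row ids and how many dense row ids had to be read.
fn intersect_by_scan(sparse: &[u64], dense: &[u64]) -> (Vec<u64>, usize) {
    let mut out = Vec::new();
    let mut read = 0;
    let mut i = 0;
    for &id in dense {
        read += 1; // every dense row id is materialized
        // Move the sparse cursor forward until it is not behind `id`.
        while i < sparse.len() && sparse[i] < id {
            i += 1;
        }
        if i == sparse.len() {
            break; // sparse side exhausted, nothing more can match
        }
        if sparse[i] == id {
            out.push(id);
        }
    }
    (out, read)
}

/// Intersect by letting the sparse side drive and "advancing" the dense side.
/// Only the row id the dense cursor lands on is counted as read.
fn intersect_by_advance(sparse: &[u64], dense: &[u64]) -> (Vec<u64>, usize) {
    let mut out = Vec::new();
    let mut read = 0;
    let mut pos = 0;
    for &target in sparse {
        // advance(target): binary-search for the first dense id >= target.
        pos += dense[pos..].partition_point(|&id| id < target);
        if pos == dense.len() {
            break; // dense side exhausted
        }
        read += 1;
        if dense[pos] == target {
            out.push(target);
        }
    }
    (out, read)
}
```

With a dense side of row ids `0..1_000_000` and a sparse side of `[10, 500_000, 999_990]`, both functions return the same three matches, but the scan reads 999,992 dense row ids while the advance version lands on only 3. The binary search still probes about log2(n) positions per call; in a real file format that cost would depend on what index (for example, page or row-group statistics) backs the seek, which this sketch does not model.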