Arpit-Bandejiya commented on issue #14816:
URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2707312770

   Thanks @XiangpengHao for the response.
   
   >If you want DataFusion to produce a bitmask for other systems -- I'm not 
aware of an easy way to do this. But this sounds like a join use case, have you 
considered adding a row_id column to the parquet files? so that you can select 
the row_id as the output and join with other systems.
   
   I'm trying to do it that way, using a row_id. The problem arises when the 
result sets from the different engines differ greatly in sparsity. For example, 
if one engine's iterator is sparse while DataFusion returns almost every row, 
the join becomes quite inefficient because it essentially ends up loading all 
the data from DataFusion. The problem is aggravated because we now fetch one 
more column, the row_id, from the file. A few query engines, such as Lucene, 
support an advance (seek) operation, though I'm not sure whether that is 
possible with DataFusion or with Parquet files in general.
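
To make the advance/seek idea concrete, here is a minimal Python sketch of 
intersecting a sparse row-id iterator with a dense one. All names here 
(`SortedRowIds`, `advance`, `intersect`) are hypothetical and only for 
illustration; this is not a DataFusion, Parquet, or Lucene API, and it works on 
in-memory sorted lists, so it says nothing about whether a Parquet reader can 
skip in the same way.

```python
from bisect import bisect_left


class SortedRowIds:
    """Hypothetical iterator over sorted row ids with an advance (seek) call."""

    def __init__(self, row_ids):
        self.row_ids = row_ids  # assumed sorted ascending, no duplicates
        self.pos = 0
        self.seeks = 0  # how many times advance() was called

    def current(self):
        # Row id at the current position, or None when exhausted.
        return self.row_ids[self.pos] if self.pos < len(self.row_ids) else None

    def advance(self, target):
        # Jump straight to the first row id >= target using binary search,
        # instead of stepping over every row in between.
        self.pos = bisect_left(self.row_ids, target, self.pos)
        self.seeks += 1
        return self.current()


def intersect(a, b):
    """Return row ids present in both iterators, letting each side skip ahead."""
    out = []
    x, y = a.current(), b.current()
    while x is not None and y is not None:
        if x == y:
            out.append(x)
            x = a.advance(x + 1)
            y = b.advance(y + 1)
        elif x < y:
            x = a.advance(y)  # a is behind: seek it up to b's row id
        else:
            y = b.advance(x)  # b is behind: seek it up to a's row id
    return out


sparse = SortedRowIds([5, 5000, 99999])
dense = SortedRowIds(list(range(100000)))
print(intersect(sparse, dense))  # [5, 5000, 99999]
print(dense.seeks)  # a handful of seeks rather than 100000 row visits
```

With a seek like this, the cost is driven by the sparse side rather than the 
dense one, which is the behaviour I'm missing when the dense side has to be 
fully materialized first.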
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

