alamb commented on issue #10922:
URL: https://github.com/apache/datafusion/issues/10922#issuecomment-2210679342

   > But I'm wondering if flatten is the right thing to do here?
   > 
   > The min or max values for each page will be None if all the values on the page happen to be null: https://github.com/apache/arrow-rs/blob/master/parquet/src/file/page_index/index.rs#L37-L44
   > 
   > Using flatten in this case will mean that the length of result for that page will be shorter than the number of data pages? So, is it possible that rather than flatten we instead want to do something like a flat map where the Some values are flattened and None values are mapped to a null value?
   
   I think you are correct -- that is a very insightful conclusion @efredine 
   
   Ideally, I think we should write a test case (using your suggestion of a column / page that is entirely null), verify that there is a problem, and fix it.
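
A minimal sketch of the concern, using only the Rust standard library rather than the actual DataFusion or arrow-rs code. The function names and the sample values are hypothetical, and `Vec<Option<i32>>` stands in for the per-page min statistics, where `None` means every value on that page was null. It only illustrates why `flatten` can shorten the result while keeping the `Option` preserves one entry per page:

```rust
/// Hypothetical stand-in for per-page min statistics.
/// `None` models a page whose values are all null, so no min was recorded.
fn page_mins() -> Vec<Option<i32>> {
    vec![Some(1), None, Some(7)]
}

/// Uses `flatten`, which silently drops the `None` entries.
/// The result can therefore be shorter than the number of data pages.
fn mins_flattened(mins: &[Option<i32>]) -> Vec<i32> {
    mins.iter().copied().flatten().collect()
}

/// Keeps one slot per page and carries `None` through as the null marker,
/// which is the shape a nullable array builder would need.
fn mins_preserving_nulls(mins: &[Option<i32>]) -> Vec<Option<i32>> {
    mins.iter().copied().collect()
}
```

With three pages where the middle one is entirely null, `mins_flattened` yields two values while `mins_preserving_nulls` yields three, so only the latter lines up with the page count. A real test would presumably write a Parquet file containing an all-null page and check the length of the extracted statistics instead.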
   
   Is this something you are willing to do? I filed https://github.com/apache/datafusion/issues/11280 to track this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

