alamb commented on issue #10922: URL: https://github.com/apache/datafusion/issues/10922#issuecomment-2210679342
> But I'm wondering if flatten is the right thing to do here? > > The min or max values for each page will be None if all the values on the page happen to be null: https://github.com/apache/arrow-rs/blob/master/parquet/src/file/page_index/index.rs#L37-L44 > > Using flatten in this case will mean that the length of result for that page will be shorter than the number of data pages? So, is it possible that rather than flatten we instead want to do something like a flat map where the Some values are flattened and None values are mapped to a null value? I think you are correct -- that is a very insightful conclusion @efredine Ideally what I think we should do is to write up a test case (using your suggestion of a column / page that is entirely null) and verify there is a problem / fix it. Is this something you are willing to do? I filed https://github.com/apache/datafusion/issues/11280 to track -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
