Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-24 Thread via GitHub
etseidl commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2828518601 Fixed by #15764 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-24 Thread via GitHub
etseidl closed issue #15742: Support more types when pruning Parquet data URL: https://github.com/apache/datafusion/issues/15742

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
etseidl commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816792398 I'll follow up in #15764

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816758476 - While reviewing https://github.com/apache/datafusion/pull/15764 it wasn't clear to me why we are checking casting / types at all in the pruning code. I think that might

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816667473 Funny enough I just opened https://github.com/apache/datafusion/pull/15764 without having seen this issue! It sounds like there may be some complexity with floats... hone

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816659245 FYI @adriangb

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816659212 > It would be nice if Datafusion always used statistics for floating point columns if they are available. One potential fix is to add more cases to verify_support_type_for_prune (

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-18 Thread via GitHub
etseidl commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2815595171 BTW, this issue is somewhat tied up with https://github.com/apache/parquet-format/pull/221. Take for example ```sql > select * from 'parquet-testing/data/float16_nonzeros_a

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-18 Thread via GitHub
etseidl commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2815571292 More background in #3377 and #3442. It seems like additional data types were planned, but abandoned for some reason. @alamb do you think it would be safe to replace the lo

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-17 Thread via GitHub
etseidl commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2814062585 After a bit of spelunking through the plan generation code, I figured out why my second suggestion doesn't make sense. Deep down the types will be coerced to a common type for c

[I] Support more types when pruning Parquet data [datafusion]

2025-04-16 Thread via GitHub
etseidl opened a new issue, #15742: URL: https://github.com/apache/datafusion/issues/15742 ### Is your feature request related to a problem or challenge? I've been working on implementing a new `ColumnOrder` for floating point columns in Parquet (https://github.com/apache/arrow-rs/pul