gatesn commented on issue #2581: URL: https://github.com/apache/datafusion/issues/2581#issuecomment-2574525606
Apologies, I should have checked the example value. 10_000 shows what I mean:

```
explain select x = cast(10000 AS int) from '/tmp/foo.parquet';
+---------------+---------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                              |
+---------------+---------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: CAST(/tmp/foo.parquet.x AS Int32) = Int32(10000) AS /tmp/foo.parquet.x = Int64(10000) |
|               |   TableScan: /tmp/foo.parquet projection=[x]                                                      |
| physical_plan | ProjectionExec: expr=[CAST(x@0 AS Int32) = 10000 as /tmp/foo.parquet.x = Int64(10000)]            |
|               |   ParquetExec: file_groups={1 group: [[tmp/foo.parquet]]}, projection=[x]                         |
|               |                                                                                                   |
+---------------+---------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.004 seconds.
```

A side note, but perhaps we're missing a rule somewhere to know that x can never `= 10000` when it started out as a u8? Perhaps my change in #13736 that preserves min/max stats through cast expressions?

But we can see in the physical plan the DataFusion cast from `x` to `Int32`, even though `x` is stored as an Int32 inside Parquet, read back into an Int32 Arrow array, and down-cast to an Int8 Arrow array, all before being returned to DataFusion to be cast back up to Int32.

Admittedly, this could be solved by providing a "target type" in the projection mask, short of full generic projection expression push-down. But it remains interesting that many file formats have the ability to optimize some subset of projection expressions. Even the Parquet reader could push down projection expressions over dictionary values prior to a full dictionary decode.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
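As a rough illustration of the side note in the comment above (a column that started out as a u8 can never equal 10000), the following is a minimal sketch of the kind of range check such a rule might perform. The function name `eq_always_false_for_u8` is hypothetical and is not part of DataFusion's API; it uses only the Rust standard library and ignores nulls.

```rust
/// Hypothetical helper, NOT a DataFusion API: reports whether the predicate
/// `column = literal` can never be true for a column whose values are u8.
fn eq_always_false_for_u8(literal: i64) -> bool {
    // A u8 column can only hold values in 0..=255, so a literal outside
    // that range can never compare equal to any value in the column.
    let lo = i64::from(u8::MIN);
    let hi = i64::from(u8::MAX);
    !(lo..=hi).contains(&literal)
}

fn main() {
    // 10_000 is the literal from the comment; the others are boundary cases.
    for literal in [0_i64, 255, 256, 10_000, -1] {
        println!(
            "x = {literal}: provably false for a u8 column? {}",
            eq_always_false_for_u8(literal)
        );
    }
}
```

An optimizer rule built on this idea could fold the comparison to a constant without inserting any cast, although how that would interact with the existing cast handling is an open question here.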
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org