gatesn commented on issue #2581:
URL: https://github.com/apache/datafusion/issues/2581#issuecomment-2574525606

   Apologies, I should have checked the example value. 10_000 shows what I mean:
   
   ```
   explain select x = cast(10000 AS int) from '/tmp/foo.parquet';
   
   +---------------+---------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                              |
   +---------------+---------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: CAST(/tmp/foo.parquet.x AS Int32) = Int32(10000) AS /tmp/foo.parquet.x = Int64(10000) |
   |               |   TableScan: /tmp/foo.parquet projection=[x]                                                      |
   | physical_plan | ProjectionExec: expr=[CAST(x@0 AS Int32) = 10000 as /tmp/foo.parquet.x = Int64(10000)]            |
   |               |   ParquetExec: file_groups={1 group: [[tmp/foo.parquet]]}, projection=[x]                         |
   |               |                                                                                                   |
   +---------------+---------------------------------------------------------------------------------------------------+
   2 row(s) fetched.
   Elapsed 0.004 seconds.
   ```
   
   A side note: perhaps we're missing a rule somewhere that knows `x` can 
never `= 10000` when it started out as a u8? Perhaps my change in #13736, which 
preserves min/max stats through cast expressions, is relevant here?
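
To make the side note concrete, here is a minimal Python sketch of the kind of range check such a rule might perform. It is not DataFusion code: `INT_RANGES` and `eq_literal_can_match` are hypothetical names, and the sketch only covers integer equality against a literal.

```python
# Hypothetical illustration only; none of these names exist in DataFusion.
# Value ranges of a few integer types, as (min, max) inclusive.
INT_RANGES = {
    "uint8": (0, 255),
    "int8": (-128, 127),
    "int32": (-2**31, 2**31 - 1),
}


def eq_literal_can_match(source_type: str, literal: int) -> bool:
    """Return True if `CAST(x AS wider) = literal` could be true for some
    non-null x of `source_type`. A literal outside the source type's range
    can never match, so the comparison could be folded away.
    (For a null x, SQL equality yields NULL rather than false.)"""
    lo, hi = INT_RANGES[source_type]
    return lo <= literal <= hi
```

Under these assumptions, `eq_literal_can_match("uint8", 10000)` is false, which is the case in the plan above. The same check would work with per-file min/max statistics in place of the type's full range.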
   
   But the physical plan shows DataFusion casting `x` to `Int32`, even though 
`x` is stored as an Int32 inside Parquet. The value is read back into an Int32 
Arrow array and down-cast to an Int8 Arrow array, all before being returned 
to DataFusion to be cast back up to Int32.
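
The round trip can be mimicked with Python's standard `array` module. This is only an illustration of the redundant casts, not how the Parquet or Arrow readers are implemented, and the sample values are made up.

```python
from array import array

# Made-up sample values that fit in 8 bits.
stored = array("i", [1, 42, 100])        # physical storage: C int, typically 32-bit
narrowed = array("b", stored.tolist())   # reader down-casts to an 8-bit array
widened = array("i", narrowed.tolist())  # engine casts back up for the comparison

# The two casts cancel out: the widened array equals what was stored.
round_trip_is_identity = widened == stored
```

Both intermediate arrays are full copies of the column, which is the work a "target type" on the read would avoid.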
   
   Admittedly, this could be solved by providing a "target type" in the 
projection mask, short of full generic projection expression push-down. But it 
remains interesting that many file formats can optimize some subset of 
projection expressions. Even the Parquet reader could push down projection 
expressions over dictionary values prior to a full dictionary decode.
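
A rough sketch of the dictionary idea, in plain Python rather than any real Parquet reader API: the function names are hypothetical, and the sketch assumes the expression is a pure function of a single column value.

```python
def project_after_decode(dictionary, indices, expr):
    """Baseline: decode every row, then evaluate expr once per row."""
    return [expr(dictionary[i]) for i in indices]


def project_over_dictionary(dictionary, indices, expr):
    """Evaluate expr once per distinct dictionary value, then map the
    precomputed results through the row indices."""
    projected = [expr(value) for value in dictionary]
    return [projected[i] for i in indices]
```

Both functions return the same rows, but the second evaluates `expr` once per dictionary entry instead of once per row, which matters when the dictionary is much smaller than the column.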


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

