alamb commented on code in PR #14273:
URL: https://github.com/apache/datafusion/pull/14273#discussion_r1929724090

##########
datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part:
##########
@@ -31,13 +31,13 @@ logical_plan
 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_discount) AS revenue
 02)--Aggregate: groupBy=[[]], aggr=[[sum(lineitem.l_extendedprice * lineitem.l_discount)]]
 03)----Projection: lineitem.l_extendedprice, lineitem.l_discount
-04)------Filter: lineitem.l_shipdate >= Date32("1994-01-01") AND lineitem.l_shipdate < Date32("1995-01-01") AND lineitem.l_discount >= Decimal128(Some(5),15,2) AND lineitem.l_discount <= Decimal128(Some(7),15,2) AND lineitem.l_quantity < Decimal128(Some(2400),15,2)
-05)--------TableScan: lineitem projection=[l_quantity, l_extendedprice, l_discount, l_shipdate], partial_filters=[lineitem.l_shipdate >= Date32("1994-01-01"), lineitem.l_shipdate < Date32("1995-01-01"), lineitem.l_discount >= Decimal128(Some(5),15,2), lineitem.l_discount <= Decimal128(Some(7),15,2), lineitem.l_quantity < Decimal128(Some(2400),15,2)]
+04)------Filter: lineitem.l_shipdate >= Date32("1994-01-01") AND lineitem.l_shipdate < Date32("1995-01-01") AND CAST(lineitem.l_discount AS Float64) >= Float64(0.049999999999999996) AND CAST(lineitem.l_discount AS Float64) <= Float64(0.06999999999999999) AND lineitem.l_quantity < Decimal128(Some(2400),15,2)

Review Comment:
   This will likely cause a performance regression: the plan now casts the entire `lineitem.l_discount` column to Float64 before each comparison, whereas previously the column was compared directly against a Decimal128 constant.

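To make the difference between the two plan shapes concrete, here is a minimal sketch. It is not DataFusion code: the function names are hypothetical, `Decimal128(15, 2)` values are modeled as plain scaled `i128` integers, and the float bounds are assumed to come from evaluating `0.06 - 0.01` and `0.06 + 0.01` in Float64, which matches the constants printed in the new plan.

```rust
// Hypothetical sketch only; none of these functions are DataFusion APIs.
// A Decimal128(15, 2) value is modeled as an i128 scaled by 100,
// so 5 stands for 0.05 and 7 stands for 0.07.

/// Old plan shape: compare the stored integers against constant bounds.
/// No per-row conversion is needed.
fn filter_decimal(discounts: &[i128]) -> Vec<bool> {
    discounts.iter().map(|d| (5..=7).contains(d)).collect()
}

/// Float bounds as they would result from Float64 arithmetic
/// (assumption: the query literals are 0.06 - 0.01 and 0.06 + 0.01).
fn float_bounds() -> (f64, f64) {
    (0.06_f64 - 0.01, 0.06_f64 + 0.01)
}

/// New plan shape: every row is converted to f64 before it is compared.
fn filter_float(discounts: &[i128]) -> Vec<bool> {
    let (lower, upper) = float_bounds();
    discounts
        .iter()
        .map(|&d| {
            // Per-row cast: this is the extra work the old plan avoided.
            let as_float = d as f64 / 100.0;
            as_float >= lower && as_float <= upper
        })
        .collect()
}
```

In this sketch the float version does a conversion for every row, while the decimal version only does integer comparisons. It also shows that the two filters need not agree at the boundary: a discount of 0.07 passes `filter_decimal` but not `filter_float`, because `0.06 + 0.01` evaluates to slightly less than 0.07 in Float64.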
##########
datafusion/sqllogictest/test_files/tpch/plans/q11.slt.part:
##########
@@ -49,7 +49,7 @@ limit 10; logical_plan
 01)Sort: value DESC NULLS FIRST, fetch=10
 02)--Projection: partsupp.ps_partkey, sum(partsupp.ps_supplycost * partsupp.ps_availqty) AS value
-03)----Inner Join: Filter: CAST(sum(partsupp.ps_supplycost * partsupp.ps_availqty) AS Decimal128(38, 15)) > __scalar_sq_1.sum(partsupp.ps_supplycost * partsupp.ps_availqty) * Float64(0.0001)
+03)----Inner Join: Filter: CAST(sum(partsupp.ps_supplycost * partsupp.ps_availqty) AS Float64) > __scalar_sq_1.sum(partsupp.ps_supplycost * partsupp.ps_availqty) * Float64(0.0001)

Review Comment:
   I vaguely remember the use of Decimal here was important for TPCH results (maybe for correctness or something 🤔)

##########
datafusion/core/tests/parquet/mod.rs:
##########
@@ -184,7 +184,13 @@ impl TestOutput {
 /// and the appropriate scenario
 impl ContextWithParquet {
     async fn new(scenario: Scenario, unit: Unit) -> Self {
-        Self::with_config(scenario, unit, SessionConfig::new()).await
+        let mut session_config = SessionConfig::new();
+        // TODO (https://github.com/apache/datafusion/issues/12817) once this is the default behavior, remove from here

Review Comment:
   Does this mean that DataFusion will no longer prune predicates like `decimal_col = 5.0`? If so, this seems like a significant regression / issue for anyone who relies on decimal types (like Comet, for example)