jayzhan211 commented on PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#issuecomment-2451402632
TPC-H q14 doesn't seem to exercise the change in this PR: `groupBy` is
empty in `AggregateExec`, and I also don't see any printout from
`VectorizedGroupValuesColumn`. I don't think this change is the cause of the
slowdown 🤔
```
query TT
explain select
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
lineitem,
part
where
l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
and l_shipdate < date '1995-10-01';
----
logical_plan
01)Projection: Float64(100) * CAST(sum(CASE WHEN part.p_type LIKE
Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) - lineitem.l_discount
ELSE Int64(0) END) AS Float64) / CAST(sum(lineitem.l_extendedprice * Int64(1) -
lineitem.l_discount) AS Float64) AS promo_revenue
02)--Aggregate: groupBy=[[]], aggr=[[sum(CASE WHEN part.p_type LIKE
Utf8("PROMO%") THEN __common_expr_1 ELSE Decimal128(Some(0),38,4) END) AS
sum(CASE WHEN part.p_type LIKE Utf8("PROMO%") THEN lineitem.l_extendedprice *
Int64(1) - lineitem.l_discount ELSE Int64(0) END), sum(__common_expr_1) AS
sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)]]
03)----Projection: lineitem.l_extendedprice * (Decimal128(Some(1),20,0) -
lineitem.l_discount) AS __common_expr_1, part.p_type
04)------Inner Join: lineitem.l_partkey = part.p_partkey
05)--------Projection: lineitem.l_partkey, lineitem.l_extendedprice,
lineitem.l_discount
06)----------Filter: lineitem.l_shipdate >= Date32("1995-09-01") AND
lineitem.l_shipdate < Date32("1995-10-01")
07)------------TableScan: lineitem projection=[l_partkey, l_extendedprice,
l_discount, l_shipdate], partial_filters=[lineitem.l_shipdate >=
Date32("1995-09-01"), lineitem.l_shipdate < Date32("1995-10-01")]
08)--------TableScan: part projection=[p_partkey, p_type]
physical_plan
01)ProjectionExec: expr=[100 * CAST(sum(CASE WHEN part.p_type LIKE
Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) - lineitem.l_discount
ELSE Int64(0) END)@0 AS Float64) / CAST(sum(lineitem.l_extendedprice * Int64(1)
- lineitem.l_discount)@1 AS Float64) as promo_revenue]
02)--AggregateExec: mode=Final, gby=[], aggr=[sum(CASE WHEN part.p_type LIKE
Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) - lineitem.l_discount
ELSE Int64(0) END), sum(lineitem.l_extendedprice * Int64(1) -
lineitem.l_discount)]
03)----CoalescePartitionsExec
04)------AggregateExec: mode=Partial, gby=[], aggr=[sum(CASE WHEN
part.p_type LIKE Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) -
lineitem.l_discount ELSE Int64(0) END), sum(lineitem.l_extendedprice * Int64(1)
- lineitem.l_discount)]
05)--------ProjectionExec: expr=[l_extendedprice@0 * (Some(1),20,0 -
l_discount@1) as __common_expr_1, p_type@2 as p_type]
06)----------CoalesceBatchesExec: target_batch_size=8192
07)------------HashJoinExec: mode=Partitioned, join_type=Inner,
on=[(l_partkey@0, p_partkey@0)], projection=[l_extendedprice@1, l_discount@2,
p_type@4]
08)--------------CoalesceBatchesExec: target_batch_size=8192
09)----------------RepartitionExec: partitioning=Hash([l_partkey@0], 4),
input_partitions=4
10)------------------CoalesceBatchesExec: target_batch_size=8192
11)--------------------FilterExec: l_shipdate@3 >= 1995-09-01 AND
l_shipdate@3 < 1995-10-01, projection=[l_partkey@0, l_extendedprice@1,
l_discount@2]
12)----------------------CsvExec: file_groups={4 groups:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749],
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498],
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247],
[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]},
projection=[l_partkey, l_extendedprice, l_discount, l_shipdate],
has_header=false
13)--------------CoalesceBatchesExec: target_batch_size=8192
14)----------------RepartitionExec: partitioning=Hash([p_partkey@0], 4),
input_partitions=4
15)------------------RepartitionExec: partitioning=RoundRobinBatch(4),
input_partitions=1
16)--------------------CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]},
projection=[p_partkey, p_type], has_header=false
```
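For anyone who wants to run the same check on other queries, here is a rough sketch that scans EXPLAIN text for `AggregateExec` nodes and reports whether any of them has group keys. The helper names (`aggregate_group_keys`, `uses_group_by`) are made up for illustration and are not part of DataFusion. The `aggr=[...]` contents in the q14 excerpt are abbreviated, and the grouped plan line is a hypothetical example, not output from a real run.

```python
import re

# Matches the "AggregateExec: mode=..., gby=[...], aggr=" prefix as it is
# printed in the physical plan above. DOTALL lets the gby list span wrapped lines.
_AGG_RE = re.compile(r"AggregateExec: mode=(\w+), gby=\[(.*?)\], aggr=", re.DOTALL)


def aggregate_group_keys(plan_text: str) -> list[tuple[str, str]]:
    """Return (mode, gby) for each AggregateExec found in the plan text."""
    return [(m.group(1), m.group(2).strip()) for m in _AGG_RE.finditer(plan_text)]


def uses_group_by(plan_text: str) -> bool:
    """True if at least one AggregateExec has a non-empty gby list."""
    return any(gby for _, gby in aggregate_group_keys(plan_text))


# Excerpt of the q14 physical plan above, with the aggr list abbreviated.
q14_excerpt = """\
02)--AggregateExec: mode=Final, gby=[], aggr=[sum(...)]
03)----CoalescePartitionsExec
04)------AggregateExec: mode=Partial, gby=[], aggr=[sum(...)]
"""

# Hypothetical plan line with group keys, for contrast only.
grouped_example = (
    "01)AggregateExec: mode=Partial, "
    "gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus], "
    "aggr=[sum(...)]\n"
)

print(aggregate_group_keys(q14_excerpt))
print(uses_group_by(q14_excerpt), uses_group_by(grouped_example))
```

If `uses_group_by` is false for a query, as it is for the q14 excerpt, the aggregation has no group keys, which matches the observation above that nothing is printed from `VectorizedGroupValuesColumn`.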
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]