Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/3703#issuecomment-67041844
Comments from the [review on
Reviewable.io](https://reviewable.io:443/reviews/apache/spark/3703)
---
<sup>**[sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala,
line 30
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdE5SkiLKM30za1keNC)**
([raw
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala#L30)):</sup>
This can be problematic. Ideally, every aggregation function that can be
used with a window should have a `windowSpec: Option[WindowSpec]` field that
defaults to `None`, plus a `withWindowSpec` method that returns a new instance
of the aggregation function with the given window spec attached.
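A minimal sketch of the design suggested above. `WindowSpec` and `Sum` here are hypothetical stand-ins, not the actual Catalyst classes; the point is the immutable copy-on-attach pattern:

```scala
// Hypothetical stand-in for the window specification carried by an aggregate.
case class WindowSpec(partitionBy: Seq[String], orderBy: Seq[String])

// A window-capable aggregate carries an optional spec, defaulting to None,
// and withWindowSpec returns a NEW instance rather than mutating this one.
case class Sum(child: String, windowSpec: Option[WindowSpec] = None) {
  def withWindowSpec(spec: WindowSpec): Sum = copy(windowSpec = Some(spec))
}

object WindowSpecDemo extends App {
  val plain    = Sum("p_size")
  val windowed = plain.withWindowSpec(WindowSpec(Seq("p_mfgr"), Seq("p_size")))
  assert(plain.windowSpec.isEmpty)      // original instance is unchanged
  assert(windowed.windowSpec.isDefined) // copy carries the spec
}
```

Returning a fresh instance keeps expression trees immutable, which matters because Catalyst expressions are shared and transformed functionally.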
---
<sup>**[sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala,
line 874
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdE660e137A3Dk7Uqp2)**
([raw
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala#L874)):</sup>
The thread-local `windowDefs` map is used to store window definitions
(`w1`, `w2` and `w3`) in queries like this:
```sql
SELECT
  p_mfgr, p_name, p_size,
  SUM(p_size) OVER w1 AS s1,
  SUM(p_size) OVER w2 AS s2,
  SUM(p_size) OVER (w3 ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS s3
FROM
  part
WINDOW
  w1 AS (DISTRIBUTE BY p_mfgr SORT BY p_size
         RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING),
  w2 AS w3,
  w3 AS (DISTRIBUTE BY p_mfgr SORT BY p_size
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```
This map is cleared and refilled in `collectWindowDefs` below, so it
doesn't grow indefinitely.
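The clear-and-refill pattern described above can be sketched as follows. The names (`WindowDefs`, `collectWindowDefs`, the `String`-valued map) are illustrative only and do not match the actual `HiveQl` implementation, which stores parsed AST nodes:

```scala
import scala.collection.mutable

object WindowDefs {
  // One map per thread, so concurrent query parsing doesn't interleave
  // window definitions from different queries.
  private val windowDefs =
    new ThreadLocal[mutable.Map[String, String]] {
      override def initialValue() = mutable.Map.empty[String, String]
    }

  // Called once per query: drop the previous query's definitions first,
  // then register the entries of the current WINDOW clause. This is why
  // the thread-local map never grows across queries.
  def collectWindowDefs(defs: Map[String, String]): Unit = {
    val m = windowDefs.get()
    m.clear()
    m ++= defs
  }

  // Resolve a window name such as w1 referenced in an OVER clause.
  def lookup(name: String): Option[String] = windowDefs.get().get(name)
}
```

For the query above, `collectWindowDefs` would be invoked with the three entries `w1`, `w2`, and `w3`, and each `OVER wN` reference would be resolved through `lookup`.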
---
<sup>**[sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala,
line 1060
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdEA7B6QXd6N0aDAEMh)**
([raw
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala#L1060)):</sup>
All builtin aggregation functions need a similar `case` clause to handle
their windowed versions; otherwise they all fall back to Hive UDAF
implementations.
`COUNT` is picked here because its Hive counterpart, `GenericUDAFCount`,
implements `GenericUDAFResolver2` rather than `AbstractGenericUDAFResolver`,
and is therefore not handled by `HiveFunctionRegistry.lookupFunction`.
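The dispatch shape being asked for can be sketched as below. This is a simplified illustration only: the real `HiveQl` code matches on parsed AST nodes, not plain function-name strings, and the type names here are hypothetical:

```scala
// Hypothetical result types: either a native Catalyst aggregate or a
// fallback to a Hive UDAF wrapper.
sealed trait Agg
case object NativeCount extends Agg
case object NativeSum extends Agg
case class HiveUdaf(name: String) extends Agg

def resolveAggregate(name: String): Agg = name.toUpperCase match {
  // Each builtin needs its own case clause; anything not matched here
  // falls through to the Hive UDAF path.
  case "COUNT" => NativeCount
  case "SUM"   => NativeSum
  case other   => HiveUdaf(other)
}
```

The review's point is that without a dedicated `case` per builtin, even functions Spark implements natively would take the `HiveUdaf` branch when used in a window context.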
---