Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/3703#issuecomment-67041844
Comments from the [review on
Reviewable.io](https://reviewable.io:443/reviews/apache/spark/3703)
---
<sup>**[sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala,
line 30
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdE5SkiLKM30za1keNC)**
([raw
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala#L30)):</sup>
This can be problematic. Ideally, every aggregation function that can be
used with a window should have a `windowSpec: Option[WindowSpec]` field that
defaults to `None`, plus a `withWindowSpec` method that returns a new instance
of the aggregation function with the given window spec attached.
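A minimal sketch of the design suggested above. `WindowSpec` and `Sum` here are hypothetical stand-ins, not the actual Catalyst classes; the point is the immutable copy-on-attach pattern:

```scala
// Hypothetical stand-in for the window specification carried by an aggregate.
case class WindowSpec(partitionBy: Seq[String], orderBy: Seq[String])

// A window-capable aggregate carries an optional spec, defaulting to None,
// and withWindowSpec returns a NEW instance rather than mutating this one.
case class Sum(child: String, windowSpec: Option[WindowSpec] = None) {
  def withWindowSpec(spec: WindowSpec): Sum = copy(windowSpec = Some(spec))
}

object WindowSpecDemo extends App {
  val plain    = Sum("p_size")
  val windowed = plain.withWindowSpec(WindowSpec(Seq("p_mfgr"), Seq("p_size")))
  assert(plain.windowSpec.isEmpty)      // original instance is unchanged
  assert(windowed.windowSpec.isDefined) // copy carries the spec
}
```

Returning a fresh instance keeps expression trees immutable, which matters because Catalyst expressions are shared and transformed functionally.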
---
<sup>**[sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala,
line 874
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdE660e137A3Dk7Uqp2)**
([raw
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala#L874)):</sup>
The thread-local `windowDefs` map is used to store window definitions
(`w1`, `w2` and `w3`) in queries like this:
```sql
SELECT
  p_mfgr, p_name, p_size,
  SUM(p_size) OVER w1 AS s1,
  SUM(p_size) OVER w2 AS s2,
  SUM(p_size) OVER (w3 ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS s3
FROM
  part
WINDOW
  w1 AS (DISTRIBUTE BY p_mfgr SORT BY p_size
         RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING),
  w2 AS w3,
  w3 AS (DISTRIBUTE BY p_mfgr SORT BY p_size
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```
This map is cleared and refilled in `collectWindowDefs` below, so it
doesn't grow indefinitely.
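The clear-and-refill pattern described above can be sketched as follows. The names (`WindowDefs`, `collectWindowDefs`, the `String`-valued map) are illustrative only and do not match the actual `HiveQl` implementation, which stores parsed AST nodes:

```scala
import scala.collection.mutable

object WindowDefs {
  // One map per thread, so concurrent query parsing doesn't interleave
  // window definitions from different queries.
  private val windowDefs =
    new ThreadLocal[mutable.Map[String, String]] {
      override def initialValue() = mutable.Map.empty[String, String]
    }

  // Called once per query: drop the previous query's definitions first,
  // then register the entries of the current WINDOW clause. This is why
  // the thread-local map never grows across queries.
  def collectWindowDefs(defs: Map[String, String]): Unit = {
    val m = windowDefs.get()
    m.clear()
    m ++= defs
  }

  // Resolve a window name such as w1 referenced in an OVER clause.
  def lookup(name: String): Option[String] = windowDefs.get().get(name)
}
```

For the query above, `collectWindowDefs` would be invoked with the three entries `w1`, `w2`, and `w3`, and each `OVER wN` reference would be resolved through `lookup`.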
---
<sup>**[sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala,
line 1060
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdEA7B6QXd6N0aDAEMh)**
([raw
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala#L1060)):</sup>
All builtin aggregation functions need a similar `case` clause to handle
their windowed versions; otherwise they all fall back to Hive UDAF
implementations.
`COUNT` is picked here because its Hive counterpart, `GenericUDAFCount`,
implements `GenericUDAFResolver2` rather than `AbstractGenericUDAFResolver`,
and is therefore not handled by `HiveFunctionRegistry.lookupFunction`.
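The dispatch shape being asked for can be sketched as below. This is a simplified illustration only: the real `HiveQl` code matches on parsed AST nodes, not plain function-name strings, and the type names here are hypothetical:

```scala
// Hypothetical result types: either a native Catalyst aggregate or a
// fallback to a Hive UDAF wrapper.
sealed trait Agg
case object NativeCount extends Agg
case object NativeSum extends Agg
case class HiveUdaf(name: String) extends Agg

def resolveAggregate(name: String): Agg = name.toUpperCase match {
  // Each builtin needs its own case clause; anything not matched here
  // falls through to the Hive UDAF path.
  case "COUNT" => NativeCount
  case "SUM"   => NativeSum
  case other   => HiveUdaf(other)
}
```

The review's point is that without a dedicated `case` per builtin, even functions Spark implements natively would take the `HiveUdaf` branch when used in a window context.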
---