[
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557700#comment-13557700
]
Alan Gates commented on HIVE-896:
---------------------------------
bq. If I read this right you are using CLUSTER BY and SORT BY instead of
PARTITION BY and ORDER BY for syntax in OVER. Why? To highlight the
similarity. The Partition/Order specs in a Window clause have the same meaning
as Cluster/Distribute in HQL.
This is only true as long as you have only one OVER clause, right? As soon as
you add the ability to have separate OVER clauses partitioning by different
keys (which users will want very soon) you lose this identity.
Even if you decide to retain this I would argue that the standard PARTITION
BY/ORDER BY syntax should be accepted as well. HQL already has enough one-off
syntax that makes life hard for people coming from more standard SQL. We
should not add to it.
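To make the point about separate OVER clauses concrete, here is a minimal
sketch in plain Python (not Hive code). The helper running_sum and the sample
click rows are hypothetical, made up for illustration. It computes one running
sum partitioned by user and another partitioned by page over the same rows, so
no single query-level clustering of the rows describes both partitionings.

```python
from collections import defaultdict

def running_sum(rows, partition_key, order_key, value_key):
    """Running sum within each partition, reported in the original row order.

    Hypothetical helper: it mimics SUM(value) OVER (PARTITION BY ... ORDER BY ...).
    """
    # Group row positions by the partition key.
    partitions = defaultdict(list)
    for idx, row in enumerate(rows):
        partitions[row[partition_key]].append(idx)

    out = [None] * len(rows)
    for indices in partitions.values():
        total = 0
        # Order rows inside the partition, then accumulate.
        for idx in sorted(indices, key=lambda i: rows[i][order_key]):
            total += rows[idx][value_key]
            out[idx] = total
    return out

# Made-up sample data.
clicks = [
    {"user": "a", "page": "x", "ts": 1, "bytes": 10},
    {"user": "b", "page": "x", "ts": 2, "bytes": 20},
    {"user": "a", "page": "y", "ts": 3, "bytes": 30},
    {"user": "b", "page": "y", "ts": 4, "bytes": 40},
]

# Two window specs over the same rows, each with its own partition key.
by_user = running_sum(clicks, "user", "ts", "bytes")
by_page = running_sum(clicks, "page", "ts", "bytes")
```

Each call has to regroup the rows by its own key, which is why the partition
spec belongs to the individual OVER clause rather than to the query as a whole.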
bq. Could you explain how the partition is handled in memory...
Partitions are backed by a Persistent List (see
ptf.ds.PartitionedByteBasedList). We need to do some work to refactor this
package. Yes, you are right: things can be done to delay bringing rows into a
partition and to get rid of rows once they are outside the window. This is
true for the Windowing Table Function, especially for Range based Windows.
But for a general PTF the contract is Partition in, Partition out. For example,
the CandidateFrequency function will read the rows in a partition multiple times.
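The difference in memory behaviour can be sketched in plain Python (this is
not the ptf.ds code). Both functions below are hypothetical: sliding_sum uses
a simple row-count frame rather than a true range frame, and
share_of_partition is only a stand-in for a function that, like
CandidateFrequency, must read the partition more than once.

```python
from collections import deque

def sliding_sum(values, preceding):
    """Sum over the current row and up to `preceding` rows before it.

    Rows are pulled in one at a time and dropped once they leave the frame,
    so at most preceding + 1 rows are held in memory.
    """
    window = deque()
    total = 0
    for value in values:
        window.append(value)
        total += value
        if len(window) > preceding + 1:
            # The oldest row is now outside the frame: evict it.
            total -= window.popleft()
        yield total

def share_of_partition(partition):
    """Each row's share of the partition total.

    Needs the whole partition materialized: one pass to compute the total,
    a second pass to emit the output rows.
    """
    total = sum(partition)                      # first pass
    return [value / total for value in partition]  # second pass

windowed = list(sliding_sum([1, 2, 3, 4, 5], preceding=2))
shares = share_of_partition([1, 1, 2])
```

The first function works on any iterator of rows, while the second cannot
start emitting output until it has seen every row of the partition.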
This is part of where I was going with my earlier question on why a windowing
function would ever return a partition. I am becoming less convinced that it
makes sense to combine windowing and partition functions. While they both take
partitions as inputs they return different things. Partition functions return
partitions and windowing functions return a single value. As you point out
here the partition functions will also not be interested in the range limiting
features of windowing functions. But taking advantage of this in windowing
functions will be very important for performance optimizations, I suspect. At
the very least it seems like partition functions and windowing functions
should be presented as separate entities to users and UDF writers, even if for
now Hive shares some of the framework for handling them underneath. This way
in the future optimizations and new features can be added in a way that is
advantageous for each.
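As a rough illustration of what "separate entities" could look like to a UDF
writer, here is a sketch in plain Python. Every name in it (PartitionFunction,
WindowFunction, DropDuplicates, FirstValue, apply_window_function) is
hypothetical and none of it is taken from the patch. The two contracts differ
only in what they return, and the framework decides how rows are fed to each.

```python
from abc import ABC, abstractmethod

class PartitionFunction(ABC):
    """Hypothetical contract: partition in, partition out."""
    @abstractmethod
    def process(self, partition):
        """Take all rows of a partition and return a new list of rows."""

class WindowFunction(ABC):
    """Hypothetical contract: window frame in, single value out."""
    @abstractmethod
    def evaluate(self, frame):
        """Take the rows of the current frame and return one value."""

class DropDuplicates(PartitionFunction):
    def process(self, partition):
        seen, out = set(), []
        for row in partition:
            if row not in seen:
                seen.add(row)
                out.append(row)
        return out

class FirstValue(WindowFunction):
    def evaluate(self, frame):
        return frame[0]

def apply_window_function(fn, partition, preceding):
    """Framework side: call fn once per row with that row's frame.

    Only the framework knows the frame bounds, so it is free to limit how
    many rows it keeps around without the function noticing.
    """
    return [
        fn.evaluate(partition[max(0, i - preceding): i + 1])
        for i in range(len(partition))
    ]

deduped = DropDuplicates().process([1, 1, 2, 1])
firsts = apply_window_function(FirstValue(), [5, 6, 7, 8], preceding=1)
```

With this split, frame-limiting optimizations can be added behind
apply_window_function without touching the partition-function contract.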
> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>
> Key: HIVE-896
> URL: https://issues.apache.org/jira/browse/HIVE-896
> Project: Hive
> Issue Type: New Feature
> Components: OLAP, UDF
> Reporter: Amr Awadallah
> Priority: Minor
> Attachments: HIVE-896.1.patch.txt
>
>
> Windowing functions are very useful for click stream processing and similar
> time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr