Re: Clarify window behavior in Spark SQL

2018-04-09 Thread Sandor Murakozi
Hi Li, You might find my pending PR useful: https://github.com/apache/spark/pull/20045/files It contains a large set of test cases covering the windowing functionality, exercising and checking the behavior of a number of special cases. On Wed, Apr 4, 2018 at 4:26 AM, Reynold Xin wrote: > Thanks Li

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Reynold Xin
Thanks Li! On Tue, Apr 3, 2018 at 7:23 PM, Li Jin wrote: > Thanks all for the explanation. I am happy to update the API doc. > > https://issues.apache.org/jira/browse/SPARK-23861 > > On Tue, Apr 3, 2018 at 8:54 PM, Reynold Xin wrote: > >> Ah ok. Thanks for commenting. Every day I learn something

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Li Jin
Thanks all for the explanation. I am happy to update the API doc. https://issues.apache.org/jira/browse/SPARK-23861 On Tue, Apr 3, 2018 at 8:54 PM, Reynold Xin wrote: > Ah ok. Thanks for commenting. Every day I learn something new about SQL. > > For others to follow, SQL Server has a good explan

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Reynold Xin
Ah ok. Thanks for commenting. Every day I learn something new about SQL. For others to follow, SQL Server has a good explanation of the behavior: https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql Can somebody (Li?) update the API documentation to specify the gotc
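
For readers following along: the SQL Server page distinguishes ROWS frames (a physical number of rows around the current row) from RANGE frames (a logical range of ORDER BY values, so rows with equal values are peers), and when ORDER BY is given without an explicit frame, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The sketch below is a pure-Python model of that difference for a running sum, not Spark code; the function name rows_vs_range and the (key, value) input shape are hypothetical. In PySpark the explicit frames would be set with Window.rowsBetween or Window.rangeBetween.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def rows_vs_range(pairs: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Hypothetical model of a running SUM over (key, value) pairs in one partition.

    Returns (rows_frame, range_frame), both aligned with the pairs sorted by key:
      rows_frame  ~ ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      range_frame ~ RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    """
    # Ascending ORDER BY key; sorted() is stable, so ties keep their input order.
    ordered = sorted(pairs, key=lambda p: p[0])
    values = [v for _, v in ordered]
    # ROWS: sum the physical rows up to and including the current position.
    rows_frame = list(accumulate(values))
    # RANGE: sum every row whose key is <= the current key, i.e. all peers too.
    range_frame = [sum(v for k2, v in ordered if k2 <= k) for k, _ in ordered]
    return rows_frame, range_frame
```

For rows_vs_range([(1, 10), (1, 20), (2, 5)]), the ROWS frame gives [10, 30, 35] while the RANGE frame gives [30, 30, 35]: the two rows with key 1 are peers, so RANGE gives them the same result, whereas ROWS depends on the arbitrary order of the tie.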

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Xingbo Jiang
This is actually by design: without an `ORDER BY` clause, all rows are considered peer rows of the current row, which means that the frame is effectively the entire partition. This behavior follows the window syntax of PostgreSQL. You can refer to the comment by yhuai: https://github.com/apache/spa
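
The peer-row argument can be checked with a small pure-Python model (not Spark code; default_frame_sum is a hypothetical name). It assumes an ascending ORDER BY with the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame, and it models a missing ORDER BY as every row sharing the same sort key, so every row is a peer of the current row and the frame covers the whole partition.

```python
from typing import List, Optional, Sequence


def default_frame_sum(values: Sequence[int],
                      keys: Optional[Sequence[int]] = None) -> List[int]:
    """Hypothetical model of SUM(v) OVER ([ORDER BY k]) with the default frame.

    The frame is treated as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
    which includes the current row's peers (rows with an equal ascending key).
    """
    if keys is None:
        # No ORDER BY: model every row as having the same key, so all are peers.
        keys = [0] * len(values)
    # Sum every row whose key is <= the current row's key.
    return [sum(v for v, k in zip(values, keys) if k <= cur) for cur in keys]
```

Here default_frame_sum([1, 2, 3]) gives [6, 6, 6], the same as ordering by a constant key such as [7, 7, 7], while default_frame_sum([1, 2, 3], [1, 2, 3]) gives the running sum [1, 3, 6].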

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Reynold Xin
Do other (non-Hive) SQL systems do the same thing? On Tue, Apr 3, 2018 at 3:16 PM, Herman van Hövell tot Westerflier <her...@databricks.com> wrote: > This is something we inherited from Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics > > When ORDER

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Li Jin
Here are the original code and comments: https://github.com/apache/spark/commit/b6b50efc854f298d5b3e11c05dca995a85bec962#diff-4a8f00ca33a80744965463dcc6662c75L277 This seems to be intentional, although I am not really sure why; maybe to match the behavior of other SQL systems? On Tue, Apr 3, 2018 at 5:09 P

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Reynold Xin
Seems like a bug. On Tue, Apr 3, 2018 at 1:26 PM, Li Jin wrote: > Hi Devs, > > I am seeing some behavior with window functions that is a bit unintuitive > and would like to get some clarification. > > When using an aggregation function with a window, the frame boundary seems to > change depending o

Clarify window behavior in Spark SQL

2018-04-03 Thread Li Jin
Hi Devs, I am seeing some behavior with window functions that is a bit unintuitive and would like to get some clarification. When using an aggregation function with a window, the frame boundary seems to change depending on the ordering of the window. Example: (1) df = spark.createDataFrame([[0, 1], [0,
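
The example above is cut off, so the following uses made-up (id, v) rows rather than the original data. It is a hypothetical pure-Python model of partitioning by id and aggregating v, with and without ordering by v, assuming the default frames discussed in this thread (the whole partition without ORDER BY; RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW with it). The name window_agg is illustrative only and is not a Spark API.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[int, int]  # hypothetical (id, v) rows, partitioned by id


def window_agg(rows: Sequence[Row],
               agg: Callable[[List[int]], float],
               ordered: bool) -> List[Tuple[int, int, float]]:
    """Hypothetical model of agg(v) OVER (PARTITION BY id [ORDER BY v])."""
    # Collect the v values of each partition.
    parts: Dict[int, List[int]] = defaultdict(list)
    for pid, v in rows:
        parts[pid].append(v)
    out = []
    for pid, v in rows:
        vs = parts[pid]
        # Ordered: default RANGE frame, i.e. all rows with v' <= v (peers included).
        # Unordered: every row is a peer, so the frame is the whole partition.
        frame = [x for x in vs if x <= v] if ordered else vs
        out.append((pid, v, agg(frame)))
    return out
```

For rows [(0, 1), (0, 2), (1, 3), (1, 5)], window_agg(rows, mean, ordered=False) gives each row its partition mean (1.5 for id 0, 4 for id 1), while ordered=True gives running means (1 then 1.5 for id 0; 3 then 4 for id 1), which is the shift in frame boundary described above. In PySpark, adding rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) to an ordered window should restore the whole-partition frame.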