Here is the original code and comments: https://github.com/apache/spark/commit/b6b50efc854f298d5b3e11c05dca995a85bec962#diff-4a8f00ca33a80744965463dcc6662c75L277
Seems this is intentional, although I am not really sure why - maybe to match other SQL systems' behavior?

On Tue, Apr 3, 2018 at 5:09 PM, Reynold Xin <r...@databricks.com> wrote:

> Seems like a bug.
>
> On Tue, Apr 3, 2018 at 1:26 PM, Li Jin <ice.xell...@gmail.com> wrote:
>
>> Hi Devs,
>>
>> I am seeing some behavior with window functions that is a bit unintuitive
>> and would like to get some clarification.
>>
>> When using an aggregation function with a window, the frame boundary
>> seems to change depending on the ordering of the window.
>>
>> Example:
>>
>> (1)
>>
>> df = spark.createDataFrame([[0, 1], [0, 2], [0, 3]]).toDF('id', 'v')
>> w1 = Window.partitionBy('id')
>> df.withColumn('v2', mean(df.v).over(w1)).show()
>>
>> +---+---+---+
>> | id|  v| v2|
>> +---+---+---+
>> |  0|  1|2.0|
>> |  0|  2|2.0|
>> |  0|  3|2.0|
>> +---+---+---+
>>
>> (2)
>>
>> df = spark.createDataFrame([[0, 1], [0, 2], [0, 3]]).toDF('id', 'v')
>> w2 = Window.partitionBy('id').orderBy('v')
>> df.withColumn('v2', mean(df.v).over(w2)).show()
>>
>> +---+---+---+
>> | id|  v| v2|
>> +---+---+---+
>> |  0|  1|1.0|
>> |  0|  2|1.5|
>> |  0|  3|2.0|
>> +---+---+---+
>>
>> It seems that orderBy('v') in example (2) also changes the frame
>> boundaries from (unboundedPreceding, unboundedFollowing) to
>> (unboundedPreceding, currentRow).
>>
>> I found this behavior a bit unintuitive. I wonder if this behavior is by
>> design and, if so, what specific rule governs how orderBy() interacts
>> with frame boundaries?
>>
>> Thanks,
>> Li
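For context: this matches the SQL standard's default framing, where a window with an ORDER BY clause defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (the frame includes all peers of the current row), while a window without ORDER BY covers the whole partition. A minimal pure-Python sketch of the two defaults (no Spark needed; the helper names are made up for illustration):

```python
def mean_over_partition(values):
    """Default frame with no ORDER BY: the entire partition."""
    return [sum(values) / len(values)] * len(values)

def mean_running_range(values):
    """Default frame with ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW. RANGE framing includes all peers (ties in the
    ordering key) of the current row."""
    ordered = sorted(values)
    out = []
    for v in ordered:
        frame = [x for x in ordered if x <= v]  # peers of v are included
        out.append(sum(frame) / len(frame))
    return out

print(mean_over_partition([1, 2, 3]))  # [2.0, 2.0, 2.0] -- like example (1)
print(mean_running_range([1, 2, 3]))   # [1.0, 1.5, 2.0] -- like example (2)
```

In PySpark the full-partition frame can be requested explicitly even when ordering, e.g. `Window.partitionBy('id').orderBy('v').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)`, which restores the example (1) result.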