Herman van Hovell created SPARK-18604:
-----------------------------------------

             Summary: Collapse Window optimizer rule changes column order
                 Key: SPARK-18604
                 URL: https://issues.apache.org/jira/browse/SPARK-18604
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Herman van Hovell


The recently added CollapseWindow optimizer rule changes the order of the 
output attributes. This modifies the schema of the logical plan (which an 
optimization should never do) and breaks `collect()` in a subtle way: we bind 
the row encoder to the output of the logical plan, not the optimized plan. 

For example, the following code:
{noformat}
val customers = Seq(
  ("Alice", "2016-05-01", 50.00),
  ("Alice", "2016-05-03", 45.00),
  ("Alice", "2016-05-04", 55.00),
  ("Bob", "2016-05-01", 25.00),
  ("Bob", "2016-05-04", 29.00),
  ("Bob", "2016-05-06", 27.00)).
  toDF("name", "date", "amountSpent")
 
// Import the window functions.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
 
// Create a window spec.
val wSpec1 = Window.partitionBy("name").orderBy("date").rowsBetween(-1, 1)
val df2 = customers
  .withColumn("total", sum(customers("amountSpent")).over(wSpec1))
  .withColumn("cnt", count(customers("amountSpent")).over(wSpec1))

// Display the result.
df2.show()
{noformat}
...yields the following incorrect result, with garbage values in the total and cnt columns:
{noformat}
+-----+----------+-----------+--------+-------------------+
| name|      date|amountSpent|   total|                cnt|
+-----+----------+-----------+--------+-------------------+
|  Bob|2016-05-01|       25.0|1.0E-323|4632796641680687104|
|  Bob|2016-05-04|       29.0|1.5E-323|4635400285215260672|
|  Bob|2016-05-06|       27.0|1.0E-323|4633078116657397760|
|Alice|2016-05-01|       50.0|1.0E-323|4636385447633747968|
|Alice|2016-05-03|       45.0|1.5E-323|4639481672377565184|
|Alice|2016-05-04|       55.0|1.0E-323|4636737291354636288|
+-----+----------+-----------+--------+-------------------+
{noformat}
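
The values in the result above are consistent with the total and cnt columns being swapped and then decoded by position: a count of 2 read as a double prints as 1.0E-323, and a sum of 54.0 read as a long prints as 4632796641680687104. Below is a minimal standalone sketch of that reinterpretation in plain Scala, without Spark. The object and method names are hypothetical, and this only illustrates the bit-level effect; it is not the encoder's actual code path.

```scala
object SwappedColumnsSketch {
  // Read the raw 64 bits of a Long as if they were a Double.
  def longAsDouble(v: Long): Double = java.lang.Double.longBitsToDouble(v)

  // Read the raw 64 bits of a Double as if they were a Long.
  def doubleAsLong(v: Double): Long = java.lang.Double.doubleToRawLongBits(v)

  def main(args: Array[String]): Unit = {
    // Bob, 2016-05-01: the frame rowsBetween(-1, 1) covers 25.0 and 29.0.
    val total: Double = 25.0 + 29.0 // expected sum
    val cnt: Long = 2L              // expected count

    // If the two columns trade places, each value is decoded with the
    // other column's type.
    println(longAsDouble(cnt))   // prints 1.0E-323
    println(doubleAsLong(total)) // prints 4632796641680687104
  }
}
```

Both printed values match the first row of the table, which suggests the computed sums and counts are correct and only their positions in the row are wrong.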



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)