Re: [PR] [SPARK-50124][SQL][FOLLOWUP] InsertSortForLimitAndOffset should propagate missing ordering columns [spark]

via GitHub Wed, 08 Jan 2025 20:31:15 -0800


JoshRosen commented on code in PR #49416:
URL: https://github.com/apache/spark/pull/49416#discussion_r1908150992



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffsetSuite.scala:
##########
@@ -110,11 +111,21 @@ class InsertSortForLimitAndOffsetSuite extends QueryTest
   }
 
   test("middle OFFSET preserves data ordering with the extra sort") {
-    val df = spark.range(10).orderBy($"id" % 8).offset(2).distinct()
-    df.collect()
-    val physicalPlan = df.queryExecution.executedPlan
+    val df = 1.to(10).map(v => v -> v).toDF("c1", "c2").orderBy($"c1" % 8)
+    verifySortAdded(df.offset(2))
+    verifySortAdded(df.filter($"c2" > rand()).offset(2))
+    verifySortAdded(df.select($"c2").offset(2))
+    verifySortAdded(df.filter($"c2" > rand()).select($"c2").offset(2))
+  }
+
+  private def verifySortAdded(df: Dataset[_]): Unit = {

Review Comment:
   Thinking through the test coverage with an eye towards [mutation 
testing](https://en.wikipedia.org/wiki/Mutation_testing), I wonder whether we 
also need to be asserting that the resulting output schema is the expected 
schema?
   
   This PR is adding new logic to project out additional columns that we've 
added (the `else` branch of the `if (sorted.output == s.output)` check) but I 
don't think this suite's current tests would fail if we inverted that check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Re: [PR] [SPARK-50124][SQL][FOLLOWUP] InsertSortForLimitAndOffset should propagate missing ordering columns [spark]

Reply via email to