JoshRosen commented on code in PR #49416: URL: https://github.com/apache/spark/pull/49416#discussion_r1908150992
########## sql/core/src/test/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffsetSuite.scala: ########## @@ -110,11 +111,21 @@ class InsertSortForLimitAndOffsetSuite extends QueryTest } test("middle OFFSET preserves data ordering with the extra sort") { - val df = spark.range(10).orderBy($"id" % 8).offset(2).distinct() - df.collect() - val physicalPlan = df.queryExecution.executedPlan + val df = 1.to(10).map(v => v -> v).toDF("c1", "c2").orderBy($"c1" % 8) + verifySortAdded(df.offset(2)) + verifySortAdded(df.filter($"c2" > rand()).offset(2)) + verifySortAdded(df.select($"c2").offset(2)) + verifySortAdded(df.filter($"c2" > rand()).select($"c2").offset(2)) + } + + private def verifySortAdded(df: Dataset[_]): Unit = { Review Comment: Thinking through the test coverage with an eye towards [mutation testing](https://en.wikipedia.org/wiki/Mutation_testing), I wonder whether we also need to be asserting that the resulting output schema is the expected schema? This PR is adding new logic to project out additional columns that we've added (the `else` branch of the `if (sorted.output == s.output)` check) but I don't think this suite's current tests would fail if we inverted that check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org