MaxGekk commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r2154235184
########## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala: ########## @@ -49,20 +50,51 @@ trait SQLQueryTestHelper extends Logging { .replaceAll("Created Time.*", s"Created Time $notIncludedMsg") .replaceAll("Last Access.*", s"Last Access $notIncludedMsg") .replaceAll("Partition Statistics\t\\d+", s"Partition Statistics\t$notIncludedMsg") + .replaceAll("CTERelationDef \\d+,", s"CTERelationDef xxxx,") + .replaceAll("CTERelationRef \\d+,", s"CTERelationRef xxxx,") + .replaceAll("@\\w*,", s"@xxxxxxxx,") .replaceAll("\\*\\(\\d+\\) ", "*") // remove the WholeStageCodegen codegenStageIds } - /** * Analyzes a query and returns the result as (schema of the output, normalized resolved plan * tree string representation). */ protected def getNormalizedQueryAnalysisResult( session: SparkSession, sql: String): (String, Seq[String]) = { + // Note that creating the following DataFrame includes eager execution for commands that create + // objects such as views. Therefore any following queries that reference these objects should + // find them in the catalog. val df = session.sql(sql) val schema = df.schema.catalogString - // Get the answer, but also get rid of the #1234 expression IDs that show up in analyzer plans. - (schema, Seq(replaceNotIncludedMsg(df.queryExecution.analyzed.toString))) + val analyzed = df.queryExecution.analyzed + // Determine if the analyzed plan contains any nondeterministic expressions. + var deterministic = true + analyzed.transformAllExpressionsWithSubqueries { + case expr: CurrentDate => + deterministic = false + expr + case expr: CurrentTimestampLike => + deterministic = false + expr + case expr: CurrentUser => + deterministic = false + expr + case expr: Literal if expr.dataType == DateType || expr.dataType == TimestampType => Review Comment: Could you explain, please, why such literals of these types are not deterministic? Should I add TimeType (which is similar to DateType) to the `if` condition?
########## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala: ########## @@ -49,20 +50,51 @@ trait SQLQueryTestHelper extends Logging { .replaceAll("Created Time.*", s"Created Time $notIncludedMsg") .replaceAll("Last Access.*", s"Last Access $notIncludedMsg") .replaceAll("Partition Statistics\t\\d+", s"Partition Statistics\t$notIncludedMsg") + .replaceAll("CTERelationDef \\d+,", s"CTERelationDef xxxx,") + .replaceAll("CTERelationRef \\d+,", s"CTERelationRef xxxx,") + .replaceAll("@\\w*,", s"@xxxxxxxx,") .replaceAll("\\*\\(\\d+\\) ", "*") // remove the WholeStageCodegen codegenStageIds } - /** * Analyzes a query and returns the result as (schema of the output, normalized resolved plan * tree string representation). */ protected def getNormalizedQueryAnalysisResult( session: SparkSession, sql: String): (String, Seq[String]) = { + // Note that creating the following DataFrame includes eager execution for commands that create + // objects such as views. Therefore any following queries that reference these objects should + // find them in the catalog. val df = session.sql(sql) val schema = df.schema.catalogString - // Get the answer, but also get rid of the #1234 expression IDs that show up in analyzer plans. - (schema, Seq(replaceNotIncludedMsg(df.queryExecution.analyzed.toString))) + val analyzed = df.queryExecution.analyzed + // Determine if the analyzed plan contains any nondeterministic expressions. + var deterministic = true + analyzed.transformAllExpressionsWithSubqueries { + case expr: CurrentDate => + deterministic = false + expr + case expr: CurrentTimestampLike => + deterministic = false + expr + case expr: CurrentUser => + deterministic = false + expr + case expr: Literal if expr.dataType == DateType || expr.dataType == TimestampType => Review Comment: Could you explain, please, why such literals of these types are not deterministic? Should I add TimeType (which is similar to DateType) to the `if` condition?
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For queries about this service, please contact Infrastructure at: users@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For additional commands, e-mail: reviews-help@spark.apache.org