YannByron commented on a change in pull request #3693:
URL: https://github.com/apache/hudi/pull/3693#discussion_r714849166



##########
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/HoodieSparkFunSuite.scala
##########
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import java.io.PrintWriter
+import java.nio.charset.StandardCharsets.UTF_8
+import java.util.TimeZone
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.sql.catalyst.util.sideBySide
+import org.apache.spark.util.Utils
+import org.scalatest.FunSuite
+
+/**
+ * This code is mainly copied from Spark 2.X (org.apache.spark.sql.QueryTest).
+ */
+trait HoodieSparkFunSuite extends FunSuite {
+
+  /**
+   * Runs the plan and makes sure the answer matches the expected result.
+   *
+   * @param df the [[DataFrame]] to be executed
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   */
+  protected def checkAnswer(df: => DataFrame, expectedAnswer: Seq[Row]): Unit 
= {
+   val analyzedDF = try df catch {
+    case ae: AnalysisException =>
+     if (ae.plan.isDefined) fail(
+      s"""
+         |Failed to analyze query: $ae
+         |${ae.plan.get}
+         |
+               |${stackTraceToString(ae)}
+         |""".stripMargin) else throw ae
+   }
+
+   assertEmptyMissingInput(analyzedDF)
+
+   SparkFunSuite.checkAnswer(analyzedDF, expectedAnswer) match {
+    case Some(errorMessage) => fail(errorMessage)
+    case None =>
+   }
+  }
+
+  def stackTraceToString(t: Throwable): String = {
+   val out = new java.io.ByteArrayOutputStream
+   Utils.tryWithResource(new PrintWriter(out)) { writer =>
+    t.printStackTrace(writer)
+    writer.flush()
+   }
+   new String(out.toByteArray, UTF_8)
+  }
+
+  /**
+   * Asserts that a given [[Dataset]] does not have missing inputs in all the 
analyzed plans.
+   */
+  def assertEmptyMissingInput(query: Dataset[_]): Unit = {
+   assert(query.queryExecution.analyzed.missingInput.isEmpty,
+    s"The analyzed logical plan has missing 
inputs:\n${query.queryExecution.analyzed}")
+   assert(query.queryExecution.optimizedPlan.missingInput.isEmpty,
+    s"The optimized logical plan has missing 
inputs:\n${query.queryExecution.optimizedPlan}")
+   assert(query.queryExecution.executedPlan.missingInput.isEmpty,
+    s"The physical plan has missing 
inputs:\n${query.queryExecution.executedPlan}")
+  }
+}
+
+object SparkFunSuite {
+  /**
+   * Runs the plan and makes sure the answer matches the expected result.
+   * If there was exception during the execution or the contents of the 
DataFrame does not
+   * match the expected result, an error message will be returned. Otherwise, 
a [[None]] will
+   * be returned.
+   *
+   * @param df the [[DataFrame]] to be executed
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param checkToRDD whether to verify deserialization to an RDD. This runs 
the query twice.
+   */
+  def checkAnswer(

Review comment:
       `checkAnswer` in `TestHoodieSqlBase` doesn't work for these unit tests: 
if two sequences have the same elements but not in the same order, they are 
judged to be different.
   The `checkAnswer` method in `QueryTest` sorts the two sequences and then 
compares them.
   
   The inconsistency of the implementations between Spark2 and Spark3 is the 
reason to copy the code rather than extend `QueryTest`. So I copied the code needed here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to