MaxGekk commented on code in PR #49144: URL: https://github.com/apache/spark/pull/49144#discussion_r1883773758
########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ########## @@ -885,6 +892,14 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "expr" -> variantExpr.sql, "dataType" -> toSQLType(variantExpr.dataType))) + case o if mapExprInPartitionExpression(o).isDefined => + val mapExpr = mapExprInPartitionExpression(o).get + o.failAnalysis( + errorClass = "UNSUPPORTED_FEATURE.PARTITION_BY_MAP", + messageParameters = Map( + "expr" -> mapExpr.sql, Review Comment: Please, switch to `toSQLExpr` ########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ########## @@ -859,7 +866,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB summary = j.origin.context.summary) // TODO: although map type is not orderable, technically map type should be able to be - // used in equality comparison, remove this type check once we support it. + // used in equality comparison, remove this type check once we support it. Review Comment: unnecessary changes: ```suggestion // used in equality comparison, remove this type check once we support it. 
``` ########## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ########## @@ -371,6 +371,48 @@ class DataFrameSuite extends QueryTest } } + test("SPARK-50525 - cannot partition by map columns") { + val df = sql("select map(id, id) as m, id % 5 as id from range(0, 100, 1, 5)") + // map column + checkError( + exception = intercept[AnalysisException](df.repartition(5, col("m"))), + condition = "UNSUPPORTED_FEATURE.PARTITION_BY_MAP", + parameters = Map( + "expr" -> "m", + "dataType" -> "\"MAP<BIGINT, BIGINT>\"") + ) + // map producing expression + checkError( + exception = intercept[AnalysisException](df.repartition(5, map(col("id"), col("id")))), + condition = "UNSUPPORTED_FEATURE.PARTITION_BY_MAP", + parameters = Map( + "expr" -> "map(id, id)", + "dataType" -> "\"MAP<BIGINT, BIGINT>\"") + ) + // Partitioning by non-map column works + try { + df.repartition(5, col("id")).collect() + } catch { + case e: Exception => + fail(s"Expected no exception to be thrown but an exception was thrown: ${e.getMessage}") + } Review Comment: This is not needed since it is covered by other tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org