Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

via GitHub Sat, 01 Mar 2025 04:04:11 -0800


sunxiaoguang commented on code in PR #49453:
URL: https://github.com/apache/spark/pull/49453#discussion_r1976368257



##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala:
##########
@@ -241,6 +241,84 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest
     assert(rows10(0).getString(0) === "amy")
     assert(rows10(1).getString(0) === "alex")
   }
+
+  // MySQL Connector/J uses collation 'utf8mb4_0900_ai_ci' as collation for connection.
+  // The MySQL server 9.1.0 uses collation 'utf8mb4_0900_ai_ci' for database by default.
+  // This method uses string column directly as the result of cast has the same collation.
+  def testCastStringTarget(stringLiteral: String, stringCol: String): String = stringCol
+
+  test("SPARK-50793: MySQL JDBC Connector failed to cast some types") {
+    val tableName = catalogName + ".test_cast_function"
+    withTable(tableName) {
+      val stringValue = "0"
+      val stringLiteral = "'0'"
+      val stringCol = "string_col"
+      val longValue = 0L
+      val longCol = "long_col"
+      val binaryValue = Array[Byte](0x30)
+      val binaryLiteral = "x'30'"
+      val binaryCol = "binary_col"
+      val doubleValue = 0.0
+      val doubleLiteral = "0.0"
+      val doubleCol = "double_col"
+      // CREATE table to use types defined in Spark SQL
+      sql(
+        s"CREATE TABLE $tableName ($stringCol STRING, $longCol LONG, " +
+          s"$binaryCol BINARY, $doubleCol DOUBLE)")
+      sql(
+        s"INSERT INTO $tableName VALUES($stringLiteral, $longValue, $binaryLiteral, $doubleValue)")
+
+      def testCast(
+          castType: String,
+          sourceCol: String,
+          targetCol: String,
+          targetDataType: DataType,
+          targetValue: Any): Unit = {
+        val sql = s"SELECT CAST($sourceCol AS $castType) AS target " +
+          s"FROM $tableName WHERE CAST($sourceCol AS $castType) = $targetCol"
+        val df = spark.sql(sql)

Review Comment:
   > You just need to support pushing down the collation in one dialect as an
   > example, either H2 or MySQL. Other dialects can be left for follow-up PRs.
   > First, make the DS V2 pushdown framework support collation, with MySQL or
   > H2 as a basic implementation. Then continue with this PR.
   
   OK, I have a few quick questions about this.
   - I assume a feature like this should go through an RFC process. I will try
     to figure that out myself. Meanwhile, I would really appreciate it if you
     could point me to similar previous work as a reference.
   - The subtle differences between collations and encodings can be very
     tricky. What is your suggestion for tables that use collations Spark
     doesn't support yet? How should we handle collations that differ slightly
     but behave the same overall? As a related question, different versions of
     MySQL server support different sets of collations. Should we support only
     the latest MySQL server, or do we need to consider the server version and
     try to use a collation available in that version?
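   
   To make the version question concrete, here is a minimal sketch of what a
   version-aware choice might look like. `CollationChoice`, `parseVersion`, and
   `defaultUtf8mb4Collation` are hypothetical names for illustration only, not
   existing Spark or Connector/J APIs. It assumes the only distinction that
   matters is the default `utf8mb4` collation, which, as far as I know, changed
   from `utf8mb4_general_ci` to `utf8mb4_0900_ai_ci` in MySQL 8.0.1.

```scala
// Hypothetical helper for discussion; not part of Spark or MySQL Connector/J.
object CollationChoice {
  // Parse the leading "major.minor.patch" of a version string such as
  // "5.7.44-log" into a tuple; missing components default to 0.
  def parseVersion(version: String): (Int, Int, Int) = {
    val parts = version
      .takeWhile(c => c.isDigit || c == '.')
      .split('.')
      .filter(_.nonEmpty)
      .map(_.toInt)
      .padTo(3, 0)
    (parts(0), parts(1), parts(2))
  }

  // Assumption: servers from 8.0.1 onwards default to utf8mb4_0900_ai_ci
  // for utf8mb4, while older servers default to utf8mb4_general_ci.
  def defaultUtf8mb4Collation(serverVersion: String): String = {
    val ord = Ordering[(Int, Int, Int)]
    if (ord.gteq(parseVersion(serverVersion), (8, 0, 1))) "utf8mb4_0900_ai_ci"
    else "utf8mb4_general_ci"
  }
}
```

   A real implementation would probably need to read the collation from the
   table or column metadata rather than infer it from the server version, so
   this only illustrates the shape of the problem.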
   
   Really appreciate your help, thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

