sunxiaoguang commented on code in PR #49453:
URL: https://github.com/apache/spark/pull/49453#discussion_r1976368257
##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala:
##########
@@ -241,6 +241,84 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest
     assert(rows10(0).getString(0) === "amy")
     assert(rows10(1).getString(0) === "alex")
   }
+
+  // MySQL Connector/J uses collation 'utf8mb4_0900_ai_ci' as collation for connection.
+  // The MySQL server 9.1.0 uses collation 'utf8mb4_0900_ai_ci' for database by default.
+  // This method uses string column directly as the result of cast has the same collation.
+  def testCastStringTarget(stringLiteral: String, stringCol: String): String = stringCol
+
+  test("SPARK-50793: MySQL JDBC Connector failed to cast some types") {
+    val tableName = catalogName + ".test_cast_function"
+    withTable(tableName) {
+      val stringValue = "0"
+      val stringLiteral = "'0'"
+      val stringCol = "string_col"
+      val longValue = 0L
+      val longCol = "long_col"
+      val binaryValue = Array[Byte](0x30)
+      val binaryLiteral = "x'30'"
+      val binaryCol = "binary_col"
+      val doubleValue = 0.0
+      val doubleLiteral = "0.0"
+      val doubleCol = "double_col"
+      // CREATE table to use types defined in Spark SQL
+      sql(
+        s"CREATE TABLE $tableName ($stringCol STRING, $longCol LONG, " +
+          s"$binaryCol BINARY, $doubleCol DOUBLE)")
+      sql(
+        s"INSERT INTO $tableName VALUES($stringLiteral, $longValue, $binaryLiteral, $doubleValue)")
+
+      def testCast(
+          castType: String,
+          sourceCol: String,
+          targetCol: String,
+          targetDataType: DataType,
+          targetValue: Any): Unit = {
+        val sql = s"SELECT CAST($sourceCol AS $castType) AS target " +
+          s"FROM $tableName WHERE CAST($sourceCol AS $castType) = $targetCol"
+        val df = spark.sql(sql)

Review Comment:
> You just need to support pushing down the collation to the H2 dialect as an example, or choose MySQL. Other dialects can be left to follow-up PRs. First, make the DS V2 pushdown framework support collation, with MySQL or H2 as the basic implementation.
> Then continue with this PR.

OK, I have some quick questions about this.

- I assume a feature like this should go through an RFC process. I will try to figure that out myself. Meanwhile, I would really appreciate it if you could point me to similar previous work as a reference.
- The subtle differences between collations and encodings can be very tricky. What do you suggest for tables whose collations we know Spark does not support yet? And how should we handle collations that differ slightly but behave the same overall? As a related question, different versions of MySQL server support different sets of collations. Should we support only the latest MySQL server, or should we take the server version into account and use the collations available in that version?

Really appreciate your help, thanks.
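To make the question about "a little different but overall behave the same" collations more concrete, one coarse first filter could compare only the sensitivity flags that MySQL encodes in collation names. MySQL's documented naming convention uses `_ai`/`_as` for accent sensitivity, `_ci`/`_cs` for case sensitivity, and `_bin` for binary collations. When neither `_ai` nor `_as` is present, `_ci` implies `_ai` and `_cs` implies `_as`. The sketch below rests on several assumptions. `MySQLCollationTraits`, `parse`, and `sameSensitivity` are hypothetical names and are not Spark or Connector/J APIs. Matching flags are at most a necessary condition, because the check ignores the UCA version, the pad attribute, and language-specific tailorings.

```scala
import java.util.Locale

// Hypothetical helper, not part of Spark or MySQL Connector/J.
// Derives coarse comparison flags from a MySQL collation name
// using MySQL's collation naming convention.
object MySQLCollationTraits {

  final case class Traits(
      charset: String,
      caseSensitive: Boolean,
      accentSensitive: Boolean,
      binary: Boolean)

  def parse(collation: String): Traits = {
    val parts = collation.toLowerCase(Locale.ROOT).split('_').toList
    // The first component of a MySQL collation name is the character set.
    val charset = parts.head
    val suffixes = parts.tail.toSet
    val binary = suffixes.contains("bin")
    // A _bin collation compares code point or byte values, so it is
    // treated here as both case- and accent-sensitive.
    val caseSensitive = binary || suffixes.contains("cs")
    // Without an explicit _ai or _as suffix, _ci implies _ai and _cs implies _as.
    val accentSensitive =
      binary || suffixes.contains("as") || (!suffixes.contains("ai") && caseSensitive)
    Traits(charset, caseSensitive, accentSensitive, binary)
  }

  // True when both collations agree on case, accent, and binary sensitivity.
  // Deliberately ignores UCA version, pad attribute, and tailorings.
  def sameSensitivity(a: String, b: String): Boolean = {
    val (ta, tb) = (parse(a), parse(b))
    ta.caseSensitive == tb.caseSensitive &&
      ta.accentSensitive == tb.accentSensitive &&
      ta.binary == tb.binary
  }
}
```

For example, `MySQLCollationTraits.sameSensitivity("utf8mb4_0900_ai_ci", "utf8mb4_general_ci")` returns `true` even though the two collations are based on different UCA versions. That is exactly the kind of subtle difference that a stricter allowlist, possibly keyed by MySQL server version, might still need to rule out.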