[
https://issues.apache.org/jira/browse/SPARK-50656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17910081#comment-17910081
]
Narayan Bhawar commented on SPARK-50656:
----------------------------------------
Any updates on this?
> JDBC Reader Fails to Handle Complex Types (Array, Map) from Trino
> -----------------------------------------------------------------
>
> Key: SPARK-50656
> URL: https://issues.apache.org/jira/browse/SPARK-50656
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell
> Affects Versions: 3.5.4
> Environment:
> * Spark version: 3.5.x
> * Trino version: 457
> * JDBC Driver: {{io.trino.jdbc.TrinoDriver}}
> Reporter: Narayan Bhawar
> Priority: Major
> Labels: Trino, complextype, jdbc, spark
>
> {*}Description{*}:
> I am encountering an issue when using Spark to read data from a Trino
> instance via JDBC. Specifically, when querying complex types such as
> {{ARRAY}} or {{MAP}} from Trino, Spark throws an error indicating that it
> cannot recognize these SQL types. Below is the context:
> {*}Code Example{*}:
>
> {code:java}
> val sourceDF = spark.read
>   .format("jdbc")
>   .option("driver", "io.trino.jdbc.TrinoDriver")
>   .option("url", "jdbc:trino://localhost:8181")
>   .option("query", "select address from minio.qa.nbcheck1")
>   .load(){code}
>
>
> *Error Message:*
>
> {code:java}
> 2/04 03:49:59 INFO SparkContext: SparkContext already stopped.
> Exception in thread "main" org.apache.spark.SparkSQLException: [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: array (row(city varchar, state varchar)), id: ARRAY.
> 	at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:992){code}
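Editor's note: until the reader handles these types, one interim workaround (a suggestion, not from the report; the table and column names follow the example above) is to have Trino serialize the complex column to a plain string before it crosses JDBC, e.g. with Trino's `json_format` over a `CAST(... AS JSON)`, so Spark only ever sees a `varchar`:

```scala
// Hypothetical workaround sketch: push the serialization down to Trino so
// the JDBC reader receives a varchar instead of an ARRAY/ROW column.
// json_format and CAST(... AS JSON) are Trino SQL; column/table names are
// taken from the reproduction above and may need adjusting.
val workaroundDF = spark.read
  .format("jdbc")
  .option("driver", "io.trino.jdbc.TrinoDriver")
  .option("url", "jdbc:trino://localhost:8181")
  .option("query",
    "select json_format(cast(address as json)) as address from minio.qa.nbcheck1")
  .load()
```

The resulting column is a JSON string, which can then be re-parsed on the Spark side (e.g. with `from_json` and an explicit schema) if structured access is needed.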
>
>
> {*}Root Cause{*}:
> The error occurs because Spark's JDBC data source does not recognize
> complex SQL types such as {{ARRAY}} or {{MAP}} by default. The relevant
> section of Spark's code confirms this:
> [https://github.com/apache/spark/blob/v3.5.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala]
> {code:java}
> private def getCatalystType(
>     sqlType: Int,
>     typeName: String,
>     precision: Int,
>     scale: Int,
>     signed: Boolean,
>     isTimestampNTZ: Boolean): DataType = sqlType match {
>   ...
>   case _ =>
>     // For unmatched types:
>     // including java.sql.Types.ARRAY, DATALINK, DISTINCT, JAVA_OBJECT, NULL,
>     // OTHER, REF_CURSOR, TIME_WITH_TIMEZONE, TIMESTAMP_WITH_TIMEZONE, and others.
>     val jdbcType = classOf[JDBCType].getEnumConstants()
>       .find(_.getVendorTypeNumber == sqlType)
>       .map(_.getName)
>       .getOrElse(sqlType.toString)
>     throw QueryExecutionErrors.unrecognizedSqlTypeError(jdbcType, typeName){code}
> As you can see, the method that translates JDBC types to Spark Catalyst
> types has no case for ARRAY or MAP (among other types), so it falls through
> to the error above and schema translation fails as soon as such a column
> appears in the result set.
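Editor's note: the fallback shown above can be exercised outside Spark, since the lookup that produces the "id: ARRAY" part of the error message is plain {{java.sql.JDBCType}} reflection. A minimal, self-contained sketch (the object and method names here are illustrative, not Spark's):

```scala
import java.sql.{JDBCType, Types}

object UnrecognizedTypeDemo {
  // Reproduce Spark's fallback lookup: map a vendor type number back to its
  // JDBCType enum name, as done just before raising UNRECOGNIZED_SQL_TYPE.
  def jdbcTypeName(sqlType: Int): String =
    classOf[JDBCType].getEnumConstants
      .find(_.getVendorTypeNumber == sqlType)
      .map(_.getName)
      .getOrElse(sqlType.toString)

  def main(args: Array[String]): Unit = {
    // java.sql.Types.ARRAY (2003) resolves to the name "ARRAY", which is
    // exactly the id reported in the error message above.
    println(jdbcTypeName(Types.ARRAY))
  }
}
```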
> {*}Expected Behavior{*}:
> Spark should not fail when encountering complex types like {{ARRAY}} or
> {{MAP}} from a Trino JDBC source. Instead, it should either:
> # Convert these complex types into a serialized string format (e.g., JSON)
> over the wire.
> # Provide an option for users to manually handle such complex types after
> loading them into a DataFrame.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]