[
https://issues.apache.org/jira/browse/SPARK-50656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17910081#comment-17910081
]
Narayan Bhawar commented on SPARK-50656:
----------------------------------------
Any updates on this?
> JDBC Reader Fails to Handle Complex Types (Array, Map) from Trino
> -----------------------------------------------------------------
>
> Key: SPARK-50656
> URL: https://issues.apache.org/jira/browse/SPARK-50656
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell
> Affects Versions: 3.5.4
> Environment:
> * Spark version: 3.5.x
> * Trino version: 457
> * JDBC Driver: {{io.trino.jdbc.TrinoDriver}}
> Reporter: Narayan Bhawar
> Priority: Major
> Labels: Trino, complextype, jdbc, spark
>
> {*}Description{*}:
> I am encountering an issue when using Spark to read data from a Trino
> instance via JDBC. Specifically, when querying complex types such as
> {{ARRAY}} or {{MAP}} from Trino, Spark throws an error indicating that it
> cannot recognize these SQL types. Below is the context:
> {*}Code Example{*}:
>
> {code:java}
> val sourceDF = spark.read
>   .format("jdbc")
>   .option("driver", "io.trino.jdbc.TrinoDriver")
>   .option("url", "jdbc:trino://localhost:8181")
>   .option("query", "select address from minio.qa.nbcheck1")
>   .load(){code}
>
>
> *Error Message:*
>
> {code:java}
> 2/04 03:49:59 INFO SparkContext: SparkContext already stopped.
> Exception in thread "main" org.apache.spark.SparkSQLException: [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: array (row(city varchar, state varchar)), id: ARRAY.
> 	at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:992){code}
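Editor's note: until the reader handles these types, one interim workaround (a suggestion, not from the report; the table and column names follow the example above) is to have Trino serialize the complex column to a plain string before it crosses JDBC, e.g. with Trino's `json_format` over a `CAST(... AS JSON)`, so Spark only ever sees a `varchar`:

```scala
// Hypothetical workaround sketch: push the serialization down to Trino so
// the JDBC reader receives a varchar instead of an ARRAY/ROW column.
// json_format and CAST(... AS JSON) are Trino SQL; column/table names are
// taken from the reproduction above and may need adjusting.
val workaroundDF = spark.read
  .format("jdbc")
  .option("driver", "io.trino.jdbc.TrinoDriver")
  .option("url", "jdbc:trino://localhost:8181")
  .option("query",
    "select json_format(cast(address as json)) as address from minio.qa.nbcheck1")
  .load()
```

The resulting column is a JSON string, which can then be re-parsed on the Spark side (e.g. with `from_json` and an explicit schema) if structured access is needed.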
>
>
> {*}Root Cause{*}:
> The error occurs because Spark's JDBC data source does not recognize
> complex SQL types such as {{ARRAY}} or {{MAP}} by default. The relevant
> section of Spark's code confirms this:
> [https://github.com/apache/spark/blob/v3.5.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala]
> {code:java}
> private def getCatalystType(
>     sqlType: Int,
>     typeName: String,
>     precision: Int,
>     scale: Int,
>     signed: Boolean,
>     isTimestampNTZ: Boolean): DataType = sqlType match {
>   ...
>   case _ =>
>     // For unmatched types:
>     // including java.sql.Types.ARRAY, DATALINK, DISTINCT, JAVA_OBJECT, NULL,
>     // OTHER, REF_CURSOR, TIME_WITH_TIMEZONE, TIMESTAMP_WITH_TIMEZONE, and others.
>     val jdbcType = classOf[JDBCType].getEnumConstants()
>       .find(_.getVendorTypeNumber == sqlType)
>       .map(_.getName)
>       .getOrElse(sqlType.toString)
>     throw QueryExecutionErrors.unrecognizedSqlTypeError(jdbcType, typeName){code}
> As you can see, the method that translates JDBC types to Spark Catalyst
> types has no case for ARRAY or MAP (among other types), so it falls through
> to the error above and schema translation fails as soon as such a column
> appears in the result set.
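Editor's note: the fallback shown above can be exercised outside Spark, since the lookup that produces the "id: ARRAY" part of the error message is plain {{java.sql.JDBCType}} reflection. A minimal, self-contained sketch (the object and method names here are illustrative, not Spark's):

```scala
import java.sql.{JDBCType, Types}

object UnrecognizedTypeDemo {
  // Reproduce Spark's fallback lookup: map a vendor type number back to its
  // JDBCType enum name, as done just before raising UNRECOGNIZED_SQL_TYPE.
  def jdbcTypeName(sqlType: Int): String =
    classOf[JDBCType].getEnumConstants
      .find(_.getVendorTypeNumber == sqlType)
      .map(_.getName)
      .getOrElse(sqlType.toString)

  def main(args: Array[String]): Unit = {
    // java.sql.Types.ARRAY (2003) resolves to the name "ARRAY", which is
    // exactly the id reported in the error message above.
    println(jdbcTypeName(Types.ARRAY))
  }
}
```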
> {*}Expected Behavior{*}:
> Spark should not fail when encountering complex types like {{ARRAY}} or
> {{MAP}} from a Trino JDBC source. Instead, it should either:
> # Convert these complex types into a serialized string format (e.g., JSON)
> over the wire.
> # Provide an option for users to manually handle such complex types after
> loading them into a DataFrame.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]