[ https://issues.apache.org/jira/browse/HUDI-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911702#comment-17911702 ]
Mansi Patel commented on HUDI-8853:
-----------------------------------

ALTER COLUMN also fails:
{code:java}
spark.sql("ALTER TABLE mansipp_hudi_fgac_table3 ALTER COLUMN id TYPE string");

org.apache.spark.sql.AnalysisException: [NOT_SUPPORTED_CHANGE_COLUMN] ALTER TABLE ALTER/CHANGE COLUMN is not supported for changing `spark_catalog`.`default`.`mansipp_hudi_fgac_table3`'s column `id` with type "INT" to `id` with type "STRING".
{code}
According to the column type change matrix in the schema evolution docs, we should be able to convert int -> string:
[https://hudi.apache.org/docs/next/schema_evolution/#:~:text=DROP%20NOT%20NULL-,column%20type%20change,-Source%5CTarget]

Reproduction steps:
{code:java}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.DataSourceReadOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor
import org.apache.hudi.hive.HiveSyncConfig
import org.apache.hudi.sync.common.HoodieSyncConfig

// Create a DataFrame
val inputDF = Seq(
  (100, "2015-01-01", "2015-01-01T13:51:39.340396Z"),
  (101, "2015-01-01", "2015-01-01T12:14:58.597216Z"),
  (102, "2015-01-01", "2015-01-01T13:51:40.417052Z"),
  (103, "2015-01-01", "2015-01-01T13:51:40.519832Z"),
  (104, "2015-01-02", "2015-01-01T12:15:00.512679Z"),
  (105, "2015-01-02", "2015-01-01T13:51:42.248818Z")
).toDF("id", "creation_date", "last_update_time")

// Specify common DataSourceWriteOptions in the single hudiOptions variable
val hudiOptions = Map[String, String](
  HoodieWriteConfig.TBL_NAME.key -> "mansipp_hudi_fgac_table3",
  DataSourceWriteOptions.TABLE_TYPE.key -> "COPY_ON_WRITE",
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "creation_date",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "last_update_time",
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> "mansipp_hudi_fgac_table3",
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "creation_date",
  HoodieSyncConfig.META_SYNC_PARTITION_EXTRACTOR_CLASS.key -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
  HoodieSyncConfig.META_SYNC_ENABLED.key -> "true",
  HiveSyncConfig.HIVE_SYNC_MODE.key -> "hms",
  HoodieSyncConfig.META_SYNC_TABLE_NAME.key -> "mansipp_hudi_fgac_table3",
  HoodieSyncConfig.META_SYNC_PARTITION_FIELDS.key -> "creation_date"
)

// Write the DataFrame as a Hudi dataset
(inputDF.write
  .format("hudi")
  .options(hudiOptions)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert")
  .option("hoodie.schema.on.read.enable", "true")
  .mode(SaveMode.Overwrite)
  .save("s3://mansipp-emr-dev/hudi/mansipp_hudi_fgac_table3/"))
{code}
{code:java}
spark.sql("ALTER TABLE mansipp_hudi_fgac_table3 ALTER COLUMN id TYPE string");
{code}
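All of these failures point at catalog resolution rather than at the type-change rules themselves: the stack traces show the exceptions being raised by Spark's ResolveSessionCatalog rule, which rejects these DDLs whenever the table resolves to the plain session catalog instead of a v2 catalog. A minimal sketch of the session setup the Hudi Spark guide calls for (untested on EMR; class and config names come from the Hudi quickstart, not from this cluster's configuration):
{code:java}
import org.apache.spark.sql.SparkSession

// Sketch only: routing spark_catalog through HoodieCatalog is what lets
// ALTER TABLE DDL reach Hudi instead of being rejected by Spark's
// ResolveSessionCatalog rule.
val spark = SparkSession.builder()
  .appName("hudi-schema-evolution-check")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
  .getOrCreate()
{code}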
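Separately, the Hudi schema evolution docs gate full evolution (including the int -> string change above) behind hoodie.schema.on.read.enable on the session issuing the DDL, not only on the write path as in the reproduction. A hedged retry, assuming a session built as sketched above:
{code:java}
// Per the Hudi schema evolution docs, enable schema-on-read on the SQL
// session before issuing evolution DDL (the write above only set it on
// the writer).
spark.sql("set hoodie.schema.on.read.enable=true")

// Retry the failing statement; with HoodieCatalog wired in it should be
// handled by Hudi's DDL path rather than the session-catalog fallback.
spark.sql("ALTER TABLE mansipp_hudi_fgac_table3 ALTER COLUMN id TYPE string")
{code}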
> Spark sql ALTER TABLE queries are failing on EMR
> ------------------------------------------------
>
>                 Key: HUDI-8853
>                 URL: https://issues.apache.org/jira/browse/HUDI-8853
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>    Affects Versions: 0.15.0
>            Reporter: Mansi Patel
>            Priority: Major
>             Fix For: 1.0.1
>
> Some of the Spark SQL DDL queries are failing on EMR. The failing queries are listed here:
> 1. ALTER TABLE DROP COLUMN
> 2. ALTER TABLE REPLACE COLUMNS
> 3. ALTER TABLE RENAME COLUMN
> {code:java}
> scala> spark.sql("ALTER TABLE mansipp_hudi_fgac_table DROP COLUMN creation_date");
> org.apache.spark.sql.AnalysisException: [UNSUPPORTED_FEATURE.TABLE_OPERATION] The feature is not supported: Table `spark_catalog`.`default`.`mansipp_hudi_fgac_table` does not support DROP COLUMN. Please check the current catalog and namespace to make sure the qualified table name is expected, and also check the catalog implementation which is configured by "spark.sql.catalog".
>   at org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:847)
>   at org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:837)
>   at org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:110)
> {code}
> {code:java}
> scala> spark.sql("ALTER TABLE mansipp_hudi_fgac_table REPLACE COLUMNS (id int, name varchar(10), city string)");
> org.apache.spark.sql.AnalysisException: [UNSUPPORTED_FEATURE.TABLE_OPERATION] The feature is not supported: Table `spark_catalog`.`default`.`mansipp_hudi_fgac_table` does not support REPLACE COLUMNS. Please check the current catalog and namespace to make sure the qualified table name is expected, and also check the catalog implementation which is configured by "spark.sql.catalog".
>   at org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:847)
>   at org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:837)
>   at org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:66)
>   at org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:52)
> {code}
> {code:java}
> scala> spark.sql("ALTER TABLE mansipp_hudi_fgac_table RENAME COLUMN creation_date TO creation_date_renamed");
> 25/01/09 00:38:42 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> org.apache.spark.sql.AnalysisException: [UNSUPPORTED_FEATURE.TABLE_OPERATION] The feature is not supported: Table `spark_catalog`.`default`.`mansipp_hudi_fgac_table` does not support RENAME COLUMN. Please check the current catalog and namespace to make sure the qualified table name is expected, and also check the catalog implementation which is configured by "spark.sql.catalog".
>   at org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:847)
>   at org.apache.spark.sql.errors.QueryCompila
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)