huangxiaopingRD commented on code in PR #11869:
URL: https://github.com/apache/hudi/pull/11869#discussion_r1768273441
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/CreateHoodieTableCommand.scala:
##########
@@ -88,6 +92,20 @@ case class CreateHoodieTableCommand(table: CatalogTable,
ignoreIfExists: Boolean
object CreateHoodieTableCommand {
+ def validateTableSchema(userDefinedSchema: StructType, hoodieTableSchema:
StructType): Boolean = {
+ if (userDefinedSchema.fields.length != 0 &&
Review Comment:
@yihua @danny0405
Let me re-describe this problem:
**Scenario:**
1. The user creates a Hudi table `table_xxx` with three columns:
`(id string, name string, list string)`.
2. The user realizes that the `list` column should have type
`map<string, string>` instead of `string`, so they drop the table, intending
to delete and recreate it. However, the statement executed is
`drop table table_xxx`, not `drop table table_xxx purge`. **Without `purge`,
the `.hoodie` directory is not deleted.**
3. The user recreates the Hudi table `table_xxx`, declaring the columns as
`(id string, name string, list map<string, string>)`. The columns synchronized
to HMS are still `(id string, name string, list string)`, because the schema
synchronized to HMS is read from the `.hoodie` directory, which still holds the
old schema.
4. The user writes data to `table_xxx`. Because the written data does not
match the schema in HMS, the statement fails during analysis with
`org.apache.spark.sql.AnalysisException: Cannot write incompatible data to
table 'table_xxx'`.
**PR purpose:**
When the schema in the `create table` statement is inconsistent with the
schema stored in the existing `.hoodie` directory, table creation is rejected
and an exception is thrown.
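
To make the intended check concrete, here is a minimal sketch of the idea, not the code in this PR. `Field` and `Schema` are hypothetical stand-ins for Spark's `StructField` and `StructType` so the snippet runs without Spark on the classpath, and the case-insensitive comparison of names and types is an assumption for illustration. Only the method name, its two parameters, and the "skip when the user schema is empty" condition come from the diff above.

```scala
// Hypothetical stand-ins for Spark's StructField / StructType (assumption:
// types are compared as plain strings purely for illustration).
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field])

object SchemaCheck {
  // Returns true when the schema from CREATE TABLE is acceptable for the
  // existing table, false when it conflicts with the schema already stored
  // under .hoodie.
  def validateTableSchema(userDefinedSchema: Schema, hoodieTableSchema: Schema): Boolean = {
    if (userDefinedSchema.fields.isEmpty) {
      // No columns declared in the statement: nothing to compare.
      true
    } else {
      // Assumption: compare column names and types case-insensitively, in order.
      def normalize(s: Schema): Seq[(String, String)] =
        s.fields.map(f => (f.name.toLowerCase, f.dataType.toLowerCase))
      normalize(userDefinedSchema) == normalize(hoodieTableSchema)
    }
  }
}
```

With the scenario above, the old schema `(id string, name string, list string)` and the new one with `list map<string, string>` would not match, so the caller could throw instead of silently reusing the old schema.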