linliu-code commented on code in PR #13950:
URL: https://github.com/apache/hudi/pull/13950#discussion_r2384180369
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -618,10 +618,20 @@ def testBulkInsertForDropPartitionColumn(): Unit = {
.setPartitionFields(fooTableParams(DataSourceWriteOptions.PARTITIONPATH_FIELD.key))
.setKeyGeneratorClassProp(fooTableParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key,
DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue()))
- if(addBootstrapPath) {
- tableMetaClientBuilder
- .setBootstrapBasePath(fooTableParams(HoodieBootstrapConfig.BASE_PATH.key))
- }
+ if (fooTableParams.contains(HoodieWriteConfig.WRITE_PAYLOAD_CLASS_NAME.key())) {
Review Comment:
I removed the settings and compared the behavior of master and this branch.
The findings confirm my guess above:
1. When no merge mode configs are set, the default merge mode is commit time
ordering. This is the case on master here, so the table config uses the
`commit_time_ordering` merge mode and strategy id.
2. Some test cases set the `DefaultHoodieRecordPayload` payload class in the
configs. That gives an inconsistent merge configuration:
`Triple(commit_time merge mode, DefaultHoodieRecordPayload, commit_time id)`.
3. During the write on master,
`HoodieSparkSqlWriter.mergeParamsAndGetHoodieConfig` does not trigger merge
config inference, so the write succeeds.
In this branch, now that `strategy id` is allowed to be null, the inconsistent
configuration becomes `Triple(commit_time merge mode,
DefaultHoodieRecordPayload, null)`. This triggers the inference in
`HoodieSparkSqlWriter.mergeParamsAndGetHoodieConfig`, which throws errors in
some test cases.
Therefore, we need to pass any of these configs to the metaclient to avoid the
inconsistency in the first place. We probably need to fix the
`HoodieSparkSqlWriter.mergeParamsAndGetHoodieConfig` logic eventually. I did
not touch it in this PR to avoid adding more complexity.
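To make the difference between the two triples concrete, here is a toy model
of the situation. It is only a sketch: `MergeConfigs`, `MergeConfigSketch` and
`needsInference` are hypothetical names, not Hudi classes, and the rule "infer
only when one of the three values is missing" is my assumption about the
trigger, not a copy of the real condition in
`HoodieSparkSqlWriter.mergeParamsAndGetHoodieConfig`.

```scala
// Toy model only: these types are hypothetical and are not part of Hudi.
final case class MergeConfigs(
    mergeMode: Option[String],
    payloadClass: Option[String],
    strategyId: Option[String])

object MergeConfigSketch {
  // Assumption: inference runs only when at least one of the three values is absent.
  def needsInference(c: MergeConfigs): Boolean =
    c.mergeMode.isEmpty || c.payloadClass.isEmpty || c.strategyId.isEmpty

  // master: all three values are present (although mutually inconsistent),
  // so under the assumption above no inference is triggered.
  val master: MergeConfigs = MergeConfigs(
    mergeMode = Some("commit_time merge mode"),
    payloadClass = Some("DefaultHoodieRecordPayload"),
    strategyId = Some("commit_time id"))

  // this branch: the strategy id may be null, modelled here as None,
  // so the same configuration now triggers inference.
  val branch: MergeConfigs = master.copy(strategyId = None)
}
```

Under that assumption, `MergeConfigSketch.needsInference` is false for
`master` and true for `branch`, which matches what I observed: the same test
configs pass on master and hit the inference path here.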