This is an automated email from the ASF dual-hosted git repository.
danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 3235c366f84 [HUDI-6089][DOCS] update default value of hoodie.merge.allow.duplicate.on.inserts to true (#10739)
3235c366f84 is described below
commit 3235c366f846adf1498b65efaf469a65a13b035a
Author: wombatu-kun <[email protected]>
AuthorDate: Sat Feb 24 11:25:20 2024 +0700
[HUDI-6089][DOCS] update default value of hoodie.merge.allow.duplicate.on.inserts to true (#10739)
Co-authored-by: Vova Kolmakov <[email protected]>
---
website/docs/configurations.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index 18c3581e305..e52f0a52a75 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -831,7 +831,7 @@ Configurations that control write behavior on Hudi tables. These can be directly
| [hoodie.markers.delete.parallelism](#hoodiemarkersdeleteparallelism) | 100 | Determines the parallelism for deleting marker files, which are used to track all files (valid or invalid/partial) written during a write operation. Increase this value if delays are observed, with large batch writes.<br />`Config Param: MARKERS_DELETE_PARALLELISM_VALUE` [...]
| [hoodie.markers.timeline_server_based.batch.interval_ms](#hoodiemarkerstimeline_server_basedbatchinterval_ms) | 50 | The batch interval in milliseconds for marker creation batch processing<br />`Config Param: MARKERS_TIMELINE_SERVER_BASED_BATCH_INTERVAL_MS`<br />`Since Version: 0.9.0` [...]
| [hoodie.markers.timeline_server_based.batch.num_threads](#hoodiemarkerstimeline_server_basedbatchnum_threads) | 20 | Number of threads to use for batch processing marker creation requests at the timeline server<br />`Config Param: MARKERS_TIMELINE_SERVER_BASED_BATCH_NUM_THREADS`<br />`Since Version: 0.9.0` [...]
-| [hoodie.merge.allow.duplicate.on.inserts](#hoodiemergeallowduplicateoninserts) | false | When enabled, we allow duplicate keys even if inserts are routed to merge with an existing file (for ensuring file sizing). This is only relevant for insert operation, since upsert, delete operations will ensure unique key constraints are maintained.<br />`Config Param: MERGE_ALLOW_DUPLICATE_ON [...]
+| [hoodie.merge.allow.duplicate.on.inserts](#hoodiemergeallowduplicateoninserts) | true | When enabled, we allow duplicate keys even if inserts are routed to merge with an existing file (for ensuring file sizing). This is only relevant for insert operation, since upsert, delete operations will ensure unique key constraints are maintained.<br />`Config Param: MERGE_ALLOW_DUPLICATE_ON [...]
| [hoodie.merge.data.validation.enabled](#hoodiemergedatavalidationenabled) | false | When enabled, data validation checks are performed during merges to ensure expected number of records after merge operation.<br />`Config Param: MERGE_DATA_VALIDATION_CHECK_ENABLE` [...]
| [hoodie.merge.small.file.group.candidates.limit](#hoodiemergesmallfilegroupcandidateslimit) | 1 | Limits number of file groups, whose base file satisfies small-file limit, to consider for appending records during upsert operation. Only applicable to MOR tables<br />`Config Param: MERGE_SMALL_FILE_GROUP_CANDIDATES_LIMIT` [...]
| [hoodie.release.resource.on.completion.enable](#hoodiereleaseresourceoncompletionenable) | true | Control to enable release all persist rdds when the spark job finish.<br />`Config Param: RELEASE_RESOURCE_ENABLE`<br />`Since Version: 0.11.0` [...]
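The diff above only updates the documented default of hoodie.merge.allow.duplicate.on.inserts. For anyone who depended on the previous behavior, a minimal sketch of pinning the option explicitly on a Spark datasource write follows; the table name, base path, and `df` are hypothetical placeholders, not part of this commit.

```python
# Sketch: pin hoodie.merge.allow.duplicate.on.inserts explicitly so the write
# does not depend on the documented default. Table name and base path below
# are hypothetical placeholders, not taken from this commit.
hudi_options = {
    "hoodie.table.name": "example_table",           # hypothetical
    "hoodie.datasource.write.operation": "insert",  # option is only relevant for inserts
    # The default documented by this commit is true; set "false" to keep the
    # old behavior of de-duplicating when inserts merge into an existing file.
    "hoodie.merge.allow.duplicate.on.inserts": "false",
}

# With a SparkSession and a DataFrame `df` in scope, this would be applied as:
# df.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/example_table")
```

Pinning defaults this way keeps pipelines stable across Hudi upgrades even when documented defaults change between releases.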