beyond1920 commented on code in PR #10977: URL: https://github.com/apache/hudi/pull/10977#discussion_r1555693313
##########
website/docs/sql_dml.md:
##########
@@ -390,3 +390,70 @@ and `clean.async.enabled` options are used to disable the compaction and cleanin
 This is done to ensure that the compaction and cleaning services are not executed twice for the same table.
 
+### Consistent hashing index (Experimental)
+
+We have introduced the Consistent Hashing Index since [0.13.0 release](/releases/release-0.13.0#consistent-hashing-index). In comparison to the static hashing index ([Bucket Index](/releases/release-0.11.0#bucket-index)), the consistent hashing index offers dynamic scalability of data buckets for the writer.
+You can find the [RFC](https://github.com/apache/hudi/blob/master/rfc/rfc-42/rfc-42.md) for the design of this feature.
+In the 0.13.X release, the Consistent Hashing Index is supported only for Spark engine. And since [release 0.14.0](/releases/release-0.14.0#consistent-hashing-index-support), the index is supported for Flink engine.
+
+In the below example, we have a streaming ingestion pipeline that written to the table with consistent bucket index.
+To utilize this feature, configure the option `index.type` as `BUCKET` and set `hoodie.index.bucket.engine` to `CONSISTENT_HASHING`.
+When enabling the consistent hashing index, it's important to enable clustering scheduling within the writer. During this process, the writer will perform dual writes for both the old and new data buckets while the clustering is pending. Although the dual write does not impact correctness, it is strongly recommended to execute clustering as quickly as possible.
+
+```sql
+-- set the interval as 30 seconds
+execution.checkpointing.interval: 30000
+state.backend: rocksdb

Review Comment:
No. Checkpointing needs to be enabled in order to commit the dataset. This is an example configuration for a `flink-conf.yaml`.
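For readers following the thread, here is a minimal Flink SQL sketch of how the pieces discussed above might fit together. It is not taken from the PR. The table name, schema, and path are hypothetical placeholders, and `MERGE_ON_READ` is an assumption. The `SET` statement is the Flink SQL client way of setting the same checkpoint interval that the quoted hunk puts in `flink-conf.yaml`. The only index options used are the two named in the quoted doc text.

```sql
-- Sketch only: table name, columns, and path are hypothetical placeholders.

-- Enable checkpointing for this session (30 seconds), since the writer
-- commits on checkpoints. Equivalent in intent to the flink-conf.yaml entry.
SET 'execution.checkpointing.interval' = '30s';

CREATE TABLE hudi_consistent_bucket_demo (
  id   BIGINT,
  name STRING,
  ts   TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_consistent_bucket_demo',  -- placeholder path
  'table.type' = 'MERGE_ON_READ',                      -- assumption
  -- The two options named in the doc text under review:
  'index.type' = 'BUCKET',
  'hoodie.index.bucket.engine' = 'CONSISTENT_HASHING'
  -- Clustering scheduling should also be enabled per the doc text; the
  -- exact option key is not shown in the quoted hunk, so it is omitted here.
);
```

Setting the interval through `SET` keeps the example self-contained in one SQL script, whereas the `flink-conf.yaml` form applies cluster-wide. Either should satisfy the requirement that checkpointing be on.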
