This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new fd6cf83 [HUDI-1331] Updating configs from 0.10.1 (#4702)
fd6cf83 is described below
commit fd6cf831cb4934f6e442e33bf9e8f4a0e8e21ada
Author: Sivabalan Narayanan <[email protected]>
AuthorDate: Thu Jan 27 14:01:54 2022 -0500
[HUDI-1331] Updating configs from 0.10.1 (#4702)
---
website/docs/configurations.md | 383 ++++++++++++++++++++++++-----------------
1 file changed, 226 insertions(+), 157 deletions(-)
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index e2cb35b..b7a6aa7 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -4,7 +4,7 @@ keywords: [ configurations, default, flink options, spark,
configs, parameters ]
permalink: /docs/configurations.html
summary: This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at a few levels.
toc: true
-last_modified_at: 2021-12-08T17:24:42.348
+last_modified_at: 2022-01-27T12:11:53.356
---
This page covers the different ways of configuring your job to write/read Hudi tables. At a high level, you can control behaviour at a few levels.
@@ -17,12 +17,6 @@ This page covers the different ways of configuring your job
to write/read Hudi t
- [**Kafka Connect Configs**](#KAFKA_CONNECT): This set of configs is used for the Kafka Connect Sink Connector for writing Hudi Tables
- [**Amazon Web Services Configs**](#AWS): Please fill in the description for
Config Group Name: Amazon Web Services Configs
-## Externalized Config File
-Instead of directly passing configuration settings to every Hudi job, you can
also centrally set them in a configuration
-file `hudi-default.conf`. By default, Hudi would load the configuration file
under `/etc/hudi/conf` directory. You can
-specify a different configuration directory location by setting the
`HUDI_CONF_DIR` environment variable. This can be
-useful for uniformly enforcing repeated configs (like Hive sync or write/index
tuning), across your entire data lake.
-
## Spark Datasource Configs {#SPARK_DATASOURCE}
These configs control the Hudi Spark Datasource, providing the ability to define keys/partitioning, pick the write operation, specify how to merge records, or choose the query type to read.
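For illustration, a minimal Spark (Scala) sketch of how these datasource configs are passed as write options; the SparkSession `spark`, the DataFrame `df`, and all table/column names are assumed placeholders, not part of this commit:

```scala
import org.apache.spark.sql.SaveMode

// Assumes an existing DataFrame `df` with (hypothetical) columns uuid, partitionpath and ts.
df.write.format("hudi").
  option("hoodie.table.name", "my_hudi_table").
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.datasource.write.operation", "upsert").
  mode(SaveMode.Append).
  save("/tmp/hudi/my_hudi_table")
```
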
@@ -89,7 +83,7 @@ Options useful for reading tables via
`read.format.option(...)`
---
> #### hoodie.enable.data.skipping
-> enable data skipping to boost query after doing z-order optimize for current
table<br></br>
+> Enables data skipping, allowing queries to leverage indexes to reduce the search space by skipping over files<br></br>
> **Default Value**: true (Optional)<br></br>
> `Config Param: ENABLE_DATA_SKIPPING`<br></br>
> `Since Version: 0.10.0`<br></br>
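As a hedged read-side illustration of the flag above (the path and filter column are placeholders):

```scala
// Snapshot query with data skipping enabled; assumes a SparkSession `spark`
// and an existing Hudi table at basePath.
val basePath = "/tmp/hudi/my_hudi_table"
spark.read.format("hudi").
  option("hoodie.enable.data.skipping", "true").
  load(basePath).
  filter("ts > '2022-01-01'").
  show()
```
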
@@ -203,6 +197,13 @@ the dot notation eg: `a.b.c`<br></br>
---
+> #### hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled
+> When set to true, a consistent value will be generated for a logical timestamp type column, like timestamp-millis and timestamp-micros, irrespective of whether the row-writer is enabled. Disabled by default so as not to break pipelines that deploy either the fully row-writer path or the non row-writer path. For example, if it is kept disabled, then a record key of timestamp type with value `2016-12-29 09:54:00` will be written as timestamp `2016-12-29 09:54:00.0` in the row-writer path, while it will be [...]
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED`<br></br>
+
+---
+
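A hedged sketch of setting the flag above on a write whose record key is a logical timestamp column; column, table, and path names are assumptions:

```scala
import org.apache.spark.sql.SaveMode

// `df` is assumed to carry a timestamp-typed column event_time used as the record key.
df.write.format("hudi").
  option("hoodie.table.name", "events").
  option("hoodie.datasource.write.recordkey.field", "event_time").
  option("hoodie.datasource.write.partitionpath.field", "dt").
  option("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled", "true").
  mode(SaveMode.Append).
  save("/tmp/hudi/events")
```
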
> #### hoodie.datasource.hive_sync.support_timestamp
> ‘INT64’ with original type TIMESTAMP_MICROS is converted to hive ‘timestamp’
> type. Disabled by default for backward compatibility.<br></br>
> **Default Value**: false (Optional)<br></br>
@@ -352,6 +353,13 @@ the dot notation eg: `a.b.c`<br></br>
---
+> #### hoodie.datasource.hive_sync.bucket_sync
+> Whether to sync the hive metastore bucket specification when using the bucket index. The specification is 'CLUSTERED BY (trace_id) SORTED BY (trace_id ASC) INTO 65536 BUCKETS'<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_SYNC_BUCKET_SYNC`<br></br>
+
+---
+
> #### hoodie.datasource.hive_sync.auto_create_database
> Automatically create the hive database if it does not exist<br></br>
> **Default Value**: true (Optional)<br></br>
@@ -445,6 +453,13 @@ By default false (the names of partition folders are only
partition values)<br><
---
+> #### hoodie.datasource.hive_sync.conditional_sync
+> Enables conditional hive sync, where a partition or schema change must exist for the sync to hive to be performed.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: HIVE_CONDITIONAL_SYNC`<br></br>
+
+---
+
> #### hoodie.datasource.hive_sync.mode
> Mode to choose for Hive ops. Valid values are hms, jdbc and hiveql.<br></br>
> **Default Value**: N/A (Required)<br></br>
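Putting a few of the hive_sync options above together, a hedged write-side sketch; database, table, and path names are placeholders:

```scala
import org.apache.spark.sql.SaveMode

df.write.format("hudi").
  option("hoodie.table.name", "orders").
  option("hoodie.datasource.write.recordkey.field", "order_id").
  option("hoodie.datasource.write.partitionpath.field", "dt").
  option("hoodie.datasource.hive_sync.enable", "true").
  option("hoodie.datasource.hive_sync.mode", "hms").               // hms, jdbc or hiveql
  option("hoodie.datasource.hive_sync.database", "analytics").
  option("hoodie.datasource.hive_sync.table", "orders").
  option("hoodie.datasource.hive_sync.conditional_sync", "true").  // sync only when a partition or schema change exists
  mode(SaveMode.Append).
  save("/tmp/hudi/orders")
```
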
@@ -1153,6 +1168,25 @@ Actual value will be obtained by invoking .toString() on
the field value. Nested
## Write Client Configs {#WRITE_CLIENT}
Internally, the Hudi datasource uses an RDD based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning etc. Although Hudi provides sane defaults, from time to time these configs may need to be tweaked to optimize for specific workloads.
+### Layout Configs {#Layout-Configs}
+
+Configurations that control storage layout and data distribution, which define how the files are organized within a table.
+
+`Config Class`: org.apache.hudi.config.HoodieLayoutConfig<br></br>
+> #### hoodie.storage.layout.type
+> Type of storage layout. Possible options are [DEFAULT | BUCKET]<br></br>
+> **Default Value**: DEFAULT (Optional)<br></br>
+> `Config Param: LAYOUT_TYPE`<br></br>
+
+---
+
+> #### hoodie.storage.layout.partitioner.class
+> Partitioner class, it is used to distribute data in a specific way.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: LAYOUT_PARTITIONER_CLASS_NAME`<br></br>
+
+---
+
### Write commit callback configs {#Write-commit-callback-configs}
Controls callback behavior into HTTP endpoints, to push notifications on
commits on hudi tables.
@@ -1282,6 +1316,13 @@ By default false (the names of partition folders are
only partition values)<br><
---
+> #### hoodie.table.timeline.timezone
+> User can set the hoodie commit timeline timezone, such as UTC, LOCAL and so on. LOCAL is the default<br></br>
+> **Default Value**: LOCAL (Optional)<br></br>
+> `Config Param: TIMELINE_TIMEZONE`<br></br>
+
+---
+
> #### hoodie.table.version
> Version of table, used for running upgrade/downgrade steps between releases
> with potentially breaking/backwards compatible changes.<br></br>
> **Default Value**: ZERO (Optional)<br></br>
@@ -1303,6 +1344,13 @@ By default false (the names of partition folders are
only partition values)<br><
---
+> #### hoodie.database.name
+> Database name that will be used for incremental query. If different databases have the same table name during incremental query, we can set it to limit the table name to a specific database<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: DATABASE_NAME`<br></br>
+
+---
+
> #### hoodie.table.create.schema
> Schema used when creating the table, for the first time.<br></br>
> **Default Value**: N/A (Required)<br></br>
@@ -1404,7 +1452,7 @@ Configurations that control aspects around writing,
sizing, reading base and log
---
-> #### hoodie.parquet.outputTimestampType
+> #### hoodie.parquet.outputtimestamptype
> Sets spark.sql.parquet.outputTimestampType. Parquet timestamp type to use
> when Spark writes data to Parquet files.<br></br>
> **Default Value**: TIMESTAMP_MILLIS (Optional)<br></br>
> `Config Param: PARQUET_OUTPUT_TIMESTAMP_TYPE`<br></br>
@@ -1446,7 +1494,7 @@ Configurations that control aspects around writing,
sizing, reading base and log
---
-> #### hoodie.parquet.writeLegacyFormat.enabled
+> #### hoodie.parquet.writelegacyformat.enabled
> Sets spark.sql.parquet.writeLegacyFormat. If true, data will be written in a
> way of Spark 1.4 and earlier. For example, decimal values will be written in
> Parquet's fixed-length byte array format which other systems such as Apache
> Hive and Apache Impala use. If false, the newer format in Parquet will be
> used. For example, decimals will be written in int-based format.<br></br>
> **Default Value**: false (Optional)<br></br>
> `Config Param: PARQUET_WRITE_LEGACY_FORMAT_ENABLED`<br></br>
@@ -1584,6 +1632,14 @@ Configs that control DynamoDB based locking mechanisms
required for concurrency
---
+> #### hoodie.write.lock.dynamodb.endpoint_url
+> For the DynamoDB based lock provider, the URL endpoint used for the Amazon DynamoDB service. Useful for development with a local DynamoDB instance.<br></br>
+> **Default Value**: us-east-1 (Optional)<br></br>
+> `Config Param: DYNAMODB_ENDPOINT_URL`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
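To tie the DynamoDB lock configs together, a hedged sketch enabling optimistic concurrency control against a local DynamoDB endpoint; the provider class, lock table, and key values are assumptions and require the hudi-aws bundle on the classpath:

```scala
import org.apache.spark.sql.SaveMode

// Assumed lock-related options layered onto a Hudi write.
val lockOpts = Map(
  "hoodie.write.concurrency.mode"            -> "optimistic_concurrency_control",
  "hoodie.cleaner.policy.failed.writes"      -> "LAZY",
  "hoodie.write.lock.provider"               -> "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
  "hoodie.write.lock.dynamodb.table"         -> "hudi_locks",
  "hoodie.write.lock.dynamodb.partition_key" -> "my_hudi_table",
  "hoodie.write.lock.dynamodb.region"        -> "us-east-1",
  "hoodie.write.lock.dynamodb.endpoint_url"  -> "http://localhost:8000"  // local DynamoDB for development
)

df.write.format("hudi").options(lockOpts).
  option("hoodie.table.name", "my_hudi_table").
  mode(SaveMode.Append).
  save("/tmp/hudi/my_hudi_table")
```
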
### Metadata Configs {#Metadata-Configs}
Configurations used by the Hudi Metadata Table. This table maintains the metadata about a given Hudi table (e.g. file listings) to avoid the overhead of accessing cloud storage during queries.
@@ -1655,7 +1711,7 @@ Configurations used by the Hudi Metadata Table. This
table maintains the metadat
> #### hoodie.metadata.enable
> Enable the internal metadata table which serves table metadata like level
> file listings<br></br>
-> **Default Value**: false (Optional)<br></br>
+> **Default Value**: true (Optional)<br></br>
> `Config Param: ENABLE`<br></br>
> `Since Version: 0.7.0`<br></br>
@@ -2006,7 +2062,7 @@ Configurations that control write behavior on Hudi
tables. These can be directly
---
> #### hoodie.bulkinsert.sort.mode
-> Sorting modes to use for sorting records for bulk insert. This is user when
user hoodie.bulkinsert.user.defined.partitioner.classis not configured.
Available values are - GLOBAL_SORT: this ensures best file sizes, with lowest
memory overhead at cost of sorting. PARTITION_SORT: Strikes a balance by only
sorting within a partition, still keeping the memory overhead of writing lowest
and best effort file sizing. NONE: No sorting. Fastest and matches
`spark.write.parquet()` in terms of num [...]
+> Sorting modes to use for sorting records for bulk insert. This is used when hoodie.bulkinsert.user.defined.partitioner.class is not configured. Available values are - GLOBAL_SORT: this ensures best file sizes, with lowest memory overhead at cost of sorting. PARTITION_SORT: Strikes a balance by only sorting within a partition, still keeping the memory overhead of writing lowest and best effort file sizing. NONE: No sorting. Fastest and matches `spark.write.parquet()` in terms of numb [...]
> **Default Value**: GLOBAL_SORT (Optional)<br></br>
> `Config Param: BULK_INSERT_SORT_MODE`<br></br>
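For illustration, a hedged bulk_insert write that picks a sort mode explicitly; table and path names are placeholders:

```scala
import org.apache.spark.sql.SaveMode

df.write.format("hudi").
  option("hoodie.table.name", "events").
  option("hoodie.datasource.write.operation", "bulk_insert").
  option("hoodie.bulkinsert.sort.mode", "PARTITION_SORT").  // GLOBAL_SORT (default), PARTITION_SORT or NONE
  mode(SaveMode.Append).
  save("/tmp/hudi/events")
```
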
@@ -2161,6 +2217,13 @@ By default false (the names of partition folders are
only partition values)<br><
---
+> #### hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled
+> When set to true, a consistent value will be generated for a logical timestamp type column, like timestamp-millis and timestamp-micros, irrespective of whether the row-writer is enabled. Disabled by default so as not to break pipelines that deploy either the fully row-writer path or the non row-writer path. For example, if it is kept disabled, then a record key of timestamp type with value `2016-12-29 09:54:00` will be written as timestamp `2016-12-29 09:54:00.0` in the row-writer path, while it will be [...]
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED`<br></br>
+
+---
+
> #### hoodie.datasource.write.partitionpath.field
> Partition path field. Value to be used at the partitionPath component of HoodieKey. Actual value obtained by invoking .toString()<br></br>
> **Default Value**: N/A (Required)<br></br>
@@ -2332,99 +2395,6 @@ Configurations that control indexing behavior (when
HBase based indexing is enab
---
-### Write commit pulsar callback configs
{#Write-commit-pulsar-callback-configs}
-
-Controls notifications sent to pulsar, on events happening to a hudi table.
-
-`Config Class`:
org.apache.hudi.utilities.callback.pulsar.HoodieWriteCommitPulsarCallbackConfig<br></br>
-> #### hoodie.write.commit.callback.pulsar.operation-timeout
-> Duration of waiting for completing an operation.<br></br>
-> **Default Value**: 30s (Optional)<br></br>
-> `Config Param: OPERATION_TIMEOUT`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.topic
-> pulsar topic name to publish timeline activity into.<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: TOPIC`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.producer.block-if-queue-full
-> When the queue is full, the method is blocked instead of an exception is
thrown.<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: PRODUCER_BLOCK_QUEUE_FULL`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.producer.send-timeout
-> The timeout in each sending to pulsar.<br></br>
-> **Default Value**: 30s (Optional)<br></br>
-> `Config Param: PRODUCER_SEND_TIMEOUT`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.broker.service.url
-> Server's url of pulsar cluster, to be used for publishing commit
metadata.<br></br>
-> **Default Value**: N/A (Required)<br></br>
-> `Config Param: BROKER_SERVICE_URL`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.keepalive-interval
-> Duration of keeping alive interval for each client broker
connection.<br></br>
-> **Default Value**: 30s (Optional)<br></br>
-> `Config Param: KEEPALIVE_INTERVAL`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.producer.pending-total-size
-> The maximum number of pending messages across partitions.<br></br>
-> **Default Value**: 50000 (Optional)<br></br>
-> `Config Param: PRODUCER_PENDING_SIZE`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.request-timeout
-> Duration of waiting for completing a request.<br></br>
-> **Default Value**: 60s (Optional)<br></br>
-> `Config Param: REQUEST_TIMEOUT`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.producer.pending-queue-size
-> The maximum size of a queue holding pending messages.<br></br>
-> **Default Value**: 1000 (Optional)<br></br>
-> `Config Param: PRODUCER_PENDING_QUEUE_SIZE`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.producer.route-mode
-> Message routing logic for producers on partitioned topics.<br></br>
-> **Default Value**: RoundRobinPartition (Optional)<br></br>
-> `Config Param: PRODUCER_ROUTE_MODE`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
-> #### hoodie.write.commit.callback.pulsar.connection-timeout
-> Duration of waiting for a connection to a broker to be established.<br></br>
-> **Default Value**: 10s (Optional)<br></br>
-> `Config Param: CONNECTION_TIMEOUT`<br></br>
-> `Since Version: 0.11.0`<br></br>
-
----
-
### Write commit Kafka callback configs {#Write-commit-Kafka-callback-configs}
Controls notifications sent to Kafka, on events happening to a hudi table.
@@ -2501,7 +2471,7 @@ Configs that control locking mechanisms required for
concurrency control betwee
> #### hoodie.write.lock.wait_time_ms_between_retry
> Initial amount of time to wait between retries to acquire locks; subsequent retries will exponentially back off.<br></br>
-> **Default Value**: 5000 (Optional)<br></br>
+> **Default Value**: 1000 (Optional)<br></br>
> `Config Param: LOCK_ACQUIRE_RETRY_WAIT_TIME_IN_MILLIS`<br></br>
> `Since Version: 0.8.0`<br></br>
@@ -2509,7 +2479,7 @@ Configs that control locking mechanisms required for
concurrency control betwee
> #### hoodie.write.lock.num_retries
> Maximum number of times to retry lock acquire, at each lock provider<br></br>
-> **Default Value**: 3 (Optional)<br></br>
+> **Default Value**: 15 (Optional)<br></br>
> `Config Param: LOCK_ACQUIRE_NUM_RETRIES`<br></br>
> `Since Version: 0.8.0`<br></br>
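The retry knobs above are tuned alongside a lock provider; a hedged sketch using the ZooKeeper based provider (the provider class and ZooKeeper coordinates are assumptions):

```scala
import org.apache.spark.sql.SaveMode

// Assumed options for a concurrent writer retrying lock acquisition up to 15 times,
// starting with a 1000 ms wait between retries.
val occOpts = Map(
  "hoodie.write.concurrency.mode"                -> "optimistic_concurrency_control",
  "hoodie.write.lock.provider"                   -> "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
  "hoodie.write.lock.zookeeper.url"              -> "zk-host",
  "hoodie.write.lock.zookeeper.port"             -> "2181",
  "hoodie.write.lock.zookeeper.lock_key"         -> "my_hudi_table",
  "hoodie.write.lock.zookeeper.base_path"        -> "/hudi/locks",
  "hoodie.write.lock.num_retries"                -> "15",
  "hoodie.write.lock.wait_time_ms_between_retry" -> "1000"
)

df.write.format("hudi").options(occOpts).
  option("hoodie.table.name", "my_hudi_table").
  mode(SaveMode.Append).
  save("/tmp/hudi/my_hudi_table")
```
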
@@ -2659,6 +2629,13 @@ Configurations that control compaction (merging of log
files onto a new base fil
---
+> #### hoodie.archive.merge.enable
+> When enabled, hoodie will automatically merge several small archive files into a larger one. It's useful when the storage scheme doesn't support the append operation.<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_ENABLE`<br></br>
+
+---
+
> #### hoodie.cleaner.commits.retained
> Number of commits to retain, without cleaning. This will be retained for
> num_of_commits * time_between_commits (scheduled). This also directly
> translates into how much data retention the table supports for incremental
> queries.<br></br>
> **Default Value**: 10 (Optional)<br></br>
@@ -2708,6 +2685,13 @@ Configurations that control compaction (merging of log
files onto a new base fil
---
+> #### hoodie.archive.merge.small.file.limit.bytes
+> This config sets the archive file size limit below which an archive file
becomes a candidate to be selected as such a small file.<br></br>
+> **Default Value**: 20971520 (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_SMALL_FILE_LIMIT_BYTES`<br></br>
+
+---
+
> #### hoodie.cleaner.fileversions.retained
> When KEEP_LATEST_FILE_VERSIONS cleaning policy is used, the minimum number
> of file slices to retain in each file group, during cleaning.<br></br>
> **Default Value**: 3 (Optional)<br></br>
@@ -2729,6 +2713,13 @@ Configurations that control compaction (merging of log
files onto a new base fil
---
+> #### hoodie.archive.merge.files.batch.size
+> The number of small archive files to be merged at once.<br></br>
+> **Default Value**: 10 (Optional)<br></br>
+> `Config Param: ARCHIVE_MERGE_FILES_BATCH_SIZE`<br></br>
+
+---
+
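The three archive-merge configs introduced in this section can be set together; a hedged sketch (the size and batch values simply repeat the listed defaults):

```scala
import org.apache.spark.sql.SaveMode

// Assumed options enabling merging of small archive files on storage
// schemes that do not support append.
val archiveOpts = Map(
  "hoodie.archive.merge.enable"                 -> "true",
  "hoodie.archive.merge.small.file.limit.bytes" -> "20971520",  // 20 MB
  "hoodie.archive.merge.files.batch.size"       -> "10"
)

df.write.format("hudi").options(archiveOpts).
  option("hoodie.table.name", "events").
  mode(SaveMode.Append).
  save("/tmp/hudi/events")
```
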
> #### hoodie.parquet.small.file.limit
> During upsert operation, we opportunistically expand existing small files on
> storage, instead of writing new files, to keep number of files to an
> optimum. This config sets the file size limit below which a file on storage
> becomes a candidate to be selected as such a `small file`. By default, treat
> any file <= 100MB as a small file.<br></br>
> **Default Value**: 104857600 (Optional)<br></br>
@@ -2757,6 +2748,14 @@ Configurations that control compaction (merging of log
files onto a new base fil
---
+> #### hoodie.compaction.preserve.commit.metadata
+> When rewriting data, preserves existing hoodie_commit_time<br></br>
+> **Default Value**: false (Optional)<br></br>
+> `Config Param: PRESERVE_COMMIT_METADATA`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
> #### hoodie.copyonwrite.insert.auto.split
> Config to control whether we control insert split sizes automatically based
> on average record sizes. It's recommended to keep this turned on, since hand
> tuning is otherwise extremely cumbersome.<br></br>
> **Default Value**: true (Optional)<br></br>
@@ -2984,6 +2983,20 @@ Configurations that control indexing behavior, which
tags incoming records as ei
---
+> #### hoodie.bucket.index.num.buckets
+> Only applies if index type is BUCKET_INDEX. Determines the number of buckets in the hudi table, and each partition is divided into N buckets.<br></br>
+> **Default Value**: 256 (Optional)<br></br>
+> `Config Param: BUCKET_INDEX_NUM_BUCKETS`<br></br>
+
+---
+
+> #### hoodie.bucket.index.hash.field
+> Index key. It is used to index the record and find its file group. If not set, the record key field is used as the default<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: BUCKET_INDEX_HASH_FIELD`<br></br>
+
+---
+
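A hedged sketch combining the bucket index configs above with the BUCKET index type listed further down; table, column, and path names are placeholders:

```scala
import org.apache.spark.sql.SaveMode

df.write.format("hudi").
  option("hoodie.table.name", "trips").
  option("hoodie.datasource.write.recordkey.field", "trip_id").
  option("hoodie.index.type", "BUCKET").
  option("hoodie.bucket.index.num.buckets", "256").
  option("hoodie.bucket.index.hash.field", "trip_id").  // falls back to the record key field if unset
  mode(SaveMode.Append).
  save("/tmp/hudi/trips")
```
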
> #### hoodie.bloom.index.bucketized.checking
> Only applies if index type is BLOOM. When true, bucketized bloom filtering
> is enabled. This reduces skew seen in sort based bloom index lookup<br></br>
> **Default Value**: true (Optional)<br></br>
@@ -2992,7 +3005,7 @@ Configurations that control indexing behavior, which tags
incoming records as ei
---
> #### hoodie.index.type
-> Type of index to use. Default is Bloom filter. Possible options are [BLOOM |
GLOBAL_BLOOM |SIMPLE | GLOBAL_SIMPLE | INMEMORY | HBASE]. Bloom filters removes
the dependency on a external system and is stored in the footer of the Parquet
Data Files<br></br>
+> Type of index to use. Default is Bloom filter. Possible options are [BLOOM | GLOBAL_BLOOM | SIMPLE | GLOBAL_SIMPLE | INMEMORY | HBASE | BUCKET]. Bloom filters remove the dependency on an external system and are stored in the footer of the Parquet data files<br></br>
> **Default Value**: N/A (Required)<br></br>
> `Config Param: INDEX_TYPE`<br></br>
@@ -3080,27 +3093,11 @@ Configurations that control indexing behavior, which
tags incoming records as ei
Configurations that control the clustering table service in hudi, which
optimizes the storage layout for better query performance by sorting and sizing
data files.
`Config Class`: org.apache.hudi.config.HoodieClusteringConfig<br></br>
-> #### hoodie.clustering.preserve.commit.metadata
-> When rewriting data, preserves existing hoodie_commit_time<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: PRESERVE_COMMIT_METADATA`<br></br>
-> `Since Version: 0.9.0`<br></br>
-
----
-
-> #### hoodie.clustering.plan.strategy.max.num.groups
-> Maximum number of groups to create as part of ClusteringPlan. Increasing
groups will increase parallelism<br></br>
-> **Default Value**: 30 (Optional)<br></br>
-> `Config Param: PLAN_STRATEGY_MAX_GROUPS`<br></br>
-> `Since Version: 0.7.0`<br></br>
-
----
-
-> #### hoodie.layout.optimize.curve.build.method
-> Controls how data is sampled to build the space filling curves. two methods:
`direct`,`sample`.The direct method is faster than the sampling, however sample
method would produce a better data layout.<br></br>
-> **Default Value**: direct (Optional)<br></br>
-> `Config Param: LAYOUT_OPTIMIZE_CURVE_BUILD_METHOD`<br></br>
-> `Since Version: 0.10.0`<br></br>
+> #### hoodie.clustering.plan.strategy.cluster.end.partition
+> End partition used to filter partitions (inclusive); only effective when the filter mode 'hoodie.clustering.plan.partition.filter.mode' is SELECTED_PARTITIONS<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_FILTER_END_PARTITION`<br></br>
+> `Since Version: 0.11.0`<br></br>
---
@@ -3120,14 +3117,6 @@ Configurations that control the clustering table service
in hudi, which optimize
---
-> #### hoodie.layout.optimize.data.skipping.enable
-> Enable data skipping by collecting statistics once layout optimization is
complete.<br></br>
-> **Default Value**: true (Optional)<br></br>
-> `Config Param: LAYOUT_OPTIMIZE_DATA_SKIPPING_ENABLE`<br></br>
-> `Since Version: 0.10.0`<br></br>
-
----
-
> #### hoodie.clustering.inline.max.commits
> Config to control frequency of clustering planning<br></br>
> **Default Value**: 4 (Optional)<br></br>
@@ -3137,10 +3126,11 @@ Configurations that control the clustering table
service in hudi, which optimize
---
> #### hoodie.layout.optimize.enable
-> Enable use z-ordering/space-filling curves to optimize the layout of table
to boost query performance. This parameter takes precedence over clustering
strategy set using hoodie.clustering.execution.strategy.class<br></br>
+> This setting has no effect. Please refer to clustering configuration, as
well as LAYOUT_OPTIMIZE_STRATEGY config to enable advanced record layout
optimization strategies<br></br>
> **Default Value**: false (Optional)<br></br>
> `Config Param: LAYOUT_OPTIMIZE_ENABLE`<br></br>
> `Since Version: 0.10.0`<br></br>
+> `Deprecated Version: 0.11.0`<br></br>
---
@@ -3168,22 +3158,6 @@ Configurations that control the clustering table service
in hudi, which optimize
---
-> #### hoodie.clustering.plan.strategy.max.bytes.per.group
-> Each clustering operation can create multiple output file groups. Total
amount of data processed by clustering operation is defined by below two
properties (CLUSTERING_MAX_BYTES_PER_GROUP * CLUSTERING_MAX_NUM_GROUPS). Max
amount of data to be included in one group<br></br>
-> **Default Value**: 2147483648 (Optional)<br></br>
-> `Config Param: PLAN_STRATEGY_MAX_BYTES_PER_OUTPUT_FILEGROUP`<br></br>
-> `Since Version: 0.7.0`<br></br>
-
----
-
-> #### hoodie.clustering.plan.strategy.small.file.limit
-> Files smaller than the size specified here are candidates for
clustering<br></br>
-> **Default Value**: 629145600 (Optional)<br></br>
-> `Config Param: PLAN_STRATEGY_SMALL_FILE_LIMIT`<br></br>
-> `Since Version: 0.7.0`<br></br>
-
----
-
> #### hoodie.clustering.async.enabled
> Enable running of clustering service, asynchronously as inserts happen on
> the table.<br></br>
> **Default Value**: false (Optional)<br></br>
@@ -3201,7 +3175,7 @@ Configurations that control the clustering table service
in hudi, which optimize
---
> #### hoodie.layout.optimize.build.curve.sample.size
-> when settinghoodie.layout.optimize.curve.build.method to `sample`, the
amount of sampling to be done.Large sample size leads to better results, at the
expense of more memory usage.<br></br>
+> Determines target sample size used by the Boundary-based Interleaved Index
method of building space-filling curve. Larger sample size entails better
layout optimization outcomes, at the expense of higher memory
footprint.<br></br>
> **Default Value**: 200000 (Optional)<br></br>
> `Config Param: LAYOUT_OPTIMIZE_BUILD_CURVE_SAMPLE_SIZE`<br></br>
> `Since Version: 0.10.0`<br></br>
@@ -3217,8 +3191,8 @@ Configurations that control the clustering table service
in hudi, which optimize
---
> #### hoodie.layout.optimize.strategy
-> Type of layout optimization to be applied, current only supports `z-order`
and `hilbert` curves.<br></br>
-> **Default Value**: z-order (Optional)<br></br>
+> Determines the ordering strategy used in records layout optimization. Currently the "linear", "z-order" and "hilbert" strategies are supported.<br></br>
+> **Default Value**: linear (Optional)<br></br>
> `Config Param: LAYOUT_OPTIMIZE_STRATEGY`<br></br>
> `Since Version: 0.10.0`<br></br>
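For illustration, a hedged inline-clustering write that requests a z-order layout over two sort columns; all table, column, and path names are assumptions:

```scala
import org.apache.spark.sql.SaveMode

df.write.format("hudi").
  option("hoodie.table.name", "events").
  option("hoodie.clustering.inline", "true").
  option("hoodie.clustering.inline.max.commits", "4").
  option("hoodie.clustering.plan.strategy.sort.columns", "city,event_time").
  option("hoodie.layout.optimize.strategy", "z-order").  // linear (default), z-order or hilbert
  mode(SaveMode.Append).
  save("/tmp/hudi/events")
```
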
@@ -3232,6 +3206,14 @@ Configurations that control the clustering table service
in hudi, which optimize
---
+> #### hoodie.clustering.plan.strategy.cluster.begin.partition
+> Begin partition used to filter partitions (inclusive); only effective when the filter mode 'hoodie.clustering.plan.partition.filter.mode' is SELECTED_PARTITIONS<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_FILTER_BEGIN_PARTITION`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
> #### hoodie.clustering.plan.strategy.sort.columns
> Columns to sort the data by when clustering<br></br>
> **Default Value**: N/A (Required)<br></br>
@@ -3240,6 +3222,71 @@ Configurations that control the clustering table service
in hudi, which optimize
---
+> #### hoodie.clustering.preserve.commit.metadata
+> When rewriting data, preserves existing hoodie_commit_time<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: PRESERVE_COMMIT_METADATA`<br></br>
+> `Since Version: 0.9.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.max.num.groups
+> Maximum number of groups to create as part of ClusteringPlan. Increasing
groups will increase parallelism<br></br>
+> **Default Value**: 30 (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_MAX_GROUPS`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.partition.filter.mode
+> Partition filter mode used in the creation of the clustering plan. Available values are - NONE: do not filter table partitions, and thus the clustering plan will include all partitions that have clustering candidates. RECENT_DAYS: keep a continuous range of partitions, working together with the configs 'hoodie.clustering.plan.strategy.daybased.lookback.partitions' and 'hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions'. SELECTED_PARTITIONS: keep partitions that are in the specified r [...]
+> **Default Value**: NONE (Optional)<br></br>
+> `Config Param: PLAN_PARTITION_FILTER_MODE_NAME`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
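A hedged sketch of restricting the clustering plan to an inclusive partition range with SELECTED_PARTITIONS; the partition values are placeholders and would be passed as options on a clustering-enabled write:

```scala
// Assumed clustering options limiting the plan to partitions 2022/01/01 .. 2022/01/27.
val clusteringOpts = Map(
  "hoodie.clustering.inline"                                -> "true",
  "hoodie.clustering.plan.partition.filter.mode"            -> "SELECTED_PARTITIONS",
  "hoodie.clustering.plan.strategy.cluster.begin.partition" -> "2022/01/01",
  "hoodie.clustering.plan.strategy.cluster.end.partition"   -> "2022/01/27"
)
```
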
+> #### hoodie.layout.optimize.data.skipping.enable
+> Enable data skipping by collecting statistics once layout optimization is
complete.<br></br>
+> **Default Value**: true (Optional)<br></br>
+> `Config Param: LAYOUT_OPTIMIZE_DATA_SKIPPING_ENABLE`<br></br>
+> `Since Version: 0.10.0`<br></br>
+> `Deprecated Version: 0.11.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.max.bytes.per.group
+> Each clustering operation can create multiple output file groups. Total
amount of data processed by clustering operation is defined by below two
properties (CLUSTERING_MAX_BYTES_PER_GROUP * CLUSTERING_MAX_NUM_GROUPS). Max
amount of data to be included in one group<br></br>
+> **Default Value**: 2147483648 (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_MAX_BYTES_PER_OUTPUT_FILEGROUP`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.small.file.limit
+> Files smaller than the size specified here are candidates for
clustering<br></br>
+> **Default Value**: 629145600 (Optional)<br></br>
+> `Config Param: PLAN_STRATEGY_SMALL_FILE_LIMIT`<br></br>
+> `Since Version: 0.7.0`<br></br>
+
+---
+
+> #### hoodie.layout.optimize.curve.build.method
+> Controls how data is sampled to build the space-filling curves. Two methods: "direct", "sample". The direct method is faster than sampling; however, the sample method would produce a better data layout.<br></br>
+> **Default Value**: direct (Optional)<br></br>
+> `Config Param: LAYOUT_OPTIMIZE_SPATIAL_CURVE_BUILD_METHOD`<br></br>
+> `Since Version: 0.10.0`<br></br>
+
+---
+
+> #### hoodie.clustering.plan.strategy.partition.regex.pattern
+> Filter clustering partitions that match the regex pattern<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: PARTITION_REGEX_PATTERN`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
> #### hoodie.clustering.plan.strategy.daybased.lookback.partitions
> Number of partitions to list to create ClusteringPlan<br></br>
> **Default Value**: 2 (Optional)<br></br>
@@ -3445,6 +3492,14 @@ Enables reporting on Hudi metrics. Hudi publishes
metrics on every commit, clean
---
+> #### hoodie.metrics.reporter.metricsname.prefix
+> The prefix given to the metrics names.<br></br>
+> **Default Value**: (Optional)<br></br>
+> `Config Param: METRICS_REPORTER_PREFIX`<br></br>
+> `Since Version: 0.11.0`<br></br>
+
+---
+
> #### hoodie.metrics.reporter.type
> Type of metrics reporter.<br></br>
> **Default Value**: GRAPHITE (Optional)<br></br>
@@ -3676,6 +3731,13 @@ Configurations for Kafka Connect Sink Connector for Hudi.
---
+> #### hadoop.home
+> The Hadoop home directory.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HADOOP_HOME`<br></br>
+
+---
+
> #### hoodie.meta.sync.enable
> Enable Meta Sync such as Hive<br></br>
> **Default Value**: false (Optional)<br></br>
@@ -3711,6 +3773,13 @@ Configurations for Kafka Connect Sink Connector for Hudi.
---
+> #### hadoop.conf.dir
+> The Hadoop configuration directory.<br></br>
+> **Default Value**: N/A (Required)<br></br>
+> `Config Param: HADOOP_CONF_DIR`<br></br>
+
+---
+
> #### hoodie.kafka.compaction.async.enable
> Controls whether async compaction should be turned on for MOR table
> writing.<br></br>
> **Default Value**: true (Optional)<br></br>