umehrot2 commented on a change in pull request #4087:
URL: https://github.com/apache/hudi/pull/4087#discussion_r755589562
##########
File path: website/docs/concurrency_control.md
##########
@@ -69,6 +69,17 @@ hoodie.write.lock.hivemetastore.table
`The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.`
+**`AWS DynamoDB`** based lock provider
Review comment:
`AWS DynamoDB` => `Amazon DynamoDB`
##########
File path: website/docs/configurations.md
##########
@@ -15,6 +15,20 @@ This page covers the different ways of configuring your job to write/read Hudi t
- [**Metrics Configs**](#METRICS): These set of configs are used to enable monitoring and reporting of key Hudi stats and metrics.
- [**Record Payload Config**](#RECORD_PAYLOAD): This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on incoming new record and stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload which simply update table with the latest/last-written record. This can be overridden to a custom class extending HoodieRecordPayload class, on both datasource and WriteClient levels.
+---
Review comment:
Why not add this as a separate bullet point? We can call it `Environment Configs`?
##########
File path: website/docs/concurrency_control.md
##########
@@ -69,6 +69,17 @@ hoodie.write.lock.hivemetastore.table
`The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.`
+**`AWS DynamoDB`** based lock provider
+
+AWS DynamoDB based lock provides a simple way to support multi writing across different clusters
+
+```
+hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
+hoodie.write.lock.dynamodb.table
+hoodie.write.lock.dynamodb.partition_key
+hoodie.write.lock.dynamodb.region
+```
Review comment:
In addition to these, we should mention how the AWS credentials can be configured to talk to DynamoDB. Mention the specific configurations here, and note that if they are not configured, it falls back to https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html.
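For reference, a sketch of what the credential configs could look like (assuming the `hoodie.aws.*` keys from the hudi-aws module; exact property names should be double-checked against the code):
```
hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
hoodie.aws.access.key=<aws-access-key>
hoodie.aws.secret.key=<aws-secret-key>
# only needed when using temporary/session credentials
hoodie.aws.session.token=<aws-session-token>
```
If none of these are set, the lock provider would fall back to the default credentials provider chain (environment variables, system properties, instance profile, etc.).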
##########
File path: website/docs/configurations.md
##########
@@ -15,6 +15,20 @@ This page covers the different ways of configuring your job to write/read Hudi t
- [**Metrics Configs**](#METRICS): These set of configs are used to enable monitoring and reporting of key Hudi stats and metrics.
- [**Record Payload Config**](#RECORD_PAYLOAD): This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on incoming new record and stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload which simply update table with the latest/last-written record. This can be overridden to a custom class extending HoodieRecordPayload class, on both datasource and WriteClient levels.
+---
+Except directly passing configurations to Hudi jobs, since 0.10.0, Hudi also supports passing configurations through an external configuration file `hudi-default.conf` in which each line consists of a key and a value separated by whitespace/equal sign. For example:
+```
+hoodie.datasource.hive_sync.mode jdbc
+hoodie.datasource.hive_sync.jdbcurl jdbc:hive2://localhost:10000
+hoodie.datasource.hive_sync.support_timestamp false
+```
+This is a cluster level configuration, all the Hudi jobs running in this cluster would share the same configuration.
+The configuration is parsed and evaluated when the Hudi engine processes are started. Changes to the configuration file require restarting the relevant processes.
+
Review comment:
This statement is not true. As discussed, it is not the engines like Spark or Hive that load the config; instead we are relying on Hudi code paths to ultimately load this.
Maybe you should mention that this also kicks in via Spark SQL DML, and that it helps reduce the configs that one would otherwise have to keep passing.
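For example (a hypothetical illustration with made-up table names): with the hive_sync settings above placed in the cluster-wide file, a Spark SQL DML statement would pick them up without the job passing any of them explicitly:
```
-- hive_sync mode/jdbcurl are read from the external config file,
-- not repeated on every statement or job submission
INSERT INTO hudi_table SELECT id, name, ts FROM staging_table;
```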
##########
File path: website/docs/configurations.md
##########
@@ -15,6 +15,20 @@ This page covers the different ways of configuring your job to write/read Hudi t
- [**Metrics Configs**](#METRICS): These set of configs are used to enable monitoring and reporting of key Hudi stats and metrics.
- [**Record Payload Config**](#RECORD_PAYLOAD): This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on incoming new record and stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload which simply update table with the latest/last-written record. This can be overridden to a custom class extending HoodieRecordPayload class, on both datasource and WriteClient levels.
+---
+Except directly passing configurations to Hudi jobs, since 0.10.0, Hudi also supports passing configurations through an external configuration file `hudi-default.conf` in which each line consists of a key and a value separated by whitespace/equal sign. For example:
Review comment:
`Except` => `Instead of`