This is an automated email from the ASF dual-hosted git repository.
yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 6d5a4e2a6a7 [HUDI-6520] [DOCS] Rename Deltastreamer and related
classes and configs (#9179)
6d5a4e2a6a7 is described below
commit 6d5a4e2a6a71f9f89f901169107e23665d034440
Author: Amrish Lal <[email protected]>
AuthorDate: Tue Jul 18 08:48:22 2023 -0700
[HUDI-6520] [DOCS] Rename Deltastreamer and related classes and configs
(#9179)
Co-authored-by: Y Ethan Guo <[email protected]>
---
website/docs/clustering.md | 10 +-
website/docs/compaction.md | 8 +-
website/docs/concurrency_control.md | 20 ++--
website/docs/deployment.md | 18 +--
website/docs/docker_demo.md | 26 ++--
website/docs/faq.md | 24 ++--
website/docs/gcp_bigquery.md | 10 +-
website/docs/hoodie_deltastreamer.md | 163 ++++++++++++++------------
website/docs/key_generation.md | 76 ++++++------
website/docs/metadata_indexing.md | 14 +--
website/docs/metrics.md | 4 +-
website/docs/migration_guide.md | 6 +-
website/docs/precommit_validator.md | 2 +-
website/docs/querying_data.md | 2 +-
website/docs/quick-start-guide.md | 2 +-
website/docs/s3_hoodie.md | 2 +-
website/docs/syncing_aws_glue_data_catalog.md | 2 +-
website/docs/syncing_datahub.md | 10 +-
website/docs/syncing_metastore.md | 2 +-
website/docs/transforms.md | 6 +-
website/docs/use_cases.md | 2 +-
website/docs/write_operations.md | 2 +-
website/docs/writing_data.md | 2 +-
23 files changed, 213 insertions(+), 200 deletions(-)
diff --git a/website/docs/clustering.md b/website/docs/clustering.md
index d2ceb196d02..8eb0dfbfaa1 100644
--- a/website/docs/clustering.md
+++ b/website/docs/clustering.md
@@ -283,17 +283,17 @@
hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run
hoodie.clustering.plan.strategy.sort.columns=column1,column2
```
-### HoodieDeltaStreamer
+### HoodieStreamer
-This brings us to our users' favorite utility in Hudi. Now, we can trigger
asynchronous clustering with DeltaStreamer.
+This brings us to our users' favorite utility in Hudi. Now, we can trigger
asynchronous clustering with Hudi Streamer.
Just set the `hoodie.clustering.async.enabled` config to true and specify
other clustering config in properties file
-whose location can be pased as `—props` when starting the deltastreamer (just
like in the case of HoodieClusteringJob).
+whose location can be passed as `--props` when starting the Hudi Streamer (just like in the case of HoodieClusteringJob).
-A sample spark-submit command to setup HoodieDeltaStreamer is as below:
+A sample spark-submit command to set up HoodieStreamer is shown below:
```bash
spark-submit \
---class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+--class org.apache.hudi.utilities.streamer.HoodieStreamer \
/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.9.0-SNAPSHOT.jar
\
--props /path/to/config/clustering_kafka.properties \
--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider
\
diff --git a/website/docs/compaction.md b/website/docs/compaction.md
index a6249b7ae7c..9f7b119db43 100644
--- a/website/docs/compaction.md
+++ b/website/docs/compaction.md
@@ -45,14 +45,14 @@ import org.apache.spark.sql.streaming.ProcessingTime;
writer.trigger(new ProcessingTime(30000)).start(tablePath);
```
-### DeltaStreamer Continuous Mode
-Hudi DeltaStreamer provides continuous ingestion mode where a single long
running spark application
+### Hudi Streamer Continuous Mode
+Hudi Streamer provides a continuous ingestion mode where a single long-running Spark application
ingests data to Hudi table continuously from upstream sources. In this mode,
Hudi supports managing asynchronous
compactions. Here is an example snippet for running in continuous mode with
async compactions
```properties
spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.6.0 \
---class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+--class org.apache.hudi.utilities.streamer.HoodieStreamer \
--table-type MERGE_ON_READ \
--target-base-path <hudi_base_path> \
--target-table <hudi_table> \
@@ -76,7 +76,7 @@ you may want Synchronous compaction, which means that as a
commit is written it
Compaction is run synchronously by passing the flag "--disable-compaction"
(Meaning to disable async compaction scheduling).
When both ingestion and compaction is running in the same spark context, you
can use resource allocation configuration
-in DeltaStreamer CLI such as ("--delta-sync-scheduling-weight",
+in Hudi Streamer CLI such as ("--delta-sync-scheduling-weight",
"--compact-scheduling-weight", ""--delta-sync-scheduling-minshare", and
"--compact-scheduling-minshare")
to control executor allocation between ingestion and compaction.
diff --git a/website/docs/concurrency_control.md
b/website/docs/concurrency_control.md
index efa7f3212e2..7a014d16140 100644
--- a/website/docs/concurrency_control.md
+++ b/website/docs/concurrency_control.md
@@ -5,7 +5,7 @@ toc: true
last_modified_at: 2021-03-19T15:59:57-04:00
---
-In this section, we will cover Hudi's concurrency model and describe ways to
ingest data into a Hudi Table from multiple writers; using the
[DeltaStreamer](#deltastreamer) tool as well as
+In this section, we will cover Hudi's concurrency model and describe ways to
ingest data into a Hudi Table from multiple writers; using the [Hudi
Streamer](#hudi-streamer) tool as well as
using the [Hudi datasource](#datasource-writer).
## Supported Concurrency Controls
@@ -19,7 +19,7 @@ between multiple table service writers and readers.
Additionally, using MVCC, Hu
the same Hudi Table. Hudi supports `file level OCC`, i.e., for any 2 commits
(or writers) happening to the same table, if they do not have writes to
overlapping files being changed, both writers are allowed to succeed.
This feature is currently *experimental* and requires either Zookeeper or
HiveMetastore to acquire locks.
-It may be helpful to understand the different guarantees provided by [write
operations](/docs/write_operations/) via Hudi datasource or the delta streamer.
+It may be helpful to understand the different guarantees provided by [write
operations](/docs/write_operations/) via Hudi datasource or the Hudi Streamer.
## Single Writer Guarantees
@@ -171,21 +171,21 @@ inputDF.write.format("hudi")
.save(basePath)
```
-## DeltaStreamer
+## Hudi Streamer
-The `HoodieDeltaStreamer` utility (part of hudi-utilities-bundle) provides
ways to ingest from different sources such as DFS or Kafka, with the following
capabilities.
+The `HoodieStreamer` utility (part of hudi-utilities-bundle) provides ways to
ingest from different sources such as DFS or Kafka, with the following
capabilities.
-Using optimistic_concurrency_control via delta streamer requires adding the
above configs to the properties file that can be passed to the
-job. For example below, adding the configs to kafka-source.properties file and
passing them to deltastreamer will enable optimistic concurrency.
-A deltastreamer job can then be triggered as follows:
+Using optimistic_concurrency_control via Hudi Streamer requires adding the
above configs to the properties file that can be passed to the
+job. For example, adding the configs to the kafka-source.properties file and passing them to Hudi Streamer will enable optimistic concurrency.
+A Hudi Streamer job can then be triggered as follows:
```java
-[hoodie]$ spark-submit --class
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
- --props
file://${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties
\
+[hoodie]$ spark-submit --class
org.apache.hudi.utilities.streamer.HoodieStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
+ --props
file://${PWD}/hudi-utilities/src/test/resources/streamer-config/kafka-source.properties
\
--schemaprovider-class
org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--source-ordering-field impresssiontime \
- --target-base-path file:\/\/\/tmp/hudi-deltastreamer-op \
+ --target-base-path file:\/\/\/tmp/hudi-streamer-op \
--target-table uber.impressions \
--op BULK_INSERT
```
diff --git a/website/docs/deployment.md b/website/docs/deployment.md
index 6d26df1a5d5..088f99834ad 100644
--- a/website/docs/deployment.md
+++ b/website/docs/deployment.md
@@ -16,19 +16,19 @@ Specifically, we will cover the following aspects.
## Deploying
All in all, Hudi deploys with no long running servers or additional
infrastructure cost to your data lake. In fact, Hudi pioneered this model of
building a transactional distributed storage layer
-using existing infrastructure and its heartening to see other systems adopting
similar approaches as well. Hudi writing is done via Spark jobs (DeltaStreamer
or custom Spark datasource jobs), deployed per standard Apache Spark
[recommendations](https://spark.apache.org/docs/latest/cluster-overview).
+using existing infrastructure and it's heartening to see other systems adopting similar approaches as well. Hudi writing is done via Spark jobs (Hudi Streamer
or custom Spark datasource jobs), deployed per standard Apache Spark
[recommendations](https://spark.apache.org/docs/latest/cluster-overview).
Querying Hudi tables happens via libraries installed into Apache Hive, Apache
Spark or PrestoDB and hence no additional infrastructure is necessary.
A typical Hudi data ingestion can be achieved in 2 modes. In a single run
mode, Hudi ingestion reads next batch of data, ingest them to Hudi table and
exits. In continuous mode, Hudi ingestion runs as a long-running service
executing ingestion in a loop.
With Merge_On_Read Table, Hudi ingestion needs to also take care of compacting
delta files. Again, compaction can be performed in an asynchronous-mode by
letting compaction run concurrently with ingestion or in a serial fashion with
one after another.
-### DeltaStreamer
+### Hudi Streamer
-[DeltaStreamer](/docs/hoodie_deltastreamer#deltastreamer) is the standalone
utility to incrementally pull upstream changes
+[Hudi Streamer](/docs/hoodie_deltastreamer#hudi-streamer) is the standalone
utility to incrementally pull upstream changes
from varied sources such as DFS, Kafka and DB Changelogs and ingest them to
hudi tables. It runs as a spark application in two modes.
-To use DeltaStreamer in Spark, the `hudi-utilities-bundle` is required, by
adding
+To use Hudi Streamer in Spark, the `hudi-utilities-bundle` is required; add
`--packages org.apache.hudi:hudi-utilities-bundle_2.11:0.13.0` to the
`spark-submit` command. From 0.11.0 release, we start
to provide a new `hudi-utilities-slim-bundle` which aims to exclude
dependencies that can cause conflicts and compatibility
issues with different versions of Spark. The `hudi-utilities-slim-bundle`
should be used along with a Hudi Spark bundle
@@ -36,7 +36,7 @@ corresponding to the Spark version used, e.g.,
`--packages
org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.13.0,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.13.0`,
if using `hudi-utilities-bundle` solely in Spark encounters compatibility
issues.
- - **Run Once Mode** : In this mode, Deltastreamer performs one ingestion
round which includes incrementally pulling events from upstream sources and
ingesting them to hudi table. Background operations like cleaning old file
versions and archiving hoodie timeline are automatically executed as part of
the run. For Merge-On-Read tables, Compaction is also run inline as part of
ingestion unless disabled by passing the flag "--disable-compaction". By
default, Compaction is run inline for eve [...]
+ - **Run Once Mode** : In this mode, Hudi Streamer performs one ingestion
round which includes incrementally pulling events from upstream sources and
ingesting them to hudi table. Background operations like cleaning old file
versions and archiving hoodie timeline are automatically executed as part of
the run. For Merge-On-Read tables, Compaction is also run inline as part of
ingestion unless disabled by passing the flag "--disable-compaction". By
default, Compaction is run inline for eve [...]
Here is an example invocation for reading from kafka topic in a single-run
mode and writing to Merge On Read table type in a yarn cluster.
@@ -74,7 +74,7 @@ Here is an example invocation for reading from kafka topic in
a single-run mode
--conf spark.sql.catalogImplementation=hive \
--conf spark.sql.shuffle.partitions=100 \
--driver-class-path $HADOOP_CONF_DIR \
- --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+ --class org.apache.hudi.utilities.streamer.HoodieStreamer \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
@@ -84,7 +84,7 @@ Here is an example invocation for reading from kafka topic in
a single-run mode
--schemaprovider-class
org.apache.hudi.utilities.schema.FilebasedSchemaProvider
```
- - **Continuous Mode** : Here, deltastreamer runs an infinite loop with each
round performing one ingestion round as described in **Run Once Mode**. The
frequency of data ingestion can be controlled by the configuration
"--min-sync-interval-seconds". For Merge-On-Read tables, Compaction is run in
asynchronous fashion concurrently with ingestion unless disabled by passing the
flag "--disable-compaction". Every ingestion run triggers a compaction request
asynchronously and this frequency [...]
+ - **Continuous Mode** : Here, Hudi Streamer runs an infinite loop with each
round performing one ingestion round as described in **Run Once Mode**. The
frequency of data ingestion can be controlled by the configuration
"--min-sync-interval-seconds". For Merge-On-Read tables, Compaction is run in
asynchronous fashion concurrently with ingestion unless disabled by passing the
flag "--disable-compaction". Every ingestion run triggers a compaction request
asynchronously and this frequency [...]
Here is an example invocation for reading from kafka topic in a continuous
mode and writing to Merge On Read table type in a yarn cluster.
@@ -122,7 +122,7 @@ Here is an example invocation for reading from kafka topic
in a continuous mode
--conf spark.sql.catalogImplementation=hive \
--conf spark.sql.shuffle.partitions=100 \
--driver-class-path $HADOOP_CONF_DIR \
- --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+ --class org.apache.hudi.utilities.streamer.HoodieStreamer \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
@@ -161,7 +161,7 @@ As general guidelines,
- We strive to keep all changes backwards compatible (i.e new code can read
old data/timeline files) and when we cannot, we will provide upgrade/downgrade
tools via the CLI
- We cannot always guarantee forward compatibility (i.e old code being able
to read data/timeline files written by a greater version). This is generally
the norm, since no new features can be built otherwise.
However any large such changes, will be turned off by default, for smooth
transition to newer release. After a few releases and once enough users deem
the feature stable in production, we will flip the defaults in a subsequent
release.
- - Always upgrade the query bundles (mr-bundle, presto-bundle, spark-bundle)
first and then upgrade the writers (deltastreamer, spark jobs using
datasource). This often provides the best experience and it's easy to fix
+ - Always upgrade the query bundles (mr-bundle, presto-bundle, spark-bundle)
first and then upgrade the writers (Hudi Streamer, spark jobs using
datasource). This often provides the best experience and it's easy to fix
any issues by rolling forward/back the writer code (which typically you
might have more control over)
- With large, feature rich releases we recommend migrating slowly, by first
testing in staging environments and running your own tests. Upgrading Hudi is
no different than upgrading any database system.
diff --git a/website/docs/docker_demo.md b/website/docs/docker_demo.md
index 4d811925de5..0564bce20a7 100644
--- a/website/docs/docker_demo.md
+++ b/website/docs/docker_demo.md
@@ -249,7 +249,7 @@ kcat -b kafkabroker -L -J | jq .
### Step 2: Incrementally ingest data from Kafka topic
-Hudi comes with a tool named DeltaStreamer. This tool can connect to variety
of data sources (including Kafka) to
+Hudi comes with a tool named Hudi Streamer. This tool can connect to a variety of data sources (including Kafka) to
pull changes and apply to Hudi table using upsert/insert primitives. Here, we
will use the tool to download
json data from kafka topic and ingest to both COW and MOR tables we
initialized in the previous step. This tool
automatically initializes the tables in the file-system if they do not exist
yet.
@@ -257,9 +257,9 @@ automatically initializes the tables in the file-system if
they do not exist yet
```java
docker exec -it adhoc-2 /bin/bash
-# Run the following spark-submit command to execute the delta-streamer and
ingest to stock_ticks_cow table in HDFS
+# Run the following spark-submit command to execute the Hudi Streamer and
ingest to stock_ticks_cow table in HDFS
spark-submit \
- --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
$HUDI_UTILITIES_BUNDLE \
+ --class org.apache.hudi.utilities.streamer.HoodieStreamer
$HUDI_UTILITIES_BUNDLE \
--table-type COPY_ON_WRITE \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
@@ -267,9 +267,9 @@ spark-submit \
--target-table stock_ticks_cow --props
/var/demo/config/kafka-source.properties \
--schemaprovider-class
org.apache.hudi.utilities.schema.FilebasedSchemaProvider
-# Run the following spark-submit command to execute the delta-streamer and
ingest to stock_ticks_mor table in HDFS
+# Run the following spark-submit command to execute the Hudi Streamer and
ingest to stock_ticks_mor table in HDFS
spark-submit \
- --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
$HUDI_UTILITIES_BUNDLE \
+ --class org.apache.hudi.utilities.streamer.HoodieStreamer
$HUDI_UTILITIES_BUNDLE \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
@@ -279,7 +279,7 @@ spark-submit \
--schemaprovider-class
org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--disable-compaction
-# As part of the setup (Look at setup_demo.sh), the configs needed for
DeltaStreamer is uploaded to HDFS. The configs
+# As part of the setup (see setup_demo.sh), the configs needed for Hudi Streamer are uploaded to HDFS. The configs
# contain mostly Kafka connectivity settings, the avro-schema to be used for
ingesting along with key and partitioning fields.
exit
@@ -743,9 +743,9 @@ Splits: 17 total, 17 done (100.00%)
trino:default> exit
```
-### Step 5: Upload second batch to Kafka and run DeltaStreamer to ingest
+### Step 5: Upload second batch to Kafka and run Hudi Streamer to ingest
-Upload the second batch of data and ingest this batch using delta-streamer. As
this batch does not bring in any new
+Upload the second batch of data and ingest this batch using Hudi Streamer. As
this batch does not bring in any new
partitions, there is no need to run hive-sync
```java
@@ -754,9 +754,9 @@ cat docker/demo/data/batch_2.json | kcat -b kafkabroker -t
stock_ticks -P
# Within Docker container, run the ingestion command
docker exec -it adhoc-2 /bin/bash
-# Run the following spark-submit command to execute the delta-streamer and
ingest to stock_ticks_cow table in HDFS
+# Run the following spark-submit command to execute the Hudi Streamer and
ingest to stock_ticks_cow table in HDFS
spark-submit \
- --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
$HUDI_UTILITIES_BUNDLE \
+ --class org.apache.hudi.utilities.streamer.HoodieStreamer
$HUDI_UTILITIES_BUNDLE \
--table-type COPY_ON_WRITE \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
@@ -765,9 +765,9 @@ spark-submit \
--props /var/demo/config/kafka-source.properties \
--schemaprovider-class
org.apache.hudi.utilities.schema.FilebasedSchemaProvider
-# Run the following spark-submit command to execute the delta-streamer and
ingest to stock_ticks_mor table in HDFS
+# Run the following spark-submit command to execute the Hudi Streamer and
ingest to stock_ticks_mor table in HDFS
spark-submit \
- --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
$HUDI_UTILITIES_BUNDLE \
+ --class org.apache.hudi.utilities.streamer.HoodieStreamer
$HUDI_UTILITIES_BUNDLE \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
@@ -780,7 +780,7 @@ spark-submit \
exit
```
-With Copy-On-Write table, the second ingestion by DeltaStreamer resulted in a
new version of Parquet file getting created.
+With the Copy-On-Write table, the second ingestion by Hudi Streamer resulted in a new version of the Parquet file being created.
See
`http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_cow/2018/08/31`
With Merge-On-Read table, the second ingestion merely appended the batch to an
unmerged delta (log) file.
diff --git a/website/docs/faq.md b/website/docs/faq.md
index 0043aa97a2f..d4801ae10f6 100644
--- a/website/docs/faq.md
+++ b/website/docs/faq.md
@@ -111,7 +111,7 @@ for even more flexibility and get away from Hive-style
partition evol route.
### What are some ways to write a Hudi dataset?
-Typically, you obtain a set of partial updates/inserts from your source and
issue [write operations](https://hudi.apache.org/docs/write_operations/)
against a Hudi dataset. If you ingesting data from any of the standard sources
like Kafka, or tailing DFS, the [delta
streamer](https://hudi.apache.org/docs/hoodie_deltastreamer#deltastreamer) tool
is invaluable and provides an easy, self-managed solution to getting data
written into Hudi. You can also write your own code to capture data fr [...]
+Typically, you obtain a set of partial updates/inserts from your source and
issue [write operations](https://hudi.apache.org/docs/write_operations/)
against a Hudi dataset. If you are ingesting data from any of the standard sources
like Kafka, or tailing DFS, the [Hudi
Streamer](https://hudi.apache.org/docs/hoodie_deltastreamer#hudi-streamer) tool
is invaluable and provides an easy, self-managed solution to getting data
written into Hudi. You can also write your own code to capture data fro [...]
### How is a Hudi job deployed?
@@ -135,13 +135,13 @@ Limitations:
Note that currently the reading realtime view natively out of the Spark
datasource is not supported. Please use the Hive path below
```
-if Hive Sync is enabled in the
[deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
tool or
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable),
the dataset is available in Hive as a couple of tables, that can now be read
using HiveQL, Presto or SparkSQL. See
[here](https://hudi.apache.org/docs/querying_data/) for more.
+if Hive Sync is enabled in the [Hudi
Streamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
tool or
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable),
the dataset is available in Hive as a couple of tables, that can now be read
using HiveQL, Presto or SparkSQL. See
[here](https://hudi.apache.org/docs/querying_data/) for more.
### How does Hudi handle duplicate record keys in an input?
When issuing an `upsert` operation on a dataset and the batch of records
provided contains multiple entries for a given key, then all of them are
reduced into a single final value by repeatedly calling payload class's
[preCombine()](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java#L40)
method . By default, we pick the record with the greatest value (determined by
calling .compareTo [...]
-For an insert or bulk_insert operation, no such pre-combining is performed.
Thus, if your input contains duplicates, the dataset would also contain
duplicates. If you don't want duplicate records either issue an upsert or
consider specifying option to de-duplicate input in either
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates)
or
[deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-utilities/
[...]
+For an insert or bulk_insert operation, no such pre-combining is performed.
Thus, if your input contains duplicates, the dataset would also contain
duplicates. If you don't want duplicate records either issue an upsert or
consider specifying option to de-duplicate input in either
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates)
or [Hudi
Streamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-utilities/
[...]
### Can I implement my own logic for how input records are merged with record
on storage?
@@ -214,7 +214,7 @@ Hudi provides built in support for rewriting your entire
dataset into Hudi one-t
### How can I pass hudi configurations to my spark job?
-Hudi configuration options covering the datasource and low level Hudi write
client (which both deltastreamer & datasource internally call) are
[here](https://hudi.apache.org/docs/configurations/). Invoking *--help* on any
tool such as DeltaStreamer would print all the usage options. A lot of the
options that control upsert, file sizing behavior are defined at the write
client level and below is how we pass them to different options available for
writing data.
+Hudi configuration options covering the datasource and low level Hudi write
client (which both Hudi Streamer & datasource internally call) are
[here](https://hudi.apache.org/docs/configurations/). Invoking *--help* on any
tool such as Hudi Streamer would print all the usage options. A lot of the
options that control upsert, file sizing behavior are defined at the write
client level and below is how we pass them to different options available for
writing data.
- For Spark DataSource, you can use the "options" API of DataFrameWriter to
pass in these configs.
@@ -227,7 +227,7 @@ inputDF.write().format("org.apache.hudi")
- When using `HoodieWriteClient` directly, you can simply construct
HoodieWriteConfig object with the configs in the link you mentioned.
- - When using HoodieDeltaStreamer tool to ingest, you can set the configs in
properties file and pass the file as the cmdline argument "*--props*"
+ - When using HoodieStreamer tool to ingest, you can set the configs in
properties file and pass the file as the cmdline argument "*--props*"
### How to create Hive style partition folder structure?
@@ -253,7 +253,7 @@ set
hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat
### Can I register my Hudi dataset with Apache Hive metastore?
-Yes. This can be performed either via the standalone [Hive Sync
tool](https://hudi.apache.org/docs/syncing_metastore#hive-sync-tool) or using
options in
[deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
tool or
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable).
+Yes. This can be performed either via the standalone [Hive Sync
tool](https://hudi.apache.org/docs/syncing_metastore#hive-sync-tool) or using
options in [Hudi
Streamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
tool or
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable).
### How does the Hudi indexing work & what are its benefits?
@@ -277,13 +277,13 @@ The Hudi cleaner process often runs right after a commit
and deltacommit and goe
### What's Hudi's schema evolution story?
-Hudi uses Avro as the internal canonical representation for records, primarily
due to its nice [schema compatibility &
evolution](https://docs.confluent.io/platform/current/schema-registry/avro.html)
properties. This is a key aspect of having reliability in your ingestion or
ETL pipelines. As long as the schema passed to Hudi (either explicitly in
DeltaStreamer schema provider configs or implicitly by Spark Datasource's
Dataset schemas) is backwards compatible (e.g no field deletes, only [...]
+Hudi uses Avro as the internal canonical representation for records, primarily
due to its nice [schema compatibility &
evolution](https://docs.confluent.io/platform/current/schema-registry/avro.html)
properties. This is a key aspect of having reliability in your ingestion or
ETL pipelines. As long as the schema passed to Hudi (either explicitly in Hudi
Streamer schema provider configs or implicitly by Spark Datasource's Dataset
schemas) is backwards compatible (e.g no field deletes, only [...]
### How do I run compaction for a MOR dataset?
Simplest way to run compaction on MOR dataset is to run the [compaction
inline](https://hudi.apache.org/docs/configurations#hoodiecompactinline), at
the cost of spending more time ingesting; This could be particularly useful, in
common cases where you have small amount of late arriving data trickling into
older partitions. In such a scenario, you may want to just aggressively compact
the last N partitions while waiting for enough logs to accumulate for older
partitions. The net effect is [...]
-That said, for obvious reasons of not blocking ingesting for compaction, you
may want to run it asynchronously as well. This can be done either via a
separate [compaction
job](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java)
that is scheduled by your workflow scheduler/notebook independently. If you
are using delta streamer, then you can run in [continuous
mode](https://github.com/apache/hudi/blob/d3edac4612bde2fa9dec [...]
+That said, for obvious reasons of not blocking ingesting for compaction, you
may want to run it asynchronously as well. This can be done either via a
separate [compaction
job](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java)
that is scheduled by your workflow scheduler/notebook independently. If you
are using Hudi Streamer, then you can run in [continuous
mode](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca [...]
### What options do I have for asynchronous/offline compactions on MOR dataset?
@@ -292,7 +292,7 @@ There are a couple of options depending on how you write to
Hudi. But first let
- Execution: In this step the compaction plan is read and file slices are
compacted. Execution doesnt need the same level of coordination with other
writers as Scheduling step and can be decoupled from ingestion job easily.
Depending on how you write to Hudi these are the possible options currently.
-- DeltaStreamer:
+- Hudi Streamer:
- In Continuous mode, asynchronous compaction is achieved by default. Here
scheduling is done by the ingestion job inline and compaction execution is
achieved asynchronously by a separate parallel thread.
- In non continuous mode, only inline compaction is possible.
- Please note in either mode, by passing --disable-compaction compaction is
completely disabled
@@ -359,7 +359,7 @@ b)
**[Clustering](https://hudi.apache.org/blog/2021/01/27/hudi-clustering-intro)
Hudi runs the cleaner to remove old file versions as part of writing data either
in inline or in asynchronous mode (0.6.0 onwards). Hudi Cleaner retains
at least one previous commit when cleaning old file versions. This is to
prevent the case when concurrently running queries which are reading the latest
file versions suddenly see those files getting deleted by the cleaner because a
new file version got added. In other words, retaining at least one previous
commit is needed for ensuring snapsh [...]
-### How do I use DeltaStreamer or Spark DataSource API to write to a
Non-partitioned Hudi dataset ?
+### How do I use Hudi Streamer or Spark DataSource API to write to a
Non-partitioned Hudi dataset?
Hudi supports writing to non-partitioned datasets. For writing to a
non-partitioned Hudi dataset and performing hive table syncing, you need to set
the below configurations in the properties passed:
@@ -483,8 +483,8 @@ sudo ln -sf hudi-timeline-server-bundle-0.7.0.jar
hudi-timeline-server-bundle.ja
sudo ln -sf hudi-utilities-bundle_2.12-0.7.0.jar hudi-utilities-bundle.jar
```
-**Using the overriden jar in Deltastreamer:**
-When invoking DeltaStreamer specify the above jar location as part of
spark-submit command.
+**Using the overridden jar in Hudi Streamer:**
+When invoking Hudi Streamer, specify the above jar location as part of the
spark-submit command.
### Why are partition fields also stored in parquet files in addition to the
partition path?
diff --git a/website/docs/gcp_bigquery.md b/website/docs/gcp_bigquery.md
index 58ef43435c0..52823d7fc14 100644
--- a/website/docs/gcp_bigquery.md
+++ b/website/docs/gcp_bigquery.md
@@ -9,7 +9,7 @@ now, the Hudi-BigQuery integration only works for hive-style
partitioned Copy-On
## Configurations
-Hudi uses `org.apache.hudi.gcp.bigquery.BigQuerySyncTool` to sync tables. It
works with `HoodieDeltaStreamer` via
+Hudi uses `org.apache.hudi.gcp.bigquery.BigQuerySyncTool` to sync tables. It
works with `HoodieStreamer` via
setting the sync tool class. A few BigQuery-specific configurations are required.
| Config | Notes
|
@@ -33,22 +33,22 @@ hoodie.partition.metafile.use.base.format = 'true'
## Example
-Below shows an example for running `BigQuerySyncTool` with
`HoodieDeltaStreamer`.
+Below is an example of running `BigQuerySyncTool` with `HoodieStreamer`.
```shell
spark-submit --master yarn \
--packages com.google.cloud:google-cloud-bigquery:2.10.4 \
--jars /opt/hudi-gcp-bundle-0.13.0.jar \
---class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+--class org.apache.hudi.utilities.streamer.HoodieStreamer \
/opt/hudi-utilities-bundle_2.12-0.13.0.jar \
--target-base-path gs://my-hoodie-table/path \
--target-table mytable \
--table-type COPY_ON_WRITE \
--base-file-format PARQUET \
-# ... other deltastreamer options
+# ... other Hudi Streamer options
--enable-sync \
--sync-tool-classes org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
---hoodie-conf hoodie.deltastreamer.source.dfs.root=gs://my-source-data/path \
+--hoodie-conf hoodie.streamer.source.dfs.root=gs://my-source-data/path \
--hoodie-conf hoodie.gcp.bigquery.sync.project_id=hudi-bq \
--hoodie-conf hoodie.gcp.bigquery.sync.dataset_name=rxusandbox \
--hoodie-conf hoodie.gcp.bigquery.sync.dataset_location=asia-southeast1 \
diff --git a/website/docs/hoodie_deltastreamer.md
b/website/docs/hoodie_deltastreamer.md
index bb95bd52b16..9342b5c53bb 100644
--- a/website/docs/hoodie_deltastreamer.md
+++ b/website/docs/hoodie_deltastreamer.md
@@ -1,11 +1,24 @@
---
title: Streaming Ingestion
-keywords: [hudi, deltastreamer, hoodiedeltastreamer]
+keywords: [hudi, streamer, hoodiestreamer]
---
-## DeltaStreamer
+## Hudi Streamer
+:::danger Breaking Change
-The `HoodieDeltaStreamer` utility (part of `hudi-utilities-bundle`) provides
the way to ingest from different sources such as DFS or Kafka, with the
following capabilities.
+The following classes were renamed and relocated to
`org.apache.hudi.utilities.streamer` package.
+- `DeltastreamerMultiWriterCkptUpdateFunc` is renamed to
`StreamerMultiWriterCkptUpdateFunc`
+- `DeltaSync` is renamed to `StreamSync`
+- `HoodieDeltaStreamer` is renamed to `HoodieStreamer`
+- `HoodieDeltaStreamerMetrics` is renamed to `HoodieStreamerMetrics`
+- `HoodieMultiTableDeltaStreamer` is renamed to `HoodieMultiTableStreamer`
+
+To maintain backward compatibility, the original classes are still present in
the `org.apache.hudi.utilities.deltastreamer`
+package, but have been deprecated.
+
+:::
+
+The `HoodieStreamer` utility (part of `hudi-utilities-bundle`) provides the
way to ingest from different sources such as DFS or Kafka, with the following
capabilities.
- Exactly once ingestion of new events from Kafka, [incremental
imports](https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide#_incremental_imports)
from Sqoop or output of `HiveIncrementalPuller` or files under a DFS folder
- Support for json, avro or custom record types for the incoming data
@@ -16,11 +29,11 @@ The `HoodieDeltaStreamer` utility (part of
`hudi-utilities-bundle`) provides the
Command line options describe capabilities in more detail
```java
-[hoodie]$ spark-submit --class
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` --help
+[hoodie]$ spark-submit --class
org.apache.hudi.utilities.streamer.HoodieStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` --help
Usage: <main class> [options]
Options:
--checkpoint
- Resume Delta Streamer from this checkpoint.
+ Resume Hudi Streamer from this checkpoint.
--commit-on-errors
Commit even when some records failed to be written
Default: false
@@ -33,7 +46,7 @@ Options:
https://spark.apache.org/docs/latest/job-scheduling
Default: 1
--continuous
- Delta Streamer runs in continuous mode running source-fetch -> Transform
+ Hudi Streamer runs in continuous mode running source-fetch -> Transform
-> Hudi Write in loop
Default: false
--delta-sync-scheduling-minshare
@@ -91,7 +104,7 @@ Options:
hoodie client props, sane defaults are used, but recommend use to
provide basic things like metrics endpoints, hive configs etc. For
sources, refer to individual classes for supported properties.
- Default:
file:///Users/vinoth/bin/hoodie/src/test/resources/delta-streamer-config/dfs-source.properties
+ Default:
file:///Users/vinoth/bin/hoodie/src/test/resources/streamer-config/dfs-source.properties
--schemaprovider-class
subclass of org.apache.hudi.utilities.schema.SchemaProvider to attach
schemas to input & target table data, built in options:
@@ -135,7 +148,7 @@ Options:
```
The tool takes a hierarchically composed property file and has pluggable
interfaces for extracting data, key generation and providing schema. Sample
configs for ingesting from kafka and dfs are
-provided under `hudi-utilities/src/test/resources/delta-streamer-config`.
+provided under `hudi-utilities/src/test/resources/streamer-config`.
For example, once you have Confluent Kafka and Schema Registry up & running, produce
some test data using
([impressions.avro](https://docs.confluent.io/current/ksql/docs/tutorials/generate-custom-test-data)
provided by schema-registry repo)
@@ -146,12 +159,12 @@ For e.g: once you have Confluent Kafka, Schema registry
up & running, produce so
and then ingest it as follows.
```java
-[hoodie]$ spark-submit --class
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
- --props
file://${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties
\
+[hoodie]$ spark-submit --class
org.apache.hudi.utilities.streamer.HoodieStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
+ --props
file://${PWD}/hudi-utilities/src/test/resources/streamer-config/kafka-source.properties
\
--schemaprovider-class
org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--source-ordering-field impresssiontime \
- --target-base-path file:\/\/\/tmp/hudi-deltastreamer-op \
+ --target-base-path file:\/\/\/tmp/hudi-streamer-op \
--target-table uber.impressions \
--op BULK_INSERT
```
@@ -163,61 +176,61 @@ From 0.11.0 release, we start to provide a new
`hudi-utilities-slim-bundle` whic
cause conflicts and compatibility issues with different versions of Spark.
The `hudi-utilities-slim-bundle` should be
used along with a Hudi Spark bundle corresponding to the Spark version used to
make utilities work with Spark, e.g.,
`--packages
org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.13.0,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.13.0`,
-if using `hudi-utilities-bundle` solely to run `HoodieDeltaStreamer` in Spark
encounters compatibility issues.
+if using `hudi-utilities-bundle` solely to run `HoodieStreamer` in Spark
encounters compatibility issues.
-#### MultiTableDeltaStreamer
+#### MultiTableStreamer
-`HoodieMultiTableDeltaStreamer`, a wrapper on top of `HoodieDeltaStreamer`,
enables one to ingest multiple tables at a single go into hudi datasets.
Currently it only supports sequential processing of tables to be ingested and
COPY_ON_WRITE storage type. The command line options for
`HoodieMultiTableDeltaStreamer` are pretty much similar to
`HoodieDeltaStreamer` with the only exception that you are required to provide
table wise configs in separate files in a dedicated config folder. The [...]
+`HoodieMultiTableStreamer`, a wrapper on top of `HoodieStreamer`, enables one
to ingest multiple tables at a single go into hudi datasets. Currently it only
supports sequential processing of tables to be ingested and COPY_ON_WRITE
storage type. The command line options for `HoodieMultiTableStreamer` are
pretty much similar to `HoodieStreamer` with the only exception that you are
required to provide table wise configs in separate files in a dedicated config
folder. The following command l [...]
```java
* --config-folder
the path to the folder which contains all the table wise config files
--base-path-prefix
- this is added to enable users to create all the hudi datasets for related
tables under one path in FS. The datasets are then created under the path -
<base_path_prefix>/<database>/<table_to_be_ingested>. However you can override
the paths for every table by setting the property
hoodie.deltastreamer.ingestion.targetBasePath
+ this is added to enable users to create all the hudi datasets for related
tables under one path in FS. The datasets are then created under the path -
<base_path_prefix>/<database>/<table_to_be_ingested>. However you can override
the paths for every table by setting the property
hoodie.streamer.ingestion.targetBasePath
```
-The following properties are needed to be set properly to ingest data using
`HoodieMultiTableDeltaStreamer`.
+The following properties need to be set properly to ingest data using
`HoodieMultiTableStreamer`.
```java
-hoodie.deltastreamer.ingestion.tablesToBeIngested
+hoodie.streamer.ingestion.tablesToBeIngested
comma separated names of tables to be ingested in the format
<database>.<table>, for example db1.table1,db1.table2
-hoodie.deltastreamer.ingestion.targetBasePath
+hoodie.streamer.ingestion.targetBasePath
if you wish to ingest a particular table in a separate path, you can mention
that path here
-hoodie.deltastreamer.ingestion.<database>.<table>.configFile
+hoodie.streamer.ingestion.<database>.<table>.configFile
path to the config file in dedicated config folder which contains table
overridden properties for the particular table to be ingested.
```
-Sample config files for table wise overridden properties can be found under
`hudi-utilities/src/test/resources/delta-streamer-config`. The command to run
`HoodieMultiTableDeltaStreamer` is also similar to how you run
`HoodieDeltaStreamer`.
+Sample config files for table wise overridden properties can be found under
`hudi-utilities/src/test/resources/streamer-config`. The command to run
`HoodieMultiTableStreamer` is also similar to how you run `HoodieStreamer`.
```java
-[hoodie]$ spark-submit --class
org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
- --props
file://${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties
\
+[hoodie]$ spark-submit --class
org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
+ --props
file://${PWD}/hudi-utilities/src/test/resources/streamer-config/kafka-source.properties
\
--config-folder file://tmp/hudi-ingestion-config \
--schemaprovider-class
org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--source-ordering-field impresssiontime \
- --base-path-prefix file:\/\/\/tmp/hudi-deltastreamer-op \
+ --base-path-prefix file:\/\/\/tmp/hudi-streamer-op \
--target-table uber.impressions \
--op BULK_INSERT
```
-For detailed information on how to configure and use
`HoodieMultiTableDeltaStreamer`, please refer [blog
section](/blog/2020/08/22/ingest-multiple-tables-using-hudi).
+For detailed information on how to configure and use
`HoodieMultiTableStreamer`, please refer [blog
section](/blog/2020/08/22/ingest-multiple-tables-using-hudi).
### Concurrency Control
-The `HoodieDeltaStreamer` utility (part of hudi-utilities-bundle) provides
ways to ingest from different sources such as DFS or Kafka, with the following
capabilities.
+The `HoodieStreamer` utility (part of hudi-utilities-bundle) provides ways to
ingest from different sources such as DFS or Kafka, with the following
capabilities.
-Using optimistic_concurrency_control via delta streamer requires adding the
above configs to the properties file that can be passed to the
-job. For example below, adding the configs to kafka-source.properties file and
passing them to deltastreamer will enable optimistic concurrency.
-A deltastreamer job can then be triggered as follows:
+Using optimistic_concurrency_control via Hudi Streamer requires adding the
above configs to the properties file that can be passed to the
+job. For example, adding the configs to the kafka-source.properties file and
passing them to Hudi Streamer will enable optimistic concurrency.
+A Hudi Streamer job can then be triggered as follows:
```java
-[hoodie]$ spark-submit --class
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
- --props
file://${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties
\
+[hoodie]$ spark-submit --class
org.apache.hudi.utilities.streamer.HoodieStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
+ --props
file://${PWD}/hudi-utilities/src/test/resources/streamer-config/kafka-source.properties
\
--schemaprovider-class
org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--source-ordering-field impresssiontime \
- --target-base-path file:\/\/\/tmp/hudi-deltastreamer-op \
+ --target-base-path file:\/\/\/tmp/hudi-streamer-op \
--target-table uber.impressions \
--op BULK_INSERT
```
@@ -225,14 +238,14 @@ A deltastreamer job can then be triggered as follows:
Read more in depth about concurrency control in the [concurrency control
concepts](/docs/concurrency_control) section
## Checkpointing
-`HoodieDeltaStreamer` uses checkpoints to keep track of what data has been
read already so it can resume without needing to reprocess all data.
+`HoodieStreamer` uses checkpoints to keep track of what data has been read
already so it can resume without needing to reprocess all data.
When using a Kafka source, the checkpoint is the [Kafka
Offset](https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management)
When using a DFS source, the checkpoint is the 'last modified' timestamp of
the latest file read.
-Checkpoints are saved in the .hoodie commit file as
`deltastreamer.checkpoint.key`.
+Checkpoints are saved in the .hoodie commit file as `streamer.checkpoint.key`.
If you need to change the checkpoints for reprocessing or replaying data you
can use the following options:
-- `--checkpoint` will set `deltastreamer.checkpoint.reset_key` in the commit
file to overwrite the current checkpoint.
+- `--checkpoint` will set `streamer.checkpoint.reset_key` in the commit file
to overwrite the current checkpoint.
- `--source-limit` will set a maximum amount of data to read from the source.
For DFS sources, this is max # of bytes read.
For Kafka, this is the max # of events to read.
@@ -249,35 +262,35 @@ When fetching schemas from a registry, you can specify
both the source schema an
|Config|Description|Example|
|---|---|---|
-|hoodie.deltastreamer.schemaprovider.registry.url|The schema of the source you
are reading from|https://foo:[email protected]|
-|hoodie.deltastreamer.schemaprovider.registry.targetUrl|The schema of the
target you are writing to|https://foo:[email protected]|
+|hoodie.streamer.schemaprovider.registry.url|The schema of the source you are
reading from|https://foo:[email protected]|
+|hoodie.streamer.schemaprovider.registry.targetUrl|The schema of the target
you are writing to|https://foo:[email protected]|
-The above configs are passed to DeltaStreamer spark-submit command like:
-```--hoodie-conf
hoodie.deltastreamer.schemaprovider.registry.url=https://foo:[email protected]```
+The above configs are passed to the Hudi Streamer spark-submit command like:
+```--hoodie-conf
hoodie.streamer.schemaprovider.registry.url=https://foo:[email protected]```
### JDBC Schema Provider
You can obtain the latest schema through a JDBC connection.
|Config|Description|Example|
|---|---|---|
-|hoodie.deltastreamer.schemaprovider.source.schema.jdbc.connection.url|The
JDBC URL to connect to. You can specify source specific connection properties
in the URL|jdbc:postgresql://localhost/test?user=fred&password=secret|
-|hoodie.deltastreamer.schemaprovider.source.schema.jdbc.driver.type|The class
name of the JDBC driver to use to connect to this URL|org.h2.Driver|
-|hoodie.deltastreamer.schemaprovider.source.schema.jdbc.username|username for
the connection|fred|
-|hoodie.deltastreamer.schemaprovider.source.schema.jdbc.password|password for
the connection|secret|
-|hoodie.deltastreamer.schemaprovider.source.schema.jdbc.dbtable|The table with
the schema to reference|test_database.test1_table or test1_table|
-|hoodie.deltastreamer.schemaprovider.source.schema.jdbc.timeout|The number of
seconds the driver will wait for a Statement object to execute to the given
number of seconds. Zero means there is no limit. In the write path, this option
depends on how JDBC drivers implement the API setQueryTimeout, e.g., the h2
JDBC driver checks the timeout of each query instead of an entire JDBC batch.
It defaults to 0.|0|
-|hoodie.deltastreamer.schemaprovider.source.schema.jdbc.nullable|If true, all
columns are nullable|true|
+|hoodie.streamer.schemaprovider.source.schema.jdbc.connection.url|The JDBC URL
to connect to. You can specify source specific connection properties in the
URL|jdbc:postgresql://localhost/test?user=fred&password=secret|
+|hoodie.streamer.schemaprovider.source.schema.jdbc.driver.type|The class name
of the JDBC driver to use to connect to this URL|org.h2.Driver|
+|hoodie.streamer.schemaprovider.source.schema.jdbc.username|username for the
connection|fred|
+|hoodie.streamer.schemaprovider.source.schema.jdbc.password|password for the
connection|secret|
+|hoodie.streamer.schemaprovider.source.schema.jdbc.dbtable|The table with the
schema to reference|test_database.test1_table or test1_table|
+|hoodie.streamer.schemaprovider.source.schema.jdbc.timeout|The number of
seconds the driver will wait for a Statement object to execute. Zero means
there is no limit. In the write path, this option depends on how JDBC drivers
implement the API setQueryTimeout, e.g., the h2 JDBC driver checks the timeout
of each query instead of an entire JDBC batch. It defaults to 0.|0|
+|hoodie.streamer.schemaprovider.source.schema.jdbc.nullable|If true, all
columns are nullable|true|
-The above configs are passed to DeltaStreamer spark-submit command like:
-```--hoodie-conf
hoodie.deltastreamer.jdbcbasedschemaprovider.connection.url=jdbc:postgresql://localhost/test?user=fred&password=secret```
+The above configs are passed to the Hudi Streamer spark-submit command like:
+```--hoodie-conf
hoodie.streamer.jdbcbasedschemaprovider.connection.url=jdbc:postgresql://localhost/test?user=fred&password=secret```
### File Based Schema Provider
You can use a .avsc file to define your schema. You can then point to this
file on DFS as a schema provider.
|Config|Description|Example|
|---|---|---|
-|hoodie.deltastreamer.schemaprovider.source.schema.file|The schema of the
source you are reading from|[example schema
file](https://github.com/apache/hudi/blob/a8fb69656f522648233f0310ca3756188d954281/docker/demo/config/test-suite/source.avsc)|
-|hoodie.deltastreamer.schemaprovider.target.schema.file|The schema of the
target you are writing to|[example schema
file](https://github.com/apache/hudi/blob/a8fb69656f522648233f0310ca3756188d954281/docker/demo/config/test-suite/target.avsc)|
+|hoodie.streamer.schemaprovider.source.schema.file|The schema of the source
you are reading from|[example schema
file](https://github.com/apache/hudi/blob/a8fb69656f522648233f0310ca3756188d954281/docker/demo/config/test-suite/source.avsc)|
+|hoodie.streamer.schemaprovider.target.schema.file|The schema of the target
you are writing to|[example schema
file](https://github.com/apache/hudi/blob/a8fb69656f522648233f0310ca3756188d954281/docker/demo/config/test-suite/target.avsc)|
### Hive Schema Provider
@@ -285,10 +298,10 @@ You can use hive tables to fetch source and target schema.
|Config| Description |
|---|-------------------------------------------------------|
-|hoodie.deltastreamer.schemaprovider.source.schema.hive.database| Hive
database from where source schema can be fetched |
-|hoodie.deltastreamer.schemaprovider.source.schema.hive.table| Hive table from
where source schema can be fetched |
-|hoodie.deltastreamer.schemaprovider.target.schema.hive.database| Hive
database from where target schema can be fetched |
-|hoodie.deltastreamer.schemaprovider.target.schema.hive.table| Hive table from
where target schema can be fetched |
+|hoodie.streamer.schemaprovider.source.schema.hive.database| Hive database
from where source schema can be fetched |
+|hoodie.streamer.schemaprovider.source.schema.hive.table| Hive table from
where source schema can be fetched |
+|hoodie.streamer.schemaprovider.target.schema.hive.database| Hive database
from where target schema can be fetched |
+|hoodie.streamer.schemaprovider.target.schema.hive.table| Hive table from
where target schema can be fetched |
### Schema Provider with Post Processor
@@ -297,7 +310,7 @@ then will apply a post processor to change the schema
before it is used. You can
this class:
https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaPostProcessor.java
## Sources
-Hoodie DeltaStreamer can read data from a wide variety of sources. The
following are a list of supported sources:
+Hoodie Streamer can read data from a wide variety of sources. The following
is a list of supported sources:
### Distributed File System (DFS)
See the storage configurations page to see some examples of DFS applications
Hudi can read from. The following are the
@@ -314,10 +327,10 @@ other formats and then write data as Hudi format.)
For DFS sources the following behaviors are expected:
- For JSON DFS source, you always need to set a schema. If the target Hudi
table follows the same schema as from the source file, you just need to set the
source schema. If not, you need to set schemas for both source and target.
-- `HoodieDeltaStreamer` reads the files under the source base path
(`hoodie.deltastreamer.source.dfs.root`) directly, and it won't use the
partition paths under this base path as fields of the dataset. Detailed
examples can be found [here](https://github.com/apache/hudi/issues/5485).
+- `HoodieStreamer` reads the files under the source base path
(`hoodie.streamer.source.dfs.root`) directly, and it won't use the partition
paths under this base path as fields of the dataset. Detailed examples can be
found [here](https://github.com/apache/hudi/issues/5485).
### Kafka
-Hudi can read directly from Kafka clusters. See more details on
`HoodieDeltaStreamer` to learn how to setup streaming
+Hudi can read directly from Kafka clusters. See more details on
`HoodieStreamer` to learn how to set up streaming
ingestion with exactly once semantics, checkpointing, and plugin
transformations. The following formats are supported
when reading data from Kafka:
@@ -334,9 +347,9 @@ to trigger/processing of new or changed data as soon as it
is available on S3.
1. Enable S3 Event Notifications
https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html
2. Download the aws-java-sdk-sqs jar.
3. Find the queue URL and Region to set these configurations:
- 1.
hoodie.deltastreamer.s3.source.queue.url=https://sqs.us-west-2.amazonaws.com/queue/url
- 2. hoodie.deltastreamer.s3.source.queue.region=us-west-2
-4. start the S3EventsSource and S3EventsHoodieIncrSource using the
`HoodieDeltaStreamer` utility as shown in sample commands below:
+ 1.
hoodie.streamer.s3.source.queue.url=https://sqs.us-west-2.amazonaws.com/queue/url
+ 2. hoodie.streamer.s3.source.queue.region=us-west-2
+4. Start the S3EventsSource and S3EventsHoodieIncrSource using the
`HoodieStreamer` utility as shown in sample commands below:
Insert code sample from this blog:
https://hudi.apache.org/blog/2021/08/23/s3-events-source/#configuration-and-setup
@@ -345,27 +358,27 @@ Hudi can read from a JDBC source with a full fetch of a
table, or Hudi can even
|Config|Description|Example|
|---|---|---|
-|hoodie.deltastreamer.jdbc.url|URL of the JDBC
connection|jdbc:postgresql://localhost/test|
-|hoodie.deltastreamer.jdbc.user|User to use for authentication of the JDBC
connection|fred|
-|hoodie.deltastreamer.jdbc.password|Password to use for authentication of the
JDBC connection|secret|
-|hoodie.deltastreamer.jdbc.password.file|If you prefer to use a password file
for the connection||
-|hoodie.deltastreamer.jdbc.driver.class|Driver class to use for the JDBC
connection||
-|hoodie.deltastreamer.jdbc.table.name||my_table|
-|hoodie.deltastreamer.jdbc.table.incr.column.name|If run in incremental mode,
this field will be used to pull new data incrementally||
-|hoodie.deltastreamer.jdbc.incr.pull|Will the JDBC connection perform an
incremental pull?||
-|hoodie.deltastreamer.jdbc.extra.options.|How you pass extra configurations
that would normally by specified as
spark.read.option()|hoodie.deltastreamer.jdbc.extra.options.fetchSize=100
hoodie.deltastreamer.jdbc.extra.options.upperBound=1
hoodie.deltastreamer.jdbc.extra.options.lowerBound=100|
-|hoodie.deltastreamer.jdbc.storage.level|Used to control the persistence
level|Default = MEMORY_AND_DISK_SER|
-|hoodie.deltastreamer.jdbc.incr.fallback.to.full.fetch|Boolean which if set
true makes an incremental fetch fallback to a full fetch if there is any error
in the incremental read|FALSE|
+|hoodie.streamer.jdbc.url|URL of the JDBC
connection|jdbc:postgresql://localhost/test|
+|hoodie.streamer.jdbc.user|User to use for authentication of the JDBC
connection|fred|
+|hoodie.streamer.jdbc.password|Password to use for authentication of the JDBC
connection|secret|
+|hoodie.streamer.jdbc.password.file|If you prefer to use a password file for
the connection||
+|hoodie.streamer.jdbc.driver.class|Driver class to use for the JDBC
connection||
+|hoodie.streamer.jdbc.table.name||my_table|
+|hoodie.streamer.jdbc.table.incr.column.name|If run in incremental mode, this
field will be used to pull new data incrementally||
+|hoodie.streamer.jdbc.incr.pull|Will the JDBC connection perform an
incremental pull?||
+|hoodie.streamer.jdbc.extra.options.|How you pass extra configurations that
would normally be specified as
spark.read.option()|hoodie.streamer.jdbc.extra.options.fetchSize=100
hoodie.streamer.jdbc.extra.options.upperBound=1
hoodie.streamer.jdbc.extra.options.lowerBound=100|
+|hoodie.streamer.jdbc.storage.level|Used to control the persistence
level|Default = MEMORY_AND_DISK_SER|
+|hoodie.streamer.jdbc.incr.fallback.to.full.fetch|Boolean which if set true
makes an incremental fetch fallback to a full fetch if there is any error in
the incremental read|FALSE|
### SQL Source
SQL Source that reads from any table, used mainly for backfill jobs which will
process specific partition dates.
-This won't update the deltastreamer.checkpoint.key to the processed commit,
instead it will fetch the latest successful
+This won't update the streamer.checkpoint.key to the processed commit; instead
it will fetch the latest successful
checkpoint key and set that value as this backfill commit's checkpoint so that
it won't interrupt the regular incremental
-processing. To fetch and use the latest incremental checkpoint, you need to
also set this hoodie_conf for deltastremer
-jobs: `hoodie.write.meta.key.prefixes = 'deltastreamer.checkpoint.key'`
+processing. To fetch and use the latest incremental checkpoint, you need to
also set this hoodie_conf for Hudi Streamer
+jobs: `hoodie.write.meta.key.prefixes = 'streamer.checkpoint.key'`
Spark SQL should be configured using this hoodie config:
-hoodie.deltastreamer.source.sql.sql.query = 'select * from source_table'
+hoodie.streamer.source.sql.sql.query = 'select * from source_table'
## Flink Ingestion
@@ -545,7 +558,7 @@ There are many use cases that user put the full history
data set onto the messag
| `write.rate.limit` | `false` | `0` | Default disable the rate limit |
## Kafka Connect Sink
-If you want to perform streaming ingestion into Hudi format similar to
`HoodieDeltaStreamer`, but you don't want to depend on Spark,
+If you want to perform streaming ingestion into Hudi format similar to
`HoodieStreamer`, but you don't want to depend on Spark,
try out the new experimental release of Hudi Kafka Connect Sink. Read the
[ReadMe](https://github.com/apache/hudi/tree/master/hudi-kafka-connect)
for full documentation.
diff --git a/website/docs/key_generation.md b/website/docs/key_generation.md
index 570e086ebe2..0c636a09422 100644
--- a/website/docs/key_generation.md
+++ b/website/docs/key_generation.md
@@ -106,9 +106,9 @@ Configs to be set:
| Config | Meaning/purpose |
| ------------- | -------------|
-| ```hoodie.deltastreamer.keygen.timebased.timestamp.type``` | One of the
timestamp types supported(UNIX_TIMESTAMP, DATE_STRING, MIXED,
EPOCHMILLISECONDS, SCALAR) |
-| ```hoodie.deltastreamer.keygen.timebased.output.dateformat```| Output date
format |
-| ```hoodie.deltastreamer.keygen.timebased.timezone```| Timezone of the data
format|
+| ```hoodie.streamer.keygen.timebased.timestamp.type``` | One of the
supported timestamp types (UNIX_TIMESTAMP, DATE_STRING, MIXED,
EPOCHMILLISECONDS, SCALAR) |
+| ```hoodie.streamer.keygen.timebased.output.dateformat```| Output date format
|
+| ```hoodie.streamer.keygen.timebased.timezone```| Timezone of the data
format|
| ```hoodie.deltastreamer.keygen.timebased.input.dateformat```| Input date
format |
Let's go over some example values for TimestampBasedKeyGenerator.
@@ -117,9 +117,9 @@ Let's go over some example values for
TimestampBasedKeyGenerator.
| Config field | Value |
| ------------- | -------------|
-|```hoodie.deltastreamer.keygen.timebased.timestamp.type```|
"EPOCHMILLISECONDS"|
-|```hoodie.deltastreamer.keygen.timebased.output.dateformat``` | "yyyy-MM-dd
hh" |
-|```hoodie.deltastreamer.keygen.timebased.timezone```| "GMT+8:00" |
+|```hoodie.streamer.keygen.timebased.timestamp.type```| "EPOCHMILLISECONDS"|
+|```hoodie.streamer.keygen.timebased.output.dateformat``` | "yyyy-MM-dd hh" |
+|```hoodie.streamer.keygen.timebased.timezone```| "GMT+8:00" |
Input Field value: “1578283932000L” <br/>
Partition path generated from key generator: “2020-01-06 12”
@@ -131,10 +131,10 @@ Partition path generated from key generator: “1970-01-01
08”
| Config field | Value |
| ------------- | -------------|
-|```hoodie.deltastreamer.keygen.timebased.timestamp.type```| "DATE_STRING" |
-|```hoodie.deltastreamer.keygen.timebased.output.dateformat```| "yyyy-MM-dd
hh" |
-|```hoodie.deltastreamer.keygen.timebased.timezone```| "GMT+8:00" |
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat```| "yyyy-MM-dd
hh:mm:ss" |
+|```hoodie.streamer.keygen.timebased.timestamp.type```| "DATE_STRING" |
+|```hoodie.streamer.keygen.timebased.output.dateformat```| "yyyy-MM-dd hh" |
+|```hoodie.streamer.keygen.timebased.timezone```| "GMT+8:00" |
+|```hoodie.streamer.keygen.timebased.input.dateformat```| "yyyy-MM-dd
hh:mm:ss" |
Input field value: “2020-01-06 12:12:12” <br/>
Partition path generated from key generator: “2020-01-06 12”
@@ -147,10 +147,10 @@ Partition path generated from key generator: “1970-01-01
12:00:00”
| Config field | Value |
| ------------- | -------------|
-|```hoodie.deltastreamer.keygen.timebased.timestamp.type```| "SCALAR"|
-|```hoodie.deltastreamer.keygen.timebased.output.dateformat```| "yyyy-MM-dd
hh" |
-|```hoodie.deltastreamer.keygen.timebased.timezone```| "GMT" |
-|```hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit```|
"days" |
+|```hoodie.streamer.keygen.timebased.timestamp.type```| "SCALAR"|
+|```hoodie.streamer.keygen.timebased.output.dateformat```| "yyyy-MM-dd hh" |
+|```hoodie.streamer.keygen.timebased.timezone```| "GMT" |
+|```hoodie.streamer.keygen.timebased.timestamp.scalar.time.unit```| "days" |
Input field value: “20000L” <br/>
Partition path generated from key generator: “2024-10-04 12”
@@ -162,12 +162,12 @@ Partition path generated from key generator: “1970-01-02
12”
| Config field | Value |
| ------------- | -------------|
-|```hoodie.deltastreamer.keygen.timebased.timestamp.type```| "DATE_STRING"|
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat```|
"yyyy-MM-dd'T'HH:mm:ss.SSSZ" |
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex```|
"" |
-|```hoodie.deltastreamer.keygen.timebased.input.timezone```| "" |
-|```hoodie.deltastreamer.keygen.timebased.output.dateformat```| "yyyyMMddHH" |
-|```hoodie.deltastreamer.keygen.timebased.output.timezone```| "GMT" |
+|```hoodie.streamer.keygen.timebased.timestamp.type```| "DATE_STRING"|
+|```hoodie.streamer.keygen.timebased.input.dateformat```|
"yyyy-MM-dd'T'HH:mm:ss.SSSZ" |
+|```hoodie.streamer.keygen.timebased.input.dateformat.list.delimiter.regex```|
"" |
+|```hoodie.streamer.keygen.timebased.input.timezone```| "" |
+|```hoodie.streamer.keygen.timebased.output.dateformat```| "yyyyMMddHH" |
+|```hoodie.streamer.keygen.timebased.output.timezone```| "GMT" |
Input field value: "2020-04-01T13:01:33.428Z" <br/>
Partition path generated from key generator: "2020040113"
@@ -176,12 +176,12 @@ Partition path generated from key generator: "2020040113"
| Config field | Value |
| ------------- | -------------|
-|```hoodie.deltastreamer.keygen.timebased.timestamp.type```| "DATE_STRING"|
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat```|
"yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ" |
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex```|
"" |
-|```hoodie.deltastreamer.keygen.timebased.input.timezone```| "" |
-|```hoodie.deltastreamer.keygen.timebased.output.dateformat```| "yyyyMMddHH" |
-|```hoodie.deltastreamer.keygen.timebased.output.timezone```| "UTC" |
+|```hoodie.streamer.keygen.timebased.timestamp.type```| "DATE_STRING"|
+|```hoodie.streamer.keygen.timebased.input.dateformat```|
"yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ" |
+|```hoodie.streamer.keygen.timebased.input.dateformat.list.delimiter.regex```|
"" |
+|```hoodie.streamer.keygen.timebased.input.timezone```| "" |
+|```hoodie.streamer.keygen.timebased.output.dateformat```| "yyyyMMddHH" |
+|```hoodie.streamer.keygen.timebased.output.timezone```| "UTC" |
Input field value: "2020-04-01T13:01:33.428Z" <br/>
Partition path generated from key generator: "2020040113"
@@ -190,12 +190,12 @@ Partition path generated from key generator: "2020040113"
| Config field | Value |
| ------------- | -------------|
-|```hoodie.deltastreamer.keygen.timebased.timestamp.type```| "DATE_STRING"|
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat```|
"yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ" |
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex```|
"" |
-|```hoodie.deltastreamer.keygen.timebased.input.timezone```| "" |
-|```hoodie.deltastreamer.keygen.timebased.output.dateformat```| "yyyyMMddHH" |
-|```hoodie.deltastreamer.keygen.timebased.output.timezone```| "UTC" |
+|```hoodie.streamer.keygen.timebased.timestamp.type```| "DATE_STRING"|
+|```hoodie.streamer.keygen.timebased.input.dateformat```|
"yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ" |
+|```hoodie.streamer.keygen.timebased.input.dateformat.list.delimiter.regex```|
"" |
+|```hoodie.streamer.keygen.timebased.input.timezone```| "" |
+|```hoodie.streamer.keygen.timebased.output.dateformat```| "yyyyMMddHH" |
+|```hoodie.streamer.keygen.timebased.output.timezone```| "UTC" |
Input field value: "2020-04-01T13:01:33-**05:00**" <br/>
Partition path generated from key generator: "2020040118"
@@ -204,12 +204,12 @@ Partition path generated from key generator: "2020040118"
| Config field | Value |
| ------------- | -------------|
-|```hoodie.deltastreamer.keygen.timebased.timestamp.type```| "DATE_STRING"|
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat```|
"yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ,yyyyMMdd" |
-|```hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex```|
"" |
-|```hoodie.deltastreamer.keygen.timebased.input.timezone```| "UTC" |
-|```hoodie.deltastreamer.keygen.timebased.output.dateformat```| "MM/dd/yyyy" |
-|```hoodie.deltastreamer.keygen.timebased.output.timezone```| "UTC" |
+|```hoodie.streamer.keygen.timebased.timestamp.type```| "DATE_STRING"|
+|```hoodie.streamer.keygen.timebased.input.dateformat```|
"yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ,yyyyMMdd" |
+|```hoodie.streamer.keygen.timebased.input.dateformat.list.delimiter.regex```|
"" |
+|```hoodie.streamer.keygen.timebased.input.timezone```| "UTC" |
+|```hoodie.streamer.keygen.timebased.output.dateformat```| "MM/dd/yyyy" |
+|```hoodie.streamer.keygen.timebased.output.timezone```| "UTC" |
Input field value: "20200401" <br/>
Partition path generated from key generator: "04/01/2020"
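To build intuition for the renamed `hoodie.streamer.keygen.timebased.*` examples above, here is a small Python sketch (illustrative only; the Java `TimestampBasedKeyGenerator` is the source of truth) that mimics the EPOCHMILLISECONDS and SCALAR conversions. Note that the Java pattern `yyyy-MM-dd hh` uses a 12-hour clock, which maps to `%Y-%m-%d %I` in Python:

```python
from datetime import datetime, timedelta, timezone

def epoch_millis_partition(value_ms, tz_offset_hours):
    """EPOCHMILLISECONDS input, output format "yyyy-MM-dd hh" (12-hour clock)."""
    tz = timezone(timedelta(hours=tz_offset_hours))
    return datetime.fromtimestamp(value_ms / 1000, tz).strftime("%Y-%m-%d %I")

def scalar_days_partition(value_days):
    """SCALAR input with time unit "days", interpreted as days since epoch, GMT output."""
    ts = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(days=value_days)
    return ts.strftime("%Y-%m-%d %I")

print(epoch_millis_partition(1578283932000, 8))  # 2020-01-06 12
print(scalar_days_partition(20000))              # 2024-10-04 12
```

These reproduce the partition paths shown in the EPOCHMILLISECONDS ("GMT+8:00") and SCALAR ("days") tables above.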
diff --git a/website/docs/metadata_indexing.md
b/website/docs/metadata_indexing.md
index 3a785c41d6c..85a10dcec8d 100644
--- a/website/docs/metadata_indexing.md
+++ b/website/docs/metadata_indexing.md
@@ -25,7 +25,7 @@ feature, please check out [this
blog](https://www.onehouse.ai/blog/asynchronous-
## Setup Async Indexing
-First, we will generate a continuous workload. In the below example, we are
going to start a [deltastreamer](/docs/hoodie_deltastreamer#deltastreamer)
which will continuously write data
+First, we will generate a continuous workload. In the below example, we are
going to start a [Hudi Streamer](/docs/hoodie_deltastreamer#hudi-streamer)
which will continuously write data
from raw parquet to a Hudi table. We used the widely available [NY Taxi dataset](https://registry.opendata.aws/nyc-tlc-trip-records-pds/), whose setup details are as below:
<details>
<summary>Ingestion write config</summary>
@@ -35,9 +35,9 @@ from raw parquet to Hudi table. We used the widely available
[NY Taxi dataset](h
hoodie.datasource.write.recordkey.field=VendorID
hoodie.datasource.write.partitionpath.field=tpep_dropoff_datetime
hoodie.datasource.write.precombine.field=tpep_dropoff_datetime
-hoodie.deltastreamer.source.dfs.root=/Users/home/path/to/data/parquet_files/
-hoodie.deltastreamer.schemaprovider.target.schema.file=/Users/home/path/to/schema/schema.avsc
-hoodie.deltastreamer.schemaprovider.source.schema.file=/Users/home/path/to/schema/schema.avsc
+hoodie.streamer.source.dfs.root=/Users/home/path/to/data/parquet_files/
+hoodie.streamer.schemaprovider.target.schema.file=/Users/home/path/to/schema/schema.avsc
+hoodie.streamer.schemaprovider.source.schema.file=/Users/home/path/to/schema/schema.avsc
// set lock provider configs
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=<zk_url>
@@ -50,12 +50,12 @@ hoodie.write.lock.zookeeper.base_path=<zk_base_path>
</details>
<details>
- <summary>Run deltastreamer</summary>
+ <summary>Run Hudi Streamer</summary>
<p>
```bash
spark-submit \
---class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls
/Users/home/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0.jar`
\
+--class org.apache.hudi.utilities.streamer.HoodieStreamer `ls
/Users/home/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0.jar`
\
--props `ls /Users/home/path/to/write/config.properties` \
--source-class org.apache.hudi.utilities.sources.ParquetDFSSource
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
\
--source-ordering-field tpep_dropoff_datetime \
@@ -71,7 +71,7 @@ spark-submit \
</p>
</details>
-From version 0.11.0 onwards, Hudi metadata table is enabled by default and the
files index will be automatically created. While the deltastreamer is running
in continuous mode, let
+From version 0.11.0 onwards, the Hudi metadata table is enabled by default and the files index will be automatically created. While the Hudi Streamer is running in continuous mode, let
us schedule the indexing for COLUMN_STATS index. First we need to define a
properties file for the indexer.
### Configurations
diff --git a/website/docs/metrics.md b/website/docs/metrics.md
index eddba2a45d0..c6f2833f5ad 100644
--- a/website/docs/metrics.md
+++ b/website/docs/metrics.md
@@ -73,7 +73,7 @@ hoodie.metrics.datadog.metric.prefix=<your metrics prefix>
* `hoodie.metrics.datadog.metric.prefix` will help segregate metrics by
setting different prefixes for different jobs. Note that it will use `.` to
delimit the prefix and the metric name. For example, if the prefix is set to
`foo`, then `foo.` will be prepended to the metric name.
#### Demo
-In this demo, we ran a `HoodieDeltaStreamer` job with `HoodieMetrics` turned
on and other configurations set properly.
+In this demo, we ran a `HoodieStreamer` job with `HoodieMetrics` turned on and
other configurations set properly.
<figure>
<img className="docimage"
src={require("/assets/images/blog/2020-05-28-datadog-metrics-demo.png").default}
alt="hudi_datadog_metrics.png" />
@@ -85,7 +85,7 @@ In this demo, we ran a `HoodieDeltaStreamer` job with
`HoodieMetrics` turned on
* `<prefix>.<table name>.clean.duration`
* `<prefix>.<table name>.index.lookup.duration`
- as well as `HoodieDeltaStreamer`-specific metrics
+ as well as `HoodieStreamer`-specific metrics
* `<prefix>.<table name>.deltastreamer.duration`
* `<prefix>.<table name>.deltastreamer.hiveSyncDuration`
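The naming scheme above can be illustrated with a short sketch (not Hudi's actual reporter code): the prefix, table name, and metric name are simply joined with `.`:

```python
def datadog_metric_name(prefix, table, metric):
    # With prefix "foo", "foo." is prepended to "<table>.<metric>".
    return ".".join([prefix, table, metric])

print(datadog_metric_name("foo", "my_table", "deltastreamer.duration"))
# foo.my_table.deltastreamer.duration
```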
diff --git a/website/docs/migration_guide.md b/website/docs/migration_guide.md
index 449d65c376a..31cbdbcb956 100644
--- a/website/docs/migration_guide.md
+++ b/website/docs/migration_guide.md
@@ -36,15 +36,15 @@ Import your existing table into a Hudi managed table. Since
all the data is Hudi
There are a few options when choosing this approach.
**Option 1**
-Use the HoodieDeltaStreamer tool. HoodieDeltaStreamer supports bootstrap with
--run-bootstrap command line option. There are two types of bootstrap,
+Use the HoodieStreamer tool. HoodieStreamer supports bootstrap with the --run-bootstrap command line option. There are two types of bootstrap,
METADATA_ONLY and FULL_RECORD. METADATA_ONLY will generate just skeleton base
files with keys/footers, avoiding the full cost of rewriting the dataset.
FULL_RECORD will perform a full copy/rewrite of the data as a Hudi table.
-Here is an example for running FULL_RECORD bootstrap and keeping hive style
partition with HoodieDeltaStreamer.
+Here is an example for running FULL_RECORD bootstrap and keeping hive style
partition with HoodieStreamer.
```
spark-submit --master local \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
---class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
+--class org.apache.hudi.utilities.streamer.HoodieStreamer `ls
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
--run-bootstrap \
--target-base-path /tmp/hoodie/bootstrap_table \
--target-table bootstrap_table \
diff --git a/website/docs/precommit_validator.md
b/website/docs/precommit_validator.md
index f7466002f89..6dbd3354305 100644
--- a/website/docs/precommit_validator.md
+++ b/website/docs/precommit_validator.md
@@ -5,7 +5,7 @@ keywords: [ hudi, quality, expectations, pre-commit validator]
Data quality refers to the overall accuracy, completeness, consistency, and
validity of data. Ensuring data quality is vital for accurate analysis and
reporting, as well as for compliance with regulations and maintaining trust in
your organization's data infrastructure.
-Hudi offers **Pre-Commit Validators** that allow you to ensure that your data
meets certain data quality expectations as you are writing with DeltaStreamer
or Spark Datasource writers.
+Hudi offers **Pre-Commit Validators** that allow you to ensure that your data
meets certain data quality expectations as you are writing with Hudi Streamer
or Spark Datasource writers.
To configure pre-commit validators, use this setting
`hoodie.precommit.validators=<comma separated list of validator class names>`.
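For example, a sketch of enabling one of Hudi's built-in validators (class and property names here are taken from the Hudi utilities module; verify them against your Hudi version):

```
hoodie.precommit.validators=org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator
# Fail the commit if this query's result changes across the write
hoodie.precommit.validators.equality.sql.queries=select count(*) from <TABLE_NAME>
```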
diff --git a/website/docs/querying_data.md b/website/docs/querying_data.md
index b743fec3893..f0f72f267ee 100644
--- a/website/docs/querying_data.md
+++ b/website/docs/querying_data.md
@@ -226,7 +226,7 @@ table like any other Hive table.
### Incremental query
`HiveIncrementalPuller` allows incrementally extracting changes from large
fact/dimension tables via HiveQL, combining the benefits of Hive (reliably
process complex SQL queries) and
incremental primitives (speed up querying tables incrementally instead of
scanning fully). The tool uses Hive JDBC to run the Hive query and saves its results in a temp table
-that can later be upserted. Upsert utility (`HoodieDeltaStreamer`) has all the
state it needs from the directory structure to know what should be the commit
time on the target table.
+that can later be upserted. Upsert utility (`HoodieStreamer`) has all the
state it needs from the directory structure to know what should be the commit
time on the target table.
e.g:
`/app/incremental-hql/intermediate/{source_table_name}_temp/{last_commit_included}`.The
Delta Hive table registered will be of the form
`{tmpdb}.{source_table}_{last_commit_included}`.
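The naming conventions above can be sketched in a few lines of Python (the table and commit-time values below are illustrative, not from the source):

```python
def temp_table_path(source_table, last_commit_included):
    # Intermediate layout: /app/incremental-hql/intermediate/{source_table_name}_temp/{last_commit_included}
    return f"/app/incremental-hql/intermediate/{source_table}_temp/{last_commit_included}"

def delta_table_name(tmpdb, source_table, last_commit_included):
    # Registered Delta Hive table: {tmpdb}.{source_table}_{last_commit_included}
    return f"{tmpdb}.{source_table}_{last_commit_included}"

print(temp_table_path("trips", "20230718084822"))
# /app/incremental-hql/intermediate/trips_temp/20230718084822
print(delta_table_name("tmp", "trips", "20230718084822"))
# tmp.trips_20230718084822
```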
The following are the configuration options for HiveIncrementalPuller
diff --git a/website/docs/quick-start-guide.md
b/website/docs/quick-start-guide.md
index 8de81f2856f..a23ce275394 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -328,7 +328,7 @@ location '/tmp/hudi/hudi_cow_pt_tbl';
**Create Table for an existing Hudi Table**
-We can create a table on an existing hudi table(created with spark-shell or
deltastreamer). This is useful to
+We can create a table on an existing Hudi table (created with spark-shell or Hudi Streamer). This is useful to
read/write to/from a pre-existing hudi table.
```sql
diff --git a/website/docs/s3_hoodie.md b/website/docs/s3_hoodie.md
index 044b201c7cb..37f79ae7534 100644
--- a/website/docs/s3_hoodie.md
+++ b/website/docs/s3_hoodie.md
@@ -62,7 +62,7 @@ Alternatively, add the required configs in your core-site.xml
from where Hudi ca
```
-Utilities such as hudi-cli or deltastreamer tool, can pick up s3 creds via
environmental variable prefixed with `HOODIE_ENV_`. For e.g below is a bash
snippet to setup
+Utilities such as hudi-cli or the Hudi Streamer tool can pick up S3 credentials via environment variables prefixed with `HOODIE_ENV_`. For example, below is a bash snippet to set up
such variables and then have the CLI work on datasets stored in S3
```java
diff --git a/website/docs/syncing_aws_glue_data_catalog.md
b/website/docs/syncing_aws_glue_data_catalog.md
index 0d9075993ec..3ab47deeab7 100644
--- a/website/docs/syncing_aws_glue_data_catalog.md
+++ b/website/docs/syncing_aws_glue_data_catalog.md
@@ -10,7 +10,7 @@ and send them to AWS Glue.
### Configurations
There is no additional configuration for using `AwsGlueCatalogSyncTool`; you
just need to set it as one of the sync tool
-classes for `HoodieDeltaStreamer` and everything configured as shown in [Sync
to Hive Metastore](syncing_metastore) will
+classes for `HoodieStreamer` and everything configured as shown in [Sync to
Hive Metastore](syncing_metastore) will
be passed along.
```shell
diff --git a/website/docs/syncing_datahub.md b/website/docs/syncing_datahub.md
index da7fdc876c6..40fcd1d1891 100644
--- a/website/docs/syncing_datahub.md
+++ b/website/docs/syncing_datahub.md
@@ -7,7 +7,7 @@ keywords: [hudi, datahub, sync]
observability, federated governance, etc.
Since Hudi 0.11.0, you can now sync to a DataHub instance by setting
`DataHubSyncTool` as one of the sync tool classes
-for `HoodieDeltaStreamer`.
+for `HoodieStreamer`.
The target Hudi table will be sync'ed to DataHub as a `Dataset`. The Hudi
table's avro schema will be sync'ed, along
with the commit timestamp when running the sync.
@@ -29,18 +29,18 @@ the URN creation.
### Example
-The following shows an example configuration to run `HoodieDeltaStreamer` with
`DataHubSyncTool`.
+The following shows an example configuration to run `HoodieStreamer` with
`DataHubSyncTool`.
-In addition to `hudi-utilities-bundle` that contains `HoodieDeltaStreamer`,
you also add `hudi-datahub-sync-bundle` to
+In addition to `hudi-utilities-bundle`, which contains `HoodieStreamer`, you also need to add `hudi-datahub-sync-bundle` to
the classpath.
```shell
spark-submit --master yarn \
--jars /opt/hudi-datahub-sync-bundle-0.13.0.jar \
---class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+--class org.apache.hudi.utilities.streamer.HoodieStreamer \
/opt/hudi-utilities-bundle_2.12-0.13.0.jar \
--target-table mytable \
-# ... other HoodieDeltaStreamer's configs
+# ... other HoodieStreamer configs
--enable-sync \
--sync-tool-classes org.apache.hudi.sync.datahub.DataHubSyncTool \
--hoodie-conf
hoodie.meta.sync.datahub.emitter.server=http://url-to-datahub-instance:8080 \
diff --git a/website/docs/syncing_metastore.md
b/website/docs/syncing_metastore.md
index 10e6b9b8886..d1600be3967 100644
--- a/website/docs/syncing_metastore.md
+++ b/website/docs/syncing_metastore.md
@@ -5,7 +5,7 @@ keywords: [hudi, hive, sync]
## Hive Sync Tool
-Writing data with [DataSource](/docs/writing_data) writer or
[HoodieDeltaStreamer](/docs/hoodie_deltastreamer) supports syncing of the
table's latest schema to Hive metastore, such that queries can pick up new
columns and partitions.
+Writing data with [DataSource](/docs/writing_data) writer or
[HoodieStreamer](/docs/hoodie_deltastreamer) supports syncing of the table's
latest schema to Hive metastore, such that queries can pick up new columns and
partitions.
In case it's preferable to run this from the command line or in an independent JVM, Hudi provides a `HiveSyncTool`, which can be invoked as below,
once you have built the hudi-hive module. Following is how we sync the above
Datasource Writer written table to Hive metastore.
diff --git a/website/docs/transforms.md b/website/docs/transforms.md
index 4b594f7b801..e03f58b0ece 100644
--- a/website/docs/transforms.md
+++ b/website/docs/transforms.md
@@ -12,12 +12,12 @@ You can pass a SQL Query to be executed during write.
```scala
--transformer-class
org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
---hoodie-conf hoodie.deltastreamer.transformer.sql=SELECT a.col1, a.col3,
a.col4 FROM <SRC> a
+--hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4
FROM <SRC> a
```
### SQL File Transformer
You can specify a File with a SQL script to be executed during write. The SQL
file is configured with this hoodie property:
-hoodie.deltastreamer.transformer.sql.file
+hoodie.streamer.transformer.sql.file
The query should reference the source as a table named "\<SRC\>"
@@ -51,7 +51,7 @@ If you wish to use multiple transformers together, you can
use the Chained trans
Example below first flattens the incoming records and then does sql projection
based on the query specified:
```scala
--transformer-class
org.apache.hudi.utilities.transform.FlatteningTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
---hoodie-conf hoodie.deltastreamer.transformer.sql=SELECT a.col1, a.col3,
a.col4 FROM <SRC> a
+--hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4
FROM <SRC> a
```
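Conceptually (a sketch, not the actual `SqlQueryBasedTransformer` implementation), the transformer exposes the incoming batch as a temporary view and substitutes that view's name for the `<SRC>` token before running the query; the view name below is illustrative:

```python
def resolve_transformer_sql(sql, src_view):
    # Replace the <SRC> placeholder with the temp view over the incoming batch.
    return sql.replace("<SRC>", src_view)

sql = "SELECT a.col1, a.col3, a.col4 FROM <SRC> a"
print(resolve_transformer_sql(sql, "hudi_src_batch"))
# SELECT a.col1, a.col3, a.col4 FROM hudi_src_batch a
```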
### AWS DMS Transformer
diff --git a/website/docs/use_cases.md b/website/docs/use_cases.md
index f3fabdf04d4..4efb3bc4736 100644
--- a/website/docs/use_cases.md
+++ b/website/docs/use_cases.md
@@ -29,7 +29,7 @@ are needed if ingestion is to keep up with the typically high
update volumes.
Even for immutable data sources like [Kafka](https://kafka.apache.org), there
is often a need to de-duplicate the incoming events against what's stored on
DFS.
Hudi achieves this by [employing
indexes](http://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/) of
different kinds, quickly and efficiently.
-All of this is seamlessly achieved by the Hudi DeltaStreamer tool, which is
maintained in tight integration with rest of the code
+All of this is seamlessly achieved by the Hudi Streamer tool, which is maintained in tight integration with the rest of the code
and we are always trying to add more capture sources, to make this easier for
the users. The tool also has a continuous mode, where it
can self-manage clustering/compaction asynchronously, without blocking
ingestion, significantly improving data freshness.
diff --git a/website/docs/write_operations.md b/website/docs/write_operations.md
index 746a93d057b..fc0791bcf20 100644
--- a/website/docs/write_operations.md
+++ b/website/docs/write_operations.md
@@ -32,7 +32,7 @@ Hudi supports implementing two types of deletes on data
stored in Hudi tables, b
- **Hard Deletes** : A stronger form of deletion is to physically remove any
trace of the record from the table. This can be achieved in 3 different ways.
- Using DataSource, set `OPERATION_OPT_KEY` to `DELETE_OPERATION_OPT_VAL`.
This will remove all the records in the DataSet being submitted.
- Using DataSource, set `PAYLOAD_CLASS_OPT_KEY` to
`"org.apache.hudi.EmptyHoodieRecordPayload"`. This will remove all the records
in the DataSet being submitted.
- - Using DataSource or DeltaStreamer, add a column named `_hoodie_is_deleted`
to DataSet. The value of this column must be set to `true` for all the records
to be deleted and either `false` or left null for any records which are to be
upserted.
+ - Using DataSource or Hudi Streamer, add a column named `_hoodie_is_deleted`
to DataSet. The value of this column must be set to `true` for all the records
to be deleted and either `false` or left null for any records which are to be
upserted.
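The `_hoodie_is_deleted` semantics above can be sketched in plain Python (illustrative only; in practice this is a column on the DataSet evaluated by Hudi's write client):

```python
records = [
    {"id": 1, "_hoodie_is_deleted": True},   # hard delete
    {"id": 2, "_hoodie_is_deleted": False},  # upsert
    {"id": 3},                               # flag absent (null) -> upsert
]

# Only records whose flag is exactly true are deleted; false or null means upsert.
deletes = [r["id"] for r in records if r.get("_hoodie_is_deleted") is True]
upserts = [r["id"] for r in records if r.get("_hoodie_is_deleted") is not True]

print(deletes)  # [1]
print(upserts)  # [2, 3]
```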
## Writing path
The following is an inside look on the Hudi write path and the sequence of
events that occur during a write.
diff --git a/website/docs/writing_data.md b/website/docs/writing_data.md
index 51679a6cc88..3cc0f3d2264 100644
--- a/website/docs/writing_data.md
+++ b/website/docs/writing_data.md
@@ -9,7 +9,7 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
In this section, we will cover ways to ingest new changes from external
sources or even other Hudi tables.
-The two main tools available are the
[DeltaStreamer](/docs/hoodie_deltastreamer#deltastreamer) tool, as well as the
[Spark Hudi datasource](#spark-datasource-writer).
+The two main tools available are the [Hudi
Streamer](/docs/hoodie_deltastreamer#hudi-streamer) tool, as well as the [Spark
Hudi datasource](#spark-datasource-writer).
## Spark Datasource Writer