pratyakshsharma commented on code in PR #4927:
URL: https://github.com/apache/hudi/pull/4927#discussion_r867338139
##########
website/docs/basic_configurations.md:
##########
@@ -0,0 +1,750 @@
+---
+title: Basic Configurations
+toc: true
+---
+
+This page covers the basic configurations you may use to write/read Hudi tables. It features only a subset of the most frequently used configurations. For a full list of all configs, please visit the [All Configurations](/docs/configurations) page.
+- [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the Hudi Spark Datasource, providing the ability to define keys/partitioning, pick out the write operation, specify how records are merged, or choose the query type to read.
+- [**Flink SQL Configs**](#FLINK_SQL): These configs control the Hudi Flink SQL source/sink connectors, providing the ability to define record keys, pick out the write operation, specify how records are merged, enable/disable asynchronous compaction, or choose the query type to read.
+- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource uses an RDD-based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower-level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning, etc. Although Hudi provides sane defaults, from time to time these configs may need to be tweaked to optimize for specific workloads.
+- [**Metrics Configs**](#METRICS): This set of configs is used to enable monitoring and reporting of key Hudi stats and metrics.
+- [**Record Payload Config**](#RECORD_PAYLOAD): This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert, based on the incoming new record and the stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload, which simply updates the table with the latest/last-written record. This can be overridden with a custom class extending the HoodieRecordPayload class, at both the datasource and WriteClient levels (see the sketch after this list).
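+
+As a minimal sketch of the last point (the class name `com.example.MyPayload` is hypothetical, and the `hoodie.datasource.write.payload.class` key is assumed to match this Hudi version):
+
+```java
+// Hypothetical custom payload: com.example.MyPayload extends HoodieRecordPayload.
+// Pointing the writer at it overrides the default OverwriteWithLatestAvroPayload.
+inputDF.write()
+  .format("org.apache.hudi")
+  .option("hoodie.datasource.write.payload.class", "com.example.MyPayload")
+  .option(HoodieWriteConfig.TABLE_NAME, tableName)
+  .mode(SaveMode.Append)
+  .save(basePath);
+```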
+
+## Spark Datasource Configs {#SPARK_DATASOURCE}
+These configs control the Hudi Spark Datasource, providing the ability to define keys/partitioning, pick out the write operation, specify how records are merged, or choose the query type to read.
+
+### Read Options {#Read-Options}
+
+Options useful for reading tables via `read.format.option(...)`
+
+
+`Config Class`: org.apache.hudi.DataSourceOptions.scala<br></br>
+> #### hoodie.datasource.query.type
+> Whether data needs to be read in incremental mode (new data since an instantTime), Read Optimized mode (obtain the latest view, based on base files), or Snapshot mode (obtain the latest view, by merging base and (if any) log files)<br></br>
+> **Default Value**: snapshot (Optional)<br></br>
+> `Config Param: QUERY_TYPE`<br></br>
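+
+For example, a sketch of an incremental read (assuming the `DataSourceReadOptions.QUERY_TYPE_OPT_KEY()`/`QUERY_TYPE_INCREMENTAL_OPT_VAL()`/`BEGIN_INSTANTTIME_OPT_KEY()` accessors of this Hudi version; the instant time below is illustrative):
+
+```java
+// Read only records written after the given commit instant.
+Dataset<Row> incrementalDF = spark.read()
+  .format("org.apache.hudi")
+  .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
+  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), "20220101000000")
+  .load(basePath);
+```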
+
+---
+
+### Write Options {#Write-Options}
+
+You can pass down any of the WriteClient-level configs directly using the `options()` or `option(k,v)` methods.
+
+```java
+inputDF.write()
+  .format("org.apache.hudi")
+  .options(clientOpts) // any of the Hudi client opts can be passed in as well
+  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
+  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
+  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
+  .option(HoodieWriteConfig.TABLE_NAME, tableName)
+  .mode(SaveMode.Append)
+  .save(basePath);
+```
+
+Options useful for writing tables via `write.format.option(...)`
+
+
+`Config Class`: org.apache.hudi.DataSourceOptions.scala<br></br>
+
+> #### hoodie.datasource.write.operation
+> Whether to do upsert, insert or bulkinsert for the write operation. Use bulkinsert to load new data into a table, and there on use upsert/insert. Bulk insert uses a disk-based write path that scales to large inputs without needing to cache them.<br></br>
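+
+For example, a sketch of an initial bulk insert followed by upserts (assuming the `DataSourceWriteOptions.OPERATION_OPT_KEY()` accessor and the `bulk_insert`/`upsert` values of this Hudi version):
+
+```java
+// Initial load: scale out via the disk-based bulk insert path.
+inputDF.write()
+  .format("org.apache.hudi")
+  .option(DataSourceWriteOptions.OPERATION_OPT_KEY(), "bulk_insert")
+  .option(HoodieWriteConfig.TABLE_NAME, tableName)
+  .mode(SaveMode.Overwrite)
+  .save(basePath);
+
+// Subsequent writes: upsert the incoming changes.
+updatesDF.write()
+  .format("org.apache.hudi")
+  .option(DataSourceWriteOptions.OPERATION_OPT_KEY(), "upsert")
+  .option(HoodieWriteConfig.TABLE_NAME, tableName)
+  .mode(SaveMode.Append)
+  .save(basePath);
+```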
Review Comment:
there on -> thereafter?