[GitHub] [hudi] xushiyan commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

GitBox Thu, 09 Dec 2021 22:42:42 -0800


xushiyan commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766367063




##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.

Review comment:
       ```suggestion
     Both of Hudi's table types (Copy-On-Write (COW) and Merge-On-Read (MOR)) 
can be created using Spark SQL.
   ```

##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.

Review comment:
       ```suggestion
   Spark SQL needs an explicit create table command.
   ```

##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.

Review comment:
       ```suggestion
     Users can create a partitioned table or a non-partitioned table in Spark 
SQL.
   ```

##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.
+  One needs to set **primaryKey** column in options to create a primary key 
table.
+  If you are using any of the built-in key generators in Hudi, likely it is a 
primary key table.
+
+Let's go over some of the create table commands.
+
+**Create a Non-Partitioned Table**
+
 ```sql
--- 
-create table if not exists hudi_table2(
-  id int, 
-  name string, 
+-- create a cow table, with default primaryKey 'uuid' and without 
preCombineField provided
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
   price double
+) using hudi;
+
+
+-- create a mor non-partitioned table without preCombineField provided
+create table hudi_mor_tbl (
+  id int,
+  name string,
+  price double,
+  ts bigint
 ) using hudi
-options (
-  type = 'cow'
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
 );
 ```
 
+Here is an example of creating an external COW partitioned table.
+
+**Create Partitioned Table**
+
+```sql
+-- create a partitioned, preCombineField-provided cow table
+create table hudi_cow_pt_tbl (
+  id bigint,
+  name string,
+  ts bigint,
+  dt string,
+  hh string
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (dt, hh)
+location '/tmp/hudi/hudi_cow_pt_tbl';
+```
+
+**Create Table for an existing Hudi Table**
+
+We can create a table on an existing hudi table(created with spark-shell or 
deltastreamer). This is useful to
+read/write to/from a pre-existing hudi table.
+
+```sql
+-- create an external hudi table based on an existing path
+
+-- for non-partitioned table
+create table hudi_existing_tbl0 using hudi
+location 'file:///tmp/hudi/dataframe_hudi_nonpt_table';
+
+-- for partitioned table
+create table hudi_existing_tbl1 using hudi
+partitioned by (dt, hh)
+location 'file:///tmp/hudi/dataframe_hudi_pt_table';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned 
columns if existed. Hudi can automatically recognize the schema and 
configurations.
+:::
+
+**CTAS**
+
+Hudi supports CTAS(Create Table As Select) on spark sql. <br/>
+Note: For better performance to load data to hudi table, CTAS uses the **bulk 
insert** as the write operation.
+
+Example CTAS command to create a non-partitioned COW table without 
preCombineField.
+
+```sql
+-- CTAS: create a non-partitioned cow table without preCombineField
+create table hudi_ctas_cow_nonpcf_tbl
+using hudi
+tblproperties (primaryKey = 'id')
+as
+select 1 as id, 'a1' as name, 10 as price;
+```
+
+Example CTAS command to create a partitioned, primary key COW table.
+
+```sql
+-- CTAS: create a partitioned, preCombineField-provided cow table
+create table hudi_ctas_cow_pt_tbl
+using hudi
+tblproperties (type = 'cow', primaryKey = 'id', preCombineField = 'ts')
+partitioned by (dt)
+as
+select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-12-01' as dt;
+
+```
+
+Example CTAS command to load data from another table.
+
+```sql
+# create managed parquet table
+create table parquet_mngd using parquet location 
'file:///tmp/parquet_dataset/*.parquet';
+
+# CTAS by loading data into hudi table
+create table hudi_ctas_cow_pt_tbl2 using hudi location 
'file:/tmp/hudi/hudi_tbl/' options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (datestr) as select * from parquet_mngd;
+```
+
+**Create Table Properties**
+
+Users can set table properties while creating a hudi table. Critical options 
are listed here.
+
+| Parameter Name | Default | Introduction |
+|------------------|--------|------------|
+| primaryKey | uuid | The primary key names of the table, multiple fields 
separated by commas. Same as `hoodie.datasource.write.recordkey.field` |
+| preCombineField |  | The pre-combine field of the table. Same as 
`hoodie.datasource.write.precombine.field` |
+| type       | cow | The table type to create. type = 'cow' means a 
COPY-ON-WRITE table, while type = 'mor' means a MERGE-ON-READ table. Same as 
`hoodie.datasource.write.table.type` |
+
+To set any custom hudi config(like index type, max parquet size, etc), see the 
 "Set hudi config section" .
+
+:::note
+1. Since hudi 0.10.0, `primaryKey` is required to specify. It can align with 
Hudi datasource writer’s and resolve many behavioural discrepancies reported in 
previous versions.
+2. `primaryKey`, `preCombineField`, `type` is case sensitive.
+3. To specify `primaryKey`, `preCombineField`, `type` or other hudi configs, 
`tblproperties` is a preferred way than `options`. Spark SQL syntax is detailed 
here.
+4. A new hudi table created by spark-sql will set 
`hoodie.table.keygenerator.class` as 
`org.apache.hudi.keygen.ComplexKeyGenerator`,
+`hoodie.datasource.write.hive_style_partitioning` as `true` by default.
+:::

Review comment:
       great notes section here. do you want to move to the beginning of the 
Spark SQL section to highlight these before users scoll long way down here? 
below "Table Types" ?

##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.
+  One needs to set **primaryKey** column in options to create a primary key 
table.
+  If you are using any of the built-in key generators in Hudi, likely it is a 
primary key table.
+
+Let's go over some of the create table commands.
+
+**Create a Non-Partitioned Table**
+
 ```sql
--- 
-create table if not exists hudi_table2(
-  id int, 
-  name string, 
+-- create a cow table, with default primaryKey 'uuid' and without 
preCombineField provided
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
   price double
+) using hudi;
+
+
+-- create a mor non-partitioned table without preCombineField provided
+create table hudi_mor_tbl (
+  id int,
+  name string,
+  price double,
+  ts bigint
 ) using hudi
-options (
-  type = 'cow'
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
 );
 ```
 
+Here is an example of creating an external COW partitioned table.
+
+**Create Partitioned Table**
+
+```sql
+-- create a partitioned, preCombineField-provided cow table
+create table hudi_cow_pt_tbl (
+  id bigint,
+  name string,
+  ts bigint,
+  dt string,
+  hh string
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (dt, hh)
+location '/tmp/hudi/hudi_cow_pt_tbl';
+```
+
+**Create Table for an existing Hudi Table**
+
+We can create a table on an existing hudi table(created with spark-shell or 
deltastreamer). This is useful to
+read/write to/from a pre-existing hudi table.
+
+```sql
+-- create an external hudi table based on an existing path
+
+-- for non-partitioned table
+create table hudi_existing_tbl0 using hudi
+location 'file:///tmp/hudi/dataframe_hudi_nonpt_table';
+
+-- for partitioned table
+create table hudi_existing_tbl1 using hudi
+partitioned by (dt, hh)
+location 'file:///tmp/hudi/dataframe_hudi_pt_table';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned 
columns if existed. Hudi can automatically recognize the schema and 
configurations.
+:::
+
+**CTAS**
+
+Hudi supports CTAS(Create Table As Select) on spark sql. <br/>
+Note: For better performance to load data to hudi table, CTAS uses the **bulk 
insert** as the write operation.
+
+Example CTAS command to create a non-partitioned COW table without 
preCombineField.
+
+```sql
+-- CTAS: create a non-partitioned cow table without preCombineField
+create table hudi_ctas_cow_nonpcf_tbl
+using hudi
+tblproperties (primaryKey = 'id')
+as
+select 1 as id, 'a1' as name, 10 as price;
+```
+
+Example CTAS command to create a partitioned, primary key COW table.
+
+```sql
+-- CTAS: create a partitioned, preCombineField-provided cow table
+create table hudi_ctas_cow_pt_tbl
+using hudi
+tblproperties (type = 'cow', primaryKey = 'id', preCombineField = 'ts')
+partitioned by (dt)
+as
+select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-12-01' as dt;
+
+```
+
+Example CTAS command to load data from another table.
+
+```sql
+# create managed parquet table
+create table parquet_mngd using parquet location 
'file:///tmp/parquet_dataset/*.parquet';
+
+# CTAS by loading data into hudi table
+create table hudi_ctas_cow_pt_tbl2 using hudi location 
'file:/tmp/hudi/hudi_tbl/' options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (datestr) as select * from parquet_mngd;
+```
+
+**Create Table Properties**
+
+Users can set table properties while creating a hudi table. Critical options 
are listed here.
+
+| Parameter Name | Default | Introduction |
+|------------------|--------|------------|
+| primaryKey | uuid | The primary key names of the table, multiple fields 
separated by commas. Same as `hoodie.datasource.write.recordkey.field` |
+| preCombineField |  | The pre-combine field of the table. Same as 
`hoodie.datasource.write.precombine.field` |
+| type       | cow | The table type to create. type = 'cow' means a 
COPY-ON-WRITE table, while type = 'mor' means a MERGE-ON-READ table. Same as 
`hoodie.datasource.write.table.type` |
+
+To set any custom hudi config(like index type, max parquet size, etc), see the 
 "Set hudi config section" .
+
+:::note
+1. Since hudi 0.10.0, `primaryKey` is required to specify. It can align with 
Hudi datasource writer’s and resolve many behavioural discrepancies reported in 
previous versions.
+2. `primaryKey`, `preCombineField`, `type` is case sensitive.
+3. To specify `primaryKey`, `preCombineField`, `type` or other hudi configs, 
`tblproperties` is a preferred way than `options`. Spark SQL syntax is detailed 
here.
+4. A new hudi table created by spark-sql will set 
`hoodie.table.keygenerator.class` as 
`org.apache.hudi.keygen.ComplexKeyGenerator`,

Review comment:
       ```suggestion
   4. A new hudi table created by Spark SQL will set 
`hoodie.table.keygenerator.class` as 
`org.apache.hudi.keygen.ComplexKeyGenerator`, and
   ```

##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.
+  One needs to set **primaryKey** column in options to create a primary key 
table.
+  If you are using any of the built-in key generators in Hudi, likely it is a 
primary key table.
+
+Let's go over some of the create table commands.
+
+**Create a Non-Partitioned Table**
+
 ```sql
--- 
-create table if not exists hudi_table2(
-  id int, 
-  name string, 
+-- create a cow table, with default primaryKey 'uuid' and without 
preCombineField provided
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
   price double
+) using hudi;
+
+
+-- create a mor non-partitioned table without preCombineField provided
+create table hudi_mor_tbl (
+  id int,
+  name string,
+  price double,
+  ts bigint
 ) using hudi
-options (
-  type = 'cow'
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
 );
 ```
 
+Here is an example of creating an external COW partitioned table.
+
+**Create Partitioned Table**
+
+```sql
+-- create a partitioned, preCombineField-provided cow table
+create table hudi_cow_pt_tbl (
+  id bigint,
+  name string,
+  ts bigint,
+  dt string,
+  hh string
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (dt, hh)
+location '/tmp/hudi/hudi_cow_pt_tbl';
+```
+
+**Create Table for an existing Hudi Table**
+
+We can create a table on an existing hudi table(created with spark-shell or 
deltastreamer). This is useful to
+read/write to/from a pre-existing hudi table.
+
+```sql
+-- create an external hudi table based on an existing path
+
+-- for non-partitioned table
+create table hudi_existing_tbl0 using hudi
+location 'file:///tmp/hudi/dataframe_hudi_nonpt_table';
+
+-- for partitioned table
+create table hudi_existing_tbl1 using hudi
+partitioned by (dt, hh)
+location 'file:///tmp/hudi/dataframe_hudi_pt_table';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned 
columns if existed. Hudi can automatically recognize the schema and 
configurations.
+:::
+
+**CTAS**
+
+Hudi supports CTAS(Create Table As Select) on spark sql. <br/>

Review comment:
       ```suggestion
   Hudi supports CTAS (Create Table As Select) on spark sql. <br/>
   ```

##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.
+  One needs to set **primaryKey** column in options to create a primary key 
table.
+  If you are using any of the built-in key generators in Hudi, likely it is a 
primary key table.
+
+Let's go over some of the create table commands.
+
+**Create a Non-Partitioned Table**
+
 ```sql
--- 
-create table if not exists hudi_table2(
-  id int, 
-  name string, 
+-- create a cow table, with default primaryKey 'uuid' and without 
preCombineField provided
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
   price double
+) using hudi;
+
+
+-- create a mor non-partitioned table without preCombineField provided
+create table hudi_mor_tbl (
+  id int,
+  name string,
+  price double,
+  ts bigint
 ) using hudi
-options (
-  type = 'cow'
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
 );
 ```
 
+Here is an example of creating an external COW partitioned table.
+
+**Create Partitioned Table**
+
+```sql
+-- create a partitioned, preCombineField-provided cow table
+create table hudi_cow_pt_tbl (
+  id bigint,
+  name string,
+  ts bigint,
+  dt string,
+  hh string
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (dt, hh)
+location '/tmp/hudi/hudi_cow_pt_tbl';
+```
+
+**Create Table for an existing Hudi Table**
+
+We can create a table on an existing hudi table(created with spark-shell or 
deltastreamer). This is useful to
+read/write to/from a pre-existing hudi table.
+
+```sql
+-- create an external hudi table based on an existing path
+
+-- for non-partitioned table
+create table hudi_existing_tbl0 using hudi
+location 'file:///tmp/hudi/dataframe_hudi_nonpt_table';
+
+-- for partitioned table
+create table hudi_existing_tbl1 using hudi
+partitioned by (dt, hh)
+location 'file:///tmp/hudi/dataframe_hudi_pt_table';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned 
columns if existed. Hudi can automatically recognize the schema and 
configurations.
+:::
+
+**CTAS**
+
+Hudi supports CTAS(Create Table As Select) on spark sql. <br/>
+Note: For better performance to load data to hudi table, CTAS uses the **bulk 
insert** as the write operation.
+
+Example CTAS command to create a non-partitioned COW table without 
preCombineField.
+
+```sql
+-- CTAS: create a non-partitioned cow table without preCombineField
+create table hudi_ctas_cow_nonpcf_tbl
+using hudi
+tblproperties (primaryKey = 'id')
+as
+select 1 as id, 'a1' as name, 10 as price;
+```
+
+Example CTAS command to create a partitioned, primary key COW table.
+
+```sql
+-- CTAS: create a partitioned, preCombineField-provided cow table
+create table hudi_ctas_cow_pt_tbl
+using hudi
+tblproperties (type = 'cow', primaryKey = 'id', preCombineField = 'ts')
+partitioned by (dt)
+as
+select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-12-01' as dt;
+
+```
+
+Example CTAS command to load data from another table.
+
+```sql
+# create managed parquet table
+create table parquet_mngd using parquet location 
'file:///tmp/parquet_dataset/*.parquet';
+
+# CTAS by loading data into hudi table
+create table hudi_ctas_cow_pt_tbl2 using hudi location 
'file:/tmp/hudi/hudi_tbl/' options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (datestr) as select * from parquet_mngd;
+```
+
+**Create Table Properties**
+
+Users can set table properties while creating a hudi table. Critical options 
are listed here.
+
+| Parameter Name | Default | Introduction |
+|------------------|--------|------------|
+| primaryKey | uuid | The primary key names of the table, multiple fields 
separated by commas. Same as `hoodie.datasource.write.recordkey.field` |
+| preCombineField |  | The pre-combine field of the table. Same as 
`hoodie.datasource.write.precombine.field` |
+| type       | cow | The table type to create. type = 'cow' means a 
COPY-ON-WRITE table, while type = 'mor' means a MERGE-ON-READ table. Same as 
`hoodie.datasource.write.table.type` |
+
+To set any custom hudi config(like index type, max parquet size, etc), see the 
 "Set hudi config section" .
+
+:::note
+1. Since hudi 0.10.0, `primaryKey` is required to specify. It can align with 
Hudi datasource writer’s and resolve many behavioural discrepancies reported in 
previous versions.
+2. `primaryKey`, `preCombineField`, `type` is case sensitive.
+3. To specify `primaryKey`, `preCombineField`, `type` or other hudi configs, 
`tblproperties` is a preferred way than `options`. Spark SQL syntax is detailed 
here.

Review comment:
       ```suggestion
   3. To specify `primaryKey`, `preCombineField`, `type` or other hudi configs, 
`tblproperties` is the preferred way than `options`. Spark SQL syntax is 
detailed here.
   ```

##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.

Review comment:
       ```suggestion
     In general, Spark SQL supports two kinds of tables, namely managed and 
external.
   ```

##########
File path: website/docs/quick-start-guide.md
##########
@@ -175,18 +175,163 @@ values={[
 </TabItem>
 <TabItem value="sparksql">
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.

Review comment:
       Shall we mention here non-pk table not supported? and explain how `uuid` 
works as the implicit default pk




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] xushiyan commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

Reply via email to