Wenning Ding created HUDI-353:
---------------------------------
Summary: Add support for Hive style partitioning path
Key: HUDI-353
URL: https://issues.apache.org/jira/browse/HUDI-353
Project: Apache Hudi (incubating)
Issue Type: Improvement
Reporter: Wenning Ding
In Hive, the partition folder name follows this format:
<partition_column_name>=<partition_value>.
But in Hudi, the name of its partition folder is <partition_value>.
For example, suppose a dataset is partitioned by three columns: year, month and day.
In Hive, the data is saved in:
{{.../<table_name>/year=2019/month=05/day=01/xxx.parquet}}
In Hudi, the data is saved in: {{.../<table_name>/2019/05/01/xxx.parquet}}
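The difference between the two layouts can be sketched as a small path-building helper. This is only an illustration, not Hudi's actual code: the function name {{partition_path}} and its {{hive_style}} flag are hypothetical names chosen for this example.

```python
def partition_path(values, hive_style=False):
    """Build a relative partition path from ordered (column, value) pairs.

    Hypothetical helper for illustration only; not part of Hudi.
    """
    if hive_style:
        # Hive layout: each folder is <partition_column_name>=<partition_value>
        return "/".join(f"{col}={val}" for col, val in values)
    # Value-only layout: each folder is just <partition_value>
    return "/".join(str(val) for _, val in values)


parts = [("year", "2019"), ("month", "05"), ("day", "01")]
print(partition_path(parts, hive_style=True))  # year=2019/month=05/day=01
print(partition_path(parts))                   # 2019/05/01
```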
I added a new option to the Spark datasource named
{{HIVE_STYLE_PARTITIONING_FILED_OPT_KEY}}, which indicates whether to use Hive
style partitioning. By default this option is false (not used).
Also, when Hive style partitioning is used, instead of scanning the dataset and
manually adding/updating all partitions, we can run "MSCK REPAIR TABLE
<table_name>" to automatically sync all the partition info with the Hive Metastore.
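As a rough sketch of why the Hive style layout makes automatic partition discovery possible: when every folder is named <partition_column_name>=<partition_value>, the partition columns and values can be read back from the path alone. The helper below is hypothetical and is not how Hive implements MSCK REPAIR TABLE; it only illustrates the idea.

```python
def parse_hive_partition(relative_path):
    """Recover {column: value} from a Hive style partition path.

    Returns None if any folder is not in <column>=<value> form.
    Hypothetical helper for illustration only.
    """
    spec = {}
    for segment in relative_path.strip("/").split("/"):
        if "=" not in segment:
            # Value-only folder: the column name cannot be inferred
            return None
        column, value = segment.split("=", 1)
        spec[column] = value
    return spec


print(parse_hive_partition("year=2019/month=05/day=01"))
print(parse_hive_partition("2019/05/01"))
```

With the value-only layout the column names are not present in the path, so this kind of inference is not possible and the partitions have to be registered explicitly.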
--
This message was sent by Atlassian Jira
(v8.3.4#803005)