[jira] [Created] (HUDI-353) Add support for Hive style partitioning path

Wenning Ding (Jira) Wed, 20 Nov 2019 11:10:25 -0800

Wenning Ding created HUDI-353:
---------------------------------

             Summary: Add support for Hive style partitioning path
                 Key: HUDI-353
                 URL: https://issues.apache.org/jira/browse/HUDI-353
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
            Reporter: Wenning Ding



In Hive, partition folder names follow this format: 
<partition_column_name>=<partition_value>.
In Hudi, however, a partition folder is named just <partition_value>.

For example, suppose a dataset is partitioned by three columns: year, month, 
and day.
In Hive, the data is saved in: 
{{.../<table_name>/year=2019/month=05/day=01/xxx.parquet}}
In Hudi, the data is saved in: {{.../<table_name>/2019/05/01/xxx.parquet}}

Basically, I added a new option to the Spark datasource, 
{{HIVE_STYLE_PARTITIONING_FILED_OPT_KEY}}, which indicates whether to use Hive 
style partitioning. This option defaults to false (disabled).
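
As a rough illustration of the two layouts, here is a minimal Python sketch. 
It is not Hudi code: the function name {{build_partition_path}} and its 
{{hive_style}} parameter are hypothetical and only mirror the behaviour 
described above, including the default of false.

```python
from typing import Sequence


def build_partition_path(
    columns: Sequence[str],
    values: Sequence[str],
    hive_style: bool = False,  # assumption: mirrors the option's default (off)
) -> str:
    """Build a relative partition path from partition columns and values.

    Hypothetical helper for illustration only; not part of Hudi.
    """
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    if hive_style:
        # Hive layout: <partition_column_name>=<partition_value>
        parts = [f"{col}={val}" for col, val in zip(columns, values)]
    else:
        # Layout described for Hudi in this issue: <partition_value> only
        parts = list(values)
    return "/".join(parts)


columns = ["year", "month", "day"]
values = ["2019", "05", "01"]
print(build_partition_path(columns, values))                   # 2019/05/01
print(build_partition_path(columns, values, hive_style=True))  # year=2019/month=05/day=01
```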

Also, when Hive style partitioning is used, instead of scanning the dataset 
and manually adding/updating all partitions, we can run "MSCK REPAIR TABLE 
<table_name>" to automatically sync all the partition info with the Hive 
Metastore.
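
The reason this works is that a Hive style path carries the partition column 
names in the folder names themselves, so the partition spec can be recovered 
from the directory layout alone. The Python sketch below illustrates that idea 
only; it is not how Hive implements the command, and the helper name 
{{parse_hive_partition_path}} is hypothetical.

```python
from typing import Dict, Optional


def parse_hive_partition_path(relative_path: str) -> Optional[Dict[str, str]]:
    """Recover {column: value} from a path like 'year=2019/month=05/day=01'.

    Returns None if any path segment is not of the form <name>=<value>.
    Hypothetical helper for illustration only.
    """
    spec: Dict[str, str] = {}
    for segment in relative_path.strip("/").split("/"):
        name, sep, value = segment.partition("=")
        if not sep or not name or not value:
            # No column name in the folder name, so the spec cannot be inferred
            return None
        spec[name] = value
    return spec


print(parse_hive_partition_path("year=2019/month=05/day=01"))
print(parse_hive_partition_path("2019/05/01"))
```

With the value-only layout the second call returns None, which is why the 
partitions would otherwise have to be added or updated explicitly.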



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
