It seems that column `link_crtd_date` is of type `timestamp`, so you are
partitioning by date including the time of day, which produces a huge number
of directories. I assume your intent is to partition by date
(partition_date=yyyy-MM-dd or year=yyyy/month=MM/day=dd), so you need to
format/split your timestamp accordingly, for example:

-- partitioned by 'yyyy-MM-dd'
-- (dynamic partition columns must come last in the SELECT list)
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2
PARTITION (partition_date)
SELECT
    t.*,
    date_format(t.link_crtd_date, 'yyyy-MM-dd') AS partition_date
FROM bsl12.email_edge_lyh_mth1 t;
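
Note that a fully dynamic insert like this one usually requires non-strict
dynamic partitioning to be enabled first. A minimal sketch with the standard
Hive/Spark SQL settings (skip if your cluster already sets them):

-- allow dynamic partitions, with no static partition column required
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;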

-- partitioned by year/month/day
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2
PARTITION (year, month, day)
SELECT
    t.*,
    year(t.link_crtd_date) AS year,
    month(t.link_crtd_date) AS month,
    day(t.link_crtd_date) AS day
FROM bsl12.email_edge_lyh_mth1 t;
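
Regarding the temporary file count: each task writes one file per partition
value it encounters, so clustering rows by the partition value before the
insert keeps the number of files down. A sketch of the same insert with
DISTRIBUTE BY (assuming your engine resolves the select alias there;
otherwise repeat the date_format expression):

-- send all rows of one date to the same task, so each partition
-- directory receives only a handful of files
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2
PARTITION (partition_date)
SELECT
    t.*,
    date_format(t.link_crtd_date, 'yyyy-MM-dd') AS partition_date
FROM bsl12.email_edge_lyh_mth1 t
DISTRIBUTE BY partition_date;
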
Best Regards

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobile: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Commercial register: Amtsgericht Köln (HRB 92595)
Managing directors: Roland Johann, Uwe Reimann



> On 23.08.2019 at 09:43, zhangliyun <kelly...@126.com> wrote:
> 
> Hi all:
> When I use the Spark dynamic partition feature, I run into a problem with 
> the HDFS quota. I found that it is very easy to hit the quota problem 
> (exceeding the maximum quota of a directory).
> 
> I have generated an unpartitioned table 'bsl12.email_edge_lyh_mth1' which 
> contains 584M records, and I insert it into a partitioned table 
> "bsl12.email_edge_lyh_partitioned2":
> --select count(*) from bsl12.email_edge_lyh_mth1; --584652128
> --INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION (link_crtd_date) SELECT * FROM bsl12.email_edge_lyh_mth1;
> 
> 
> While the SQL was running I viewed the temporary directory and saw multiple 
> files with link_crtd_date=2018-01-01***; I guess one temporary file is 
> created per record. Since there are 584M records in the unpartitioned table, 
> are there any parameters to control the temporary file count and avoid the 
> quota problem?
> 
> ```
> 133  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-0112%3A35%3A29
> 137  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A35%3A47
> 136  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A38%3A23
> 132  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A38%3A54
> 536  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A40%3A01
> ```
> 
> Best Regards
> 
> Kelly Zhang
> 
