Hi 

when I use Hive's dynamic partition feature, I find it very easy to hit the 
"exceeds max created files" exception (I have set 
hive.exec.max.created.files to 100K but it still fails).




I have generated an unpartitioned table 'bsl12.email_edge_lyh_mth1' which 
contains 584M records, and I want to insert it into a partitioned table 
"bsl12.email_edge_lyh_partitioned2":


 set hive.exec.dynamic.partition=true;
 set hive.exec.max.dynamic.partitions=500;
 SET hive.exec.max.dynamic.partitions.pernode=500;
 set hive.exec.dynamic.partition.mode=nonstrict;
 SET hive.exec.max.created.files=10000;


--select count(*) from bsl12.email_edge_lyh_mth1; --584652128
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION 
(link_crtd_date) SELECT * FROM bsl12.email_edge_lyh_mth1;




My guess is that during a dynamic partition insert, Hive first calculates 
the partitions the new table will contain, generating temporary files along 
the way, and in the final step moves the temporary files to the specified 
partition locations. My problem is that too many temporary files are 
generated, which causes the "exceeds max created files" exception. So what 
rule does Hive use for generating temporary files? Does it generate a 
temporary file for every record, so that the number of temporary files 
equals the number of records in the unpartitioned table? Can you give me 
some suggestions about this? I have tried both Hive on Tez and Hive on 
MapReduce; both fail.
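
One workaround I am considering, though I have not verified it: if rows are 
routed to reducers by the dynamic partition column, each reducer should only 
write files for the partitions it receives, which might keep the created-file 
count closer to the number of partitions rather than (number of tasks) x 
(number of partitions). Something like:

```sql
-- Assumption (unverified): distributing rows by the dynamic partition
-- column (link_crtd_date) sends each partition's rows to a single reducer,
-- so each reducer only opens files for its own partitions.
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION (link_crtd_date)
SELECT * FROM bsl12.email_edge_lyh_mth1
DISTRIBUTE BY link_crtd_date;
```

Does this sound like the right direction?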








Kelly Zhang


