Are you using bucketing? If so, those are empty ORC files: they contain only
metadata and no data.


_____________________________
From: Juraj jiv <fatcap....@gmail.com>
Sent: Tuesday, August 18, 2015 8:28 AM
Subject: Hive 12 - CDH 5.0.1 - many small files when using ORC table
To:  <user@hive.apache.org>

Hello all,

I have a question about the ORC table format. We use it for our datastore
tables, but during maintenance I noticed that there are many small files inside
the tables which I presume do not contain any data. They are only 43 bytes in
size and they make up around 70% of all files inside the table folder.

For example (counting the files that are 43 bytes in size, then all the others):

hadoop@hadoopnn:~$ hdfs dfs -du -h /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | grep "^43 " | wc -l
7448
hadoop@hadoopnn:~$ hdfs dfs -du -h /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | grep -v "^43 " | wc -l
4712
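
Both counts can probably be taken in a single pass over the listing. The
following is only a minimal sketch: the function name count_by_size is made up
for illustration, and it assumes plain "hdfs dfs -du" output (without -h), so
that the first column of every line is the size in bytes.

```shell
# count_by_size (hypothetical helper): read `hdfs dfs -du` style lines on
# stdin and print two numbers: how many entries have exactly the given size
# in bytes, and how many have any other size.
# Assumption: the first whitespace-separated column is the size in bytes.
count_by_size() {
    awk -v target="$1" '
        NF == 0      { next }            # ignore blank lines
        $1 == target { small++; next }   # size matches exactly
                     { other++ }         # every other entry
        END          { printf "%d %d\n", small, other }
    '
}

# Possible usage, with <table> still a placeholder as in the commands above:
#   hdfs dfs -du '/user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30' | count_by_size 43
```

The first number printed would correspond to the grep "^43 " count and the
second to the grep -v count.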
    
Why is that? Why are there so many 43-byte files?

The ASCII content of these files is below; I guess it is just the ORC header:

0@▒▒▒"
▒▒ORC

Hive version:
0.12.0+cdh5.0.1+315     1.cdh5.0.1.p0.31     CDH 5

Thanks,
JV
