Hello all, I have a question about the ORC table format. We use it for our datastore tables, but during maintenance I noticed that there are many small files inside the tables which I presume don't contain any data. They are only 43 bytes in size and they make up around 70% of all files inside the table folder.
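The share of such files in a directory could be measured with a short script. This is only a rough sketch: it assumes plain `hdfs dfs -du` output (without `-h`) in which the first whitespace-separated field is the file size in bytes. The function name `tally_sizes` and the sample lines are hypothetical and are not real output from the cluster.

```python
from collections import Counter


def tally_sizes(du_lines, tiny_size=43):
    """Count files of exactly tiny_size bytes versus all other files.

    Assumes each line looks like "<size in bytes> ... <path>".
    """
    counts = Counter()
    for line in du_lines:
        fields = line.split()
        # Skip blank lines and anything that does not start with a byte count.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        key = "tiny" if int(fields[0]) == tiny_size else "other"
        counts[key] += 1
    return counts["tiny"], counts["other"]


# Hypothetical sample lines for illustration only.
sample = [
    "43  /warehouse/t/part=2015-07-30/000000_0",
    "43  /warehouse/t/part=2015-07-30/000001_0",
    "1048576  /warehouse/t/part=2015-07-30/000002_0",
]

tiny, other = tally_sizes(sample)
print(f"{tiny} tiny files, {other} other files")
```

The lines could come from saving the output of `hdfs dfs -du` for one partition directory to a file and reading it line by line.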
For example, counting the files that are 43 bytes in size and then all the others:

hadoop@hadoopnn:~$ hdfs dfs -du -h /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | grep "^43 " | wc -l
7448
hadoop@hadoopnn:~$ hdfs dfs -du -h /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | grep -v "^43 " | wc -l
4712

Why is that? Why are there so many 43-byte files? The ASCII content of these files is shown below, which I guess is just the ORC header:

0@▒▒▒" ▒▒ORC

Hive version: 0.12.0+cdh5.0.1+315 1.cdh5.0.1.p0.31 CDH 5

Thanks
JV