Hi, I am using a Java reducer to read from one table and write to another:
FROM (
  FROM (
    SELECT column1,... FROM table1
    WHERE ( partition > 6 AND partition < 12 )
  ) A
  MAP A.column1,A....
  USING 'java -cp .my.jar mymapper.mymapper'
  AS key, value
  CLUSTER BY key
) map_output
INSERT OVERWRITE TABLE target_table PARTITION(partition)
REDUCE map_output.key, map_output.value
USING 'java -cp .:myjar.jar myreducer.myreducer'
AS column1, column2;

It's all working fine, except that many (20-30) small files are generated under each partition. I am setting

SET hive.exec.reducers.bytes.per.reducer=1280,000,000;

hoping to get one big enough file for each partition, but it does not seem to have any effect. I still get 20-30 small files under each folder, and each file is only around 7 KB.

How can I force Hive to generate only one big file per partition? Does this have anything to do with the streaming? I recall that in the past, when I read directly from a table with a UDF and wrote to another table, it generated only one big file for the target partition. Not sure why that is.

Any help appreciated.

Thanks,
Chen
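
P.S. In case it helps, this is roughly what I run before the INSERT. I assume the commas in the number above are just a slip in this mail and that Hive wants a plain integer; I have also been wondering whether simply pinning the reducer count would do it, but I have not verified that:

SET hive.exec.reducers.bytes.per.reducer=1280000000;
-- my guess, not yet tested: force a single reducer so each partition gets one output file
SET mapred.reduce.tasks=1;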