I run a single query like select retailer_key,count(*) from records group by retailer_key;
it uses a single map as shown below, since the file is already on HDFS, so I think hadoop/hive doesn't need to copy anything. Kind% CompleteNum TasksPendingRunningCompleteKilledFailed/Killed Task Attempts map100.00% 100100 / 0 reduce100.00% 100100 / 0 but the final chart in the job report shows "copy" takes about 33% of the total time, and the rest are "sort", and "reduce". So why it should copy here, or copy means something elso? oracle@oracle-MS-7623:~/test$ hadoop fs -lsr / drwxr-xr-x - oracle supergroup 0 2011-08-10 19:46 /user drwxr-xr-x - oracle supergroup 0 2011-08-10 19:46 /user/hive drwxr-xr-x - oracle supergroup 0 2011-08-10 19:59 /user/hive/warehouse drwxr-xr-x - oracle supergroup 0 2011-08-10 19:59 /user/hive/warehouse/records -rw-r--r-- 1 oracle supergroup 41600256 2011-08-10 19:59 /user/hive/warehouse/records/test.txt