Why is a copy needed when running a SQL query with a single map?

Daniel,Wu Wed, 10 Aug 2011 05:08:30 -0700

I ran a single query:

select retailer_key,count(*) from records group by retailer_key;


It uses a single map task, as shown below. Since the file is already on HDFS, I 
assumed Hadoop/Hive would not need to copy anything.


Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      1           0         0         1          0        0 / 0
reduce   100.00%      1           0         0         1          0        0 / 0

But the final chart in the job report shows that "copy" takes about 33% of the 
total time, with the rest split between "sort" and "reduce". Why would anything 
be copied here, or does "copy" mean something else?

oracle@oracle-MS-7623:~/test$ hadoop fs -lsr /
drwxr-xr-x   - oracle supergroup          0 2011-08-10 19:46 /user
drwxr-xr-x   - oracle supergroup          0 2011-08-10 19:46 /user/hive
drwxr-xr-x   - oracle supergroup          0 2011-08-10 19:59 /user/hive/warehouse
drwxr-xr-x   - oracle supergroup          0 2011-08-10 19:59 /user/hive/warehouse/records
-rw-r--r--   1 oracle supergroup   41600256 2011-08-10 19:59 /user/hive/warehouse/records/test.txt
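
For context on the "copy" label: in Hadoop MapReduce a reduce task runs in three phases, copy (also called shuffle), sort, and reduce. The copy phase is the reducer fetching the map task's intermediate output, not a copy of the input file on HDFS, so it shows up even with a single map. The following is only a rough Python sketch of those phases for the GROUP BY query above. It is not how Hive or Hadoop implement them; the function names and the comma-separated sample records are made up for illustration.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit (retailer_key, 1) per record.
    Assumption: retailer_key is the first comma-separated field."""
    return [(line.split(",")[0], 1) for line in lines]

def copy_phase(map_outputs):
    """Copy (shuffle): the reducer gathers the intermediate output
    of every map task. With one map there is still one output to fetch."""
    fetched = []
    for output in map_outputs:
        fetched.extend(output)
    return fetched

def sort_phase(pairs):
    """Sort: order the fetched pairs by key so equal keys are adjacent."""
    return sorted(pairs, key=lambda kv: kv[0])

def reduce_phase(sorted_pairs):
    """Reduce: count(*) per retailer_key."""
    counts = defaultdict(int)
    for key, one in sorted_pairs:
        counts[key] += one
    return dict(counts)

# Hypothetical sample data standing in for test.txt
records = ["r1,a", "r2,b", "r1,c"]
map_outputs = [map_phase(records)]  # a single map task
result = reduce_phase(sort_phase(copy_phase(map_outputs)))
print(result)  # {'r1': 2, 'r2': 1}
```

Even in this toy version the reducer has to collect the map output before it can sort and aggregate, which is the step the job report labels "copy".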
