Hi Wei
     If you look at your query, it is a multi table insert, and Hive treats it 
as a single operation. In a multi table insert the table is scanned just once, 
instead of being scanned again and again (five times in your case, once per 
insert clause). The five filters are then applied together to the scanned data 
in the map reduce stage. This is step 1, and step 2 is copying the output of 
that stage from hdfs to lfs.
 
It would have run the sequential way if it were not a multi table insert.
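
For comparison, here is a rough sketch of the same work written as separate statements rather than one multi table insert. Only the first two of the five ranges from the original query are shown. In this form each statement scans impressions2 on its own and is followed by its own copy to the local directory.

```sql
-- Separate statements: each one triggers its own scan of impressions2.
INSERT OVERWRITE LOCAL DIRECTORY '/disk2/iis1'
SELECT * FROM impressions2
WHERE impressionid < '1239572996000';

INSERT OVERWRITE LOCAL DIRECTORY '/disk2/iis2'
SELECT * FROM impressions2
WHERE impressionid < '1239592780000' AND impressionid >= '1239572996000';
```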

Regards
Bejoy.K.S


________________________________
 From: "Lu, Wei" <[email protected]>
To: "[email protected]" <[email protected]>; Bejoy Ks 
<[email protected]> 
Sent: Wednesday, March 7, 2012 10:31 PM
Subject: re: Why Move Operations after MapReduce are in sequential?
 

 
Hi Bejoy.K.S,

  Yes, there are two steps in general. For my query there are 6 steps: one 
mapreduce job and 5 move operations. My question is why the 5 move operations 
are executed sequentially rather than in parallel after step 1.

Regards,
Wei
 

________________________________
 
From: Bejoy Ks [[email protected]]
Sent: March 7, 2012 7:36
To: [email protected]
Subject: Re: Why Move Operations after MapReduce are in sequential?


Hi Wei
     Here there are two operations that take place for your query
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis1' select * where 
impressionid<'1239572996000' 

1 - A map reduce job that performs the operation select * where 
impressionid<'1239572996000'
2 -  A file system operation that copies the output of Step 1 from hdfs to lfs 
(hadoop fs -copyToLocal). Step 2 would be executed only after completion of 
Step 1.
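
If it helps, the stages Hive plans for a statement, and the order they depend on each other, can be inspected by prefixing the statement with EXPLAIN. Below is a minimal sketch using the first insert from your query. The output is not shown here because it depends on the Hive version.

```sql
-- Prints the query plan, including a STAGE DEPENDENCIES section,
-- without actually running the job or copying any data.
EXPLAIN
FROM impressions2
INSERT OVERWRITE LOCAL DIRECTORY '/disk2/iis1'
SELECT * WHERE impressionid < '1239572996000';
```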


Regards
Bejoy.K.S


________________________________
 From: "Lu, Wei" <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Wednesday, March 7, 2012 5:12 PM
Subject: Why Move Operations after MapReduce are in sequential?


 
Hi, 
 
For the query below, I find that the five Move Operations (after the MapReduce 
job) are not executed in parallel.
 
from impressions2 
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis1' select * where 
impressionid<'1239572996000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis2' select * where 
impressionid<'1239592780000' AND impressionid>='1239572996000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis3' select * where 
impressionid<'1239648597000' AND impressionid>='1239592780000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis4' select * where 
impressionid<'1239714028000' AND impressionid>='1239648597000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis5' select * where 
impressionid>='1239714028000';
 
------
Ended Job = job_201203060735_0008
Copying data to local directory /disk2/iis1
Copying data to local directory /disk2/iis1
Copying data to local directory /disk2/iis2
Copying data to local directory /disk2/iis2
Copying data to local directory /disk2/iis3
Copying data to local directory /disk2/iis3
Copying data to local directory /disk2/iis4
Copying data to local directory /disk2/iis4
Copying data to local directory /disk2/iis5
Copying data to local directory /disk2/iis5
------
 
 
I thought the Move Operations could be done in parallel, which would improve 
performance if the MapReduce temp result is pretty large.
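
One possible workaround, sketched under assumptions and not measured: write the five results to HDFS directories instead of local ones, then copy them to the local disks outside Hive, for example with several hadoop fs -copyToLocal commands started at the same time. The /tmp/iis* HDFS paths below are hypothetical placeholders.

```sql
-- Same single-scan multi table insert, but the targets are HDFS directories
-- (no LOCAL keyword). The /tmp/iis* paths are hypothetical.
FROM impressions2
INSERT OVERWRITE DIRECTORY '/tmp/iis1' SELECT *
  WHERE impressionid < '1239572996000'
INSERT OVERWRITE DIRECTORY '/tmp/iis2' SELECT *
  WHERE impressionid < '1239592780000' AND impressionid >= '1239572996000'
INSERT OVERWRITE DIRECTORY '/tmp/iis3' SELECT *
  WHERE impressionid < '1239648597000' AND impressionid >= '1239592780000'
INSERT OVERWRITE DIRECTORY '/tmp/iis4' SELECT *
  WHERE impressionid < '1239714028000' AND impressionid >= '1239648597000'
INSERT OVERWRITE DIRECTORY '/tmp/iis5' SELECT *
  WHERE impressionid >= '1239714028000';
```

Hive would still run its own move steps into the HDFS directories, so whether this helps depends on where the time is actually spent.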
 
 
Regards,
Wei
