Re: Hive query taking too much time

2011-12-08 Thread Wojciech Langiewicz
e- From: Wojciech Langiewicz [mailto:wlangiew...@gmail.com] Sent: Wednesday, December 07, 2011 8:15 PM To: user@hive.apache.org Subject: Re: Hive query taking too much time Hi, In this case it's much easier and faster to merge all files using this command: cat *.csv> output.csv hive

Re: Hive query taking too much time

2011-12-08 Thread Aniket Mokashi
ame bytes. What do you suggest? > > Kind Regards, > Keshav C Savant > > > -Original Message- > From: Wojciech Langiewicz [mailto:wlangiew...@gmail.com] > Sent: Wednesday, December 07, 2011 8:15 PM > To: user@hive.apache.org > Subject: Re: Hive query taking too much ti

RE: Hive query taking too much time

2011-12-07 Thread Savant, Keshav
taking too much time Hi, In this case it's much easier and faster to merge all files using these commands: cat *.csv > output.csv; hive -e "load data local inpath 'output.csv' into table $table" On 07.12.2011 07:00, Vikas Srivastava wrote: > hey if u having the same col

Re: Hive query taking too much time

2011-12-07 Thread Wojciech Langiewicz
Hi, In this case it's much easier and faster to merge all files using these commands: cat *.csv > output.csv; hive -e "load data local inpath 'output.csv' into table $table" On 07.12.2011 07:00, Vikas Srivastava wrote: hey if u having the same col of all the files then you can easily merge by s
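A minimal sketch of the merge-then-load approach described in this message, assuming all CSV files share the same columns, have no header rows, and each end with a newline. The function names merge_csv and load_into_hive and the paths in the usage note are hypothetical. The hive -e statement is the one quoted in the message, and it only runs if hive is found on the PATH.

```shell
#!/bin/sh
# Merge every *.csv in a source directory into one output file.
# Write the output outside the source directory so that a re-run
# does not pick up the previous output as an input.
merge_csv() {
    src_dir=$1
    out_file=$2
    : > "$out_file"                  # start from an empty output file
    for f in "$src_dir"/*.csv; do
        [ -f "$f" ] || continue      # skip if the glob matched nothing
        cat "$f" >> "$out_file"
    done
}

# Load the merged file into a Hive table, but only if hive is installed.
load_into_hive() {
    out_file=$1
    table=$2
    if command -v hive >/dev/null 2>&1; then
        hive -e "load data local inpath '$out_file' into table $table"
    else
        echo "hive not found; skipping load of $out_file" >&2
    fi
}
```

Possible usage: merge_csv ./csv_in /tmp/output.csv && load_into_hive /tmp/output.csv yourtable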

RE: Hive query taking too much time

2011-12-07 Thread Savant, Keshav
14,271,688 Thanks a lot for your help. Kind Regards, Keshav C Savant From: Paul Mackles [mailto:pmack...@adobe.com] Sent: Tuesday, December 06, 2011 8:14 PM To: user@hive.apache.org Subject: RE: Hive query taking too much time How much time is it spending in the map/reduce phases

Re: Hive query taking too much time

2011-12-06 Thread Ayon Sinha
t my Blog for answers to commonly asked questions. From: Vikas Srivastava To: user@hive.apache.org Sent: Tuesday, December 6, 2011 10:00 PM Subject: Re: Hive query taking too much time hey if u having the same col of  all the files then you can easily merg

Re: Hive query taking too much time

2011-12-06 Thread Vikas Srivastava
Hey, if all the files have the same columns, you can easily merge them with a shell script: table=yourtable; for file in *.csv; do cat "$file" >> new_file.csv; done; hive -e "load data local inpath 'new_file.csv' into table $table" This merges all the files into a single file, then you can upload it in

Re: Hive query taking too much time

2011-12-06 Thread Mohit Gupta
Hi Paul, I am having the same problem. Do you know any efficient way of merging the files? -Mohit On Tue, Dec 6, 2011 at 8:14 PM, Paul Mackles wrote: > How much time is it spending in the map/reduce phases, respectively? The > large number of files could be creating a lot of mappers which creat

RE: Hive query taking too much time

2011-12-06 Thread Paul Mackles
How much time is it spending in the map and reduce phases, respectively? The large number of files could be creating a lot of mappers, which adds a lot of overhead. What happens if you merge the 2624 files into a smaller number, like 24 or 48? That should speed up the mapper phase significantly. Fr
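One way to try the suggestion above is to concatenate the small files into a fixed number of larger ones before loading them. The sketch below is only an illustration: the function name merge_into_buckets, the part-N.csv naming, and the round-robin assignment are assumptions, not something from the thread. It also assumes the files have identical columns, no header rows, and a trailing newline each. Whether 24 or 48 is a good target depends on the cluster.

```shell
#!/bin/sh
# Merge the *.csv files in src_dir into at most n_buckets larger files
# (part-0.csv, part-1.csv, ...) in out_dir, assigning files round-robin.
# out_dir should be empty and different from src_dir, because the
# function appends to any part files that already exist.
merge_into_buckets() {
    src_dir=$1
    out_dir=$2
    n_buckets=$3
    mkdir -p "$out_dir"
    i=0
    for f in "$src_dir"/*.csv; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        bucket=$((i % n_buckets))        # round-robin bucket index
        cat "$f" >> "$out_dir/part-$bucket.csv"
        i=$((i + 1))
    done
}
```

Possible usage: merge_into_buckets ./csv_in ./csv_merged 24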

Re: Hive query taking too much time

2011-12-06 Thread Wojciech Langiewicz
Hi, In your case the total file size isn't the main factor reducing performance; the number of files is. To test this, try merging those 2000+ files into one (or a few) big files, then upload the result to HDFS and test Hive performance (it should definitely be higher). If this works you should think about mergin