Thank you all for the tips. I'll dig into all these and let you people know :)
________________________________
From: Igor Tatarinov <i...@decide.com>
To: user@hive.apache.org
Sent: Tue, 8 March, 2011 11:47:20 AM
Subject: Re: Hive too slow?

Most likely, Hadoop's memory settings are too high and Linux starts swapping. You should be able to detect that using vmstat. Just a guess.

On Mon, Mar 7, 2011 at 10:11 PM, Ajo Fod <ajo....@gmail.com> wrote:
>hmm, I don't know of such a place ... but if I had to debug, I'd try to understand the following:
>1) are the underlying files zipped/compressed? ... that usually makes it slower.
>2) are the files located on the local hard drive or on HDFS?
>3) are all the cores being used? ... check the number of map and reduce tasks.
>
>-Ajo
>
>On Mon, Mar 7, 2011 at 9:24 PM, abhishek pathak <forever_yours_a...@yahoo.co.in> wrote:
>
>>I suspected as much. My system is a Core2Duo, 1.86 GHz. I understand that map-reduce is not instantaneous; I just wanted to confirm that 2200 rows in 4 minutes is indeed not normal behaviour. Could you point me at some places where I can get some info on how to tune this up?
>>
>>Regards,
>>Abhishek
>>
>> ________________________________
>>From: Ajo Fod <ajo....@gmail.com>
>>To: user@hive.apache.org
>>Sent: Mon, 7 March, 2011 9:21:51 PM
>>Subject: Re: Hive too slow?
>>
>>In my experience, Hive is not instantaneous like other DBs, but 4 minutes to count 2200 rows seems unreasonable.
>>
>>For comparison, my query over 169k rows on one computer with 4 cores running at 1 GHz (approx.) took 20 seconds.
>>
>>Cheers,
>>Ajo.
>>
>>On Mon, Mar 7, 2011 at 1:19 AM, abhishek pathak <forever_yours_a...@yahoo.co.in> wrote:
>>
>>>Hi,
>>>
>>>I am a Hive newbie. I just finished setting up Hive on a cluster of two servers for my organisation. As a test drill, we ran some simple queries. It took the standard map-reduce job around 4 minutes just to execute this query:
>>>
>>>select count(1) from tablename;
>>>
>>>The answer returned was around 2200. Clearly, this is not a big number by Hadoop standards. My question is whether this is standard performance, or is there some configuration that is not optimised? Will scaling the data up to, say, 50 times its current size produce any drastic slowness? I tried reading the documentation but was not clear on these issues, and I would like to have an idea before this setup starts working in a production environment.
>>>
>>>Thanks in advance,
>>>Regards,
>>>Abhishek Pathak
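A minimal sketch of Igor's swap check, run on each node while the query is executing. The property name and config path below assume a Hadoop 0.20-era install under $HADOOP_HOME, so treat them as placeholders for your setup:

  # Watch the si/so (swap-in/swap-out) columns; sustained non-zero values
  # while the job runs suggest the configured heaps exceed physical RAM.
  vmstat 5

  # Compare available memory against the per-task heap given to MapReduce
  # children (assumes the setting lives in mapred-site.xml).
  free -m
  grep -A1 "mapred.child.java.opts" $HADOOP_HOME/conf/mapred-site.xml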
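A rough sketch of how Ajo's three checks could be done from the shell. The warehouse path assumes Hive's default hive.metastore.warehouse.dir and the commands assume the Hadoop 0.20/Hive 0.6-0.7 tooling of that era:

  # 1) Compression: look at the file extensions (e.g. .gz) under the
  #    table's directory in the default warehouse location.
  hadoop fs -ls /user/hive/warehouse/tablename

  # 2) Storage location: confirm the table's LOCATION points at HDFS
  #    rather than a local filesystem path.
  hive -e "describe extended tablename;"

  # 3) Parallelism: while the query runs, list the job and check its
  #    map/reduce task counts in the JobTracker web UI (default port 50030).
  hadoop job -list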