Re: Map side aggregations

2012-05-23 Thread Ranjith
Thanks Philip. Thanks, Ranjith On May 23, 2012, at 4:15 AM, Philip Tromans wrote: > Hi Ranjith, > > I haven't checked the code (so this might not be true), but I think that the > map side aggregation stuff uses its own hash map within the map phase to do > the aggregation, instead of using

Re: Local hadoop

2012-05-23 Thread Mohit Anchlia
Thanks! Do most people run Hive on the datanodes? I was running Hive on a non-Hadoop node. On Wed, May 23, 2012 at 5:36 PM, Edward Capriolo wrote: > If your HADOOP_HOME specifies the correct path, your Hadoop should > already be picking up this setting as well as many others. Be careful > h

Re: Local hadoop

2012-05-23 Thread Edward Capriolo
If your HADOOP_HOME specifies the correct path, your Hadoop should already be picking up this setting, as well as many others. Be careful here: if you have the default settings, your dfs.replication would be 1, or you might get some other nasty surprises. On Wed, May 23, 2012 at 7:59 PM, Mohit Anchlia
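
For anyone following along, those inherited settings can be inspected (and overridden per session) from the Hive CLI. The property names are real; the override value below is illustrative, not a recommendation:

```sql
-- Print what Hive inherited from the configs under $HADOOP_HOME/conf
SET mapred.job.tracker;
SET dfs.replication;
-- Override just for this session, e.g. to avoid single-copy surprises
SET dfs.replication=3;
```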

Re: Local hadoop

2012-05-23 Thread Mohit Anchlia
Thanks. I used --hiveconf to set the jobtracker and it worked. On Wed, May 23, 2012 at 4:57 PM, Edward Capriolo wrote: > Hive will choose local mode when the input files are "small" as an > optimization. This also happens if mapred.job.tracker is set to local. > > On Wed, May 23, 2012 at 7:48 PM,
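
For later searchers, the two equivalent ways to point Hive at a real JobTracker; host and port are placeholders:

```sql
-- On the command line: hive --hiveconf mapred.job.tracker=jobtracker-host:8021
-- Or inside an existing session:
SET mapred.job.tracker=jobtracker-host:8021;
```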

Re: Is there a way to create user account and grant read only permissions?

2012-05-23 Thread Patrick Luo
Thanks KS and others for the thoughts and ideas. I found an OK alternative that may benefit others in the same situation. The user accounts are mainly for business users. HUE is the GUI interface we deployed for non-technical users. Users need an account to access HUE, which is the gateway for HIVE
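
For the Hive side of read-only access, the (advisory) authorization layer available around Hive 0.7 looks roughly like this; user and table names are placeholders, and note this model is not a hard security boundary:

```sql
-- Sketch: grant read-only access on one table to one user
GRANT SELECT ON TABLE sales TO USER business_user;
-- Verify what was granted
SHOW GRANT USER business_user ON TABLE sales;
```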

Re: Local hadoop

2012-05-23 Thread Edward Capriolo
Hive will choose local mode when the input files are "small" as an optimization. This also happens if mapred.job.tracker is set to local. On Wed, May 23, 2012 at 7:48 PM, Mohit Anchlia wrote: > When I launch simple SQL I see "local hadoop". And when I do hadoop job fs > -list in my hadoop cluster
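
The heuristic described here is governed by a few settings; a sketch with illustrative values (verify the defaults against your Hive version):

```sql
-- Let Hive decide between local and cluster execution per query
SET hive.exec.mode.local.auto=true;
-- Thresholds below which a job counts as "small" (values illustrative)
SET hive.exec.mode.local.auto.inputbytes.max=134217728;  -- 128 MB
SET hive.exec.mode.local.auto.tasks.max=4;
-- Forcing local mode explicitly:
SET mapred.job.tracker=local;
```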

Local hadoop

2012-05-23 Thread Mohit Anchlia
When I launch simple SQL I see "local hadoop". And when I do hadoop job fs -list in my hadoop cluster I don't see any jobs. Am I doing something wrong here? # hive Hive history file=/tmp/root/hive_job_log_root_201205231 Execution log at: /tmp/root/root_20120523163636_18c8cce4-7568-401f-b502-223

Re: Want to give a short talk at the next Hive User Group meetup?

2012-05-23 Thread Carl Steinbach
Hi Ed, Sounds good. Please send me a copy of your slides and I'll find someone to present them (or do it myself). Thanks. Carl On Wed, May 23, 2012 at 7:17 AM, Edward Capriolo wrote: > I can give you a PPT on the upcoming "Programming Hive" book that > someone else can run through, but I won't

Re: How to Write a Simple SerDe?

2012-05-23 Thread Rubin, Bradley S.
I ended up slogging through this, and got it working. Here is the code for the custom writable and the corresponding custom SerDe, in case it helps others trying to do the same thing: http://pastebin.com/xUy36Kxg . It dropped the average bytes/record from 30.5 (with a CSV text string) to 18.2.
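
The general shape of such a custom Writable is worth sketching. The class below is a hypothetical, self-contained illustration of the idea behind the savings (fixed-width binary fields instead of a CSV text string); it implements the two methods of Hadoop's Writable contract but omits the org.apache.hadoop.io.Writable interface itself so the snippet compiles with only the JDK:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical record: two fields stored as fixed-width binary
// instead of a comma-separated text line.
public class CompactRecord {
    private int id;      // always 4 bytes, regardless of digit count
    private long value;  // always 8 bytes

    public CompactRecord() {}

    public CompactRecord(int id, long value) {
        this.id = id;
        this.value = value;
    }

    // Writable contract: serialize fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeLong(value);
    }

    // Writable contract: deserialize fields in the same order.
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        value = in.readLong();
    }

    public int getId() { return id; }
    public long getValue() { return value; }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        CompactRecord rec = new CompactRecord(42, 123456789L);
        rec.write(new DataOutputStream(buf));
        System.out.println("bytes/record: " + buf.size());

        // Round-trip the bytes back into a fresh record.
        CompactRecord back = new CompactRecord();
        back.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.getId() + "," + back.getValue());
    }
}
```

A real Hive SerDe wraps exactly this kind of round-trip: the SerDe's serialize/deserialize methods translate between such a Writable and the row object Hive's ObjectInspectors describe.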

Re: Want to give a short talk at the next Hive User Group meetup?

2012-05-23 Thread Edward Capriolo
I can give you a PPT on the upcoming "Programming Hive" book that someone else can run through, but I won't be able to present unless you want to fly me out :) Edward On Tue, May 22, 2012 at 9:28 PM, Carl Steinbach wrote: > Hi, > > I just wanted to remind everyone that the next Hive User Group me

Re:

2012-05-23 Thread alo alt
For a test I would suggest: yes. The issue isn't a CPU issue; it depends only on memory. -- Alexander Alten-Lorenz http://mapredit.blogspot.com German Hadoop LinkedIn Group: http://goo.gl/N8pCF On May 23, 2012, at 11:58 AM, Debarshi Basak wrote: > I have 16 cores on each machine. > Should I still

Re:

2012-05-23 Thread Debarshi Basak
I have 16 cores on each machine. Should I still set mappers to 1? Debarshi Basak, Tata Consultancy Services. Mailto: debarshi.ba...@tcs.com Website: http://www.tcs.com Experience certainty. IT Services, Business Solutions, Outsourcing ___

Re:

2012-05-23 Thread alo alt
Ah, 24 mappers is really high. Did you try using only one mapper? -- Alexander Alten-Lorenz http://mapredit.blogspot.com German Hadoop LinkedIn Group: http://goo.gl/N8pCF On May 23, 2012, at 11:50 AM, Debarshi Basak wrote: > Actually yes. I changed java opts to 2g; mapred.child.opts is 400m

Re:

2012-05-23 Thread Debarshi Basak
Actually yes. I changed java opts to 2g; mapred.child.opts is 400m. I have max mappers set to 24. My memory is 64GB. My problem is that the size of the index created is around 22GB. How does the index in Hive work? Does it load the complete index into memory? Debarshi Basak, Tata Consultancy Services. Mailt

Re:

2012-05-23 Thread alo alt
Use the memory management options, as described in the link above. You got an OOM (out of memory), which could be caused by a misconfiguration. Did you try playing with mapred.child.ulimit and with java.opts? -- Alexander Alten-Lorenz http://mapredit.blogspot.com German Hadoop LinkedIn Gr
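
For reference, the two knobs being discussed can be set per session. Values below are illustrative; the key constraint is that mapred.child.ulimit (a virtual-memory limit, in KB) must comfortably exceed the -Xmx heap, or tasks will die immediately:

```sql
-- Per-task JVM heap
SET mapred.child.java.opts=-Xmx1024m;
-- Per-task virtual memory cap in KB; keep it well above the heap
SET mapred.child.ulimit=3145728;  -- 3 GB
```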

Re: Map side aggregations

2012-05-23 Thread Philip Tromans
Hi Ranjith, I haven't checked the code (so this might not be true), but I think that the map side aggregation stuff uses its own hash map within the map phase to do the aggregation, instead of using a combiner, so you wouldn't expect to see any combine input records. Have a look for parameters li
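
The parameters being alluded to are the hive.map.aggr family. A sketch with approximate Hive 0.7-era defaults, worth verifying against your version:

```sql
-- Hash-map-based aggregation inside the mapper (no combiner involved)
SET hive.map.aggr=true;
-- Fraction of mapper memory the hash map may use before flushing
SET hive.map.aggr.hash.percentmemory=0.5;
-- Fall back to plain map output if aggregation isn't shrinking rows enough
SET hive.map.aggr.hash.min.reduction=0.5;
```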

Re:

2012-05-23 Thread Debarshi Basak
But what I am doing is: I am creating an index, then setting the path of the index and running a select from table_name where ... How can I resolve this issue? Debarshi Basak, Tata Consultancy Services. Mailto: debarshi.ba...@tcs.com Website: http://www.tcs.com Experie
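
For context, the compact-index workflow in Hive 0.7 that this thread describes looks roughly like the following; table, column, index, and path names are all placeholders:

```sql
CREATE INDEX idx_col ON TABLE table_name (col)
  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;
ALTER INDEX idx_col ON table_name REBUILD;
-- In 0.7 the index data must then be wired in manually before the query:
SET hive.index.compact.file=/path/to/index_table_data;  -- placeholder path
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
```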

Re:

2012-05-23 Thread alo alt
Hi, http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Memory+management This message means that for some reason the garbage collector is taking an excessive amount of time (by default, 98% of all CPU time of the process) and recovering very little memory in each run (by default, 2%

[no subject]

2012-05-23 Thread Debarshi Basak
When I am trying to run a query with an index I am getting this exception. My Hive version is 0.7.1:
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:369)
    at org.apache.hadoop.io.Text.decode(Text.java:327)
    at org.apache.hadoop.io.T