Re: OutOfMemory errors on joining 2 large tables.

2011-02-22 Thread Mapred Learn
Oops I meant nulls. Sent from my iPhone On Feb 22, 2011, at 8:22 PM, Mapred Learn wrote: > Check if you can filter non-nulls. That might help. > > Sent from my iPhone > > On Feb 22, 2011, at 12:46 AM, Bennie Schut wrote: > >> I've just set the "hive.exec.reducers.bytes.per.reducer" to as lo
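
A minimal sketch of the null-filtering suggestion, assuming two hypothetical tables big_a and big_b joined on a hypothetical column join_key (none of these names come from the thread). Rows with a NULL join key cannot match in an inner join, but they may still all be shuffled to the same reducer, so dropping them before the join can reduce the load on that reducer:

```sql
-- Hypothetical table and column names, for illustration only.
-- Drop NULL join keys on both sides before the join runs.
SELECT a.join_key, a.val_a, b.val_b
FROM (SELECT join_key, val_a FROM big_a WHERE join_key IS NOT NULL) a
JOIN (SELECT join_key, val_b FROM big_b WHERE join_key IS NOT NULL) b
  ON (a.join_key = b.join_key);
```

Whether this helps depends on how many NULL keys the tables actually contain.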

Re: OutOfMemory errors on joining 2 large tables.

2011-02-22 Thread Mapred Learn
Check if you can filter non-nulls. That might help. Sent from my iPhone On Feb 22, 2011, at 12:46 AM, Bennie Schut wrote: > I've just set the "hive.exec.reducers.bytes.per.reducer" to as low as 100k > which caused this job to run with 999 reducers. I still have 5 tasks failing > with an outof

RE: OutOfMemory errors on joining 2 large tables.

2011-02-22 Thread Paul Yang
Have you taken a look at the distribution of your join keys? If there are a couple join keys that occur much more frequently than others, the reducers handling those keys will have more load and may be subject to an OOM. -Original Message- From: Bennie Schut [mailto:bsc...@ebuddy.com] S
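
One way to look at the join key distribution is to count rows per key and list the heaviest keys first. This is only a sketch: big_a and join_key are hypothetical names, not taken from the thread.

```sql
-- Hypothetical table and column names, for illustration only.
-- List the 20 most frequent join keys with their row counts.
SELECT join_key, COUNT(*) AS cnt
FROM big_a
GROUP BY join_key
ORDER BY cnt DESC
LIMIT 20;
```

If a handful of keys account for most of the rows, the reducers that receive those keys are the likely candidates for the OOM described above. Running the same query against the other table shows whether the skew is on one side or both.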

Re: calculating unique views based on ip, session_id

2011-02-22 Thread wd
yes, ip_number and session_id should not be in group by clause. 2011/2/22 Viral Bajaria > I am guessing the following query should work too: > > select item_sid, count(distinct ip_number, session_id) from item_raw where > date_day = '20110202' group by item_sid; > > On Mon, Feb 21, 2011 at 9:42

Re: implementing moving average as a UDF

2011-02-22 Thread John Sichi
Yes, your query makes sense and should already work as expected. The idea of HIVE-1994 is that once the new annotation is available, we'll make a guarantee that your query as written below will continue to work in the face of any new optimizer changes (with the downside being that in some cases

Re: implementing moving average as a UDF

2011-02-22 Thread Igor Tatarinov
Thank you, John. It's not quite clear from the page whether my solution: 1. makes sense 2. works now 3. will work in the future if the issue is resolved/implemented Could you elaborate? Also, there is no mention of UDF object sharing (between mappers) in the current implementation. Is this a

Re: implementing moving average as a UDF

2011-02-22 Thread John Sichi
Please see the discussion in this JIRA issue: https://issues.apache.org/jira/browse/HIVE-1994 JVS On Feb 21, 2011, at 10:45 PM, Igor Tatarinov wrote: > I would like to implement the moving average as a UDF (instead of a streaming > reducer). Here is what I am thinking. Please let me know if I

Re: Extract Create Table statement from Hive

2011-02-22 Thread Jay Ramadorai
Thank you, Ed. Works like a charm after I remove the Hive2rdbms references. I've uploaded the jar to the JIRA for those who want to use it. On Feb 22, 2011, at 1:13 PM, Edward Capriolo wrote: > On Tue, Feb 22, 2011 at 1:09 PM, Jay Ramadorai > wrote: >> Thank you, Ed. Trying it now, but I n

Re: Extract Create Table statement from Hive

2011-02-22 Thread Edward Capriolo
On Tue, Feb 22, 2011 at 1:09 PM, Jay Ramadorai wrote: > Thank you, Ed. Trying it now, but I need the following package to build > HiveUtil: > > com.media6.hive2rdbms.common.Hive2RdbmsConf; > > can you point me to where I can get it from? > > On Feb 22, 2011, at 10:51 AM, Edward Capriolo wrote: > >

Re: Extract Create Table statement from Hive

2011-02-22 Thread Jay Ramadorai
Thank you, Ed. Trying it now, but I need the following package to build HiveUtil: com.media6.hive2rdbms.common.Hive2RdbmsConf; can you point me to where I can get it from? On Feb 22, 2011, at 10:51 AM, Edward Capriolo wrote: > On Mon, Feb 21, 2011 at 7:31 PM, Edward Capriolo > wrote: >> On Mo

RE: TOAD for hive

2011-02-22 Thread Guy Doulberg
Thanks, That did the trick -Original Message- From: Peter Hall [mailto:peter.h...@quest.com] Sent: Monday, February 21, 2011 11:45 PM To: user@hive.apache.org Subject: RE: TOAD for hive The way to add jars has changed. In your hive-site.xml add something like: hive.aux.jars.path fil

Re: Extract Create Table statement from Hive

2011-02-22 Thread Edward Capriolo
On Mon, Feb 21, 2011 at 7:31 PM, Edward Capriolo wrote: > On Mon, Feb 21, 2011 at 6:42 PM, Jay Ramadorai > wrote: >> Does anyone have a way of generating the create table statement for a table >> that is in Hive?  I see a jira for >> this https://issues.apache.org/jira/browse/HIVE-967 and it appe

Re: OutOfMemory errors on joining 2 large tables.

2011-02-22 Thread Bennie Schut
I've just set the "hive.exec.reducers.bytes.per.reducer" to as low as 100k which caused this job to run with 999 reducers. I still have 5 tasks failing with an outofmemory. We have jvm reuse set to 8 but dropping it to 1 seems to greatly reduce this problem: set mapred.job.reuse.jvm.num.tasks
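
The two settings described in this message could be applied in a Hive session roughly as follows. The values are read from the text ("as low as 100k", "dropping it to 1"); the exact byte count used by the poster is an assumption.

```sql
-- About 100k bytes of input per reducer (exact value assumed).
set hive.exec.reducers.bytes.per.reducer=100000;
-- Run one task per JVM, i.e. no JVM reuse.
set mapred.job.reuse.jvm.num.tasks=1;
```

Both are session-level settings and should be tuned to the cluster rather than copied as-is.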

Re: calculating unique views based on ip, session_id

2011-02-22 Thread Viral Bajaria
I am guessing the following query should work too: select item_sid, count(distinct ip_number, session_id) from item_raw where date_day = '20110202' group by item_sid; On Mon, Feb 21, 2011 at 9:42 PM, Cam Bazz wrote: > The query you have produced multiple item_sid's. > > This is rather what I h