Re: Opposite of explode?

2011-02-10 Thread Sonal Goyal
Is collect_set what you are looking for? I haven't used it myself, but it seems to remove the duplicates. http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Built-in_Aggregate_Functions_.28UDAF.29 Thanks and Regards, Sonal

Opposite of explode?

2011-02-10 Thread Tim Robertson
Hi all, Sorry if I am missing something obvious but is there an inverse of an explode? E.g. given t1:

ID  Name
1   Tim
2   Tim
3   Tom
4   Frank
5   Tim

Can you create t2:

Name   ID
Tim    1,2,5
Tom    3
Frank  4

In Oracle it would be a select name,collect(id) from t1 group by name I suspect in Hive
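To make the requested result concrete, here is a minimal Python sketch of the grouping the question describes: one row per name, carrying the IDs seen for that name with duplicates dropped. It only models the semantics; the function name `collect_ids_by_name` is hypothetical, and this is not how Hive implements the collect_set aggregate suggested in the reply. The ID order shown is a property of this sketch and should not be assumed for a Hive aggregate.

```python
from collections import defaultdict

# Rows of t1 from the question: (ID, Name)
t1 = [(1, "Tim"), (2, "Tim"), (3, "Tom"), (4, "Frank"), (5, "Tim")]


def collect_ids_by_name(rows):
    """Group IDs by name, keeping first-seen order and dropping duplicate IDs."""
    grouped = defaultdict(list)
    for row_id, name in rows:
        if row_id not in grouped[name]:
            grouped[name].append(row_id)
    return dict(grouped)


t2 = collect_ids_by_name(t1)
for name, ids in t2.items():
    # One output row per name, IDs joined with commas as in the question.
    print(name, ",".join(str(i) for i in ids))
```

The equivalent Hive query would presumably follow the Oracle shape quoted above, with collect_set in place of collect, but that is worth checking against the UDF manual linked in the reply.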

Re: hive : question about reducers

2011-02-10 Thread Ajo Fod
Does the execution plan for the query give any hints? In my case I noticed data was picked from the buckets using a hashkey. Subsequently the workload was split between reducers using the same hashkey. So, only one reducer ended up doing all the work. I don't know how to fix this problem yet ... b
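The effect described here can be sketched in Python under some loud assumptions: rows are routed to reducers by hashing a key and taking the result modulo the reducer count, `zlib.crc32` stands in for whatever hash Hive actually uses, and both workloads are made up for illustration. When every row carries the same key value, every row lands on the same reducer no matter how many reducers were allocated.

```python
import zlib
from collections import Counter


def reducer_for(key, num_reducers):
    """Pick a reducer by hashing the key (crc32 is a stand-in, not Hive's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def rows_per_reducer(keys, num_reducers):
    """Count how many rows each reducer receives."""
    counts = Counter(reducer_for(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Hypothetical workload: 1000 rows that all share one key value.
skewed = rows_per_reducer(["2011-02-10"] * 1000, 4)
# Hypothetical workload: 1000 rows with distinct key values.
spread = rows_per_reducer([f"user-{i}" for i in range(1000)], 4)

print("single key:   ", skewed)
print("distinct keys:", spread)
```

If the execution plan shows the reduce key is a column with very few distinct values, this kind of skew is one possible explanation for a single busy reducer.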

Re: hive : question about reducers

2011-02-10 Thread Viral Bajaria
There were 3 different queries which exhibited this behavior: one was over 30 days' worth of data and 2 were over 7 days' worth of data. On Thu, Feb 10, 2011 at 3:49 PM, Jonathan Coveney wrote: > How many days of data are you working on?

Re: hive : question about reducers

2011-02-10 Thread Jonathan Coveney
How many days of data are you working on? Sent via BlackBerry -----Original Message----- From: Viral Bajaria Date: Thu, 10 Feb 2011 15:21:32 Reply-To: user@hive.apache.org Subject: Re: hive : question about reducers I don't have any explicit bucketing in my data. The data is partitioned b

Re: hive : question about reducers

2011-02-10 Thread Viral Bajaria
I don't have any explicit bucketing in my data. The data is partitioned by current_date (it has no hour information, so basically 24 hours of data). It's not a problem because eventually the job would complete (super-slow) but it would be nice to know the reason behind this behavior and how I coul

Re: hive : question about reducers

2011-02-10 Thread Ajo Fod
I've had similar experiences ... usually with bucketing. Is this your experience too? -Ajo On Thu, Feb 10, 2011 at 1:57 PM, Viral Bajaria wrote: > Hello, > > In my Hive cluster, I have setup the mapred.reduce.tasks to be -1 i.e. I am > allowing HIVE to figure out the # of reducers that it would

Determining New/Repeat Visitor

2011-02-10 Thread Wil -
Hi, Is there a good way to identify repeat visitors when analyzing web logs using Hive/Hadoop? One idea that I can come up with is storing the list of user IDs and session IDs (session data) in another table and then joining against that table. However, the session data table would grow indefinitely (potent

hive : question about reducers

2011-02-10 Thread Viral Bajaria
Hello, In my Hive cluster, I have set mapred.reduce.tasks to -1, i.e. I am allowing Hive to figure out the number of reducers that it would need from the data. When I run a query, it determines that it will need 4 reducers, but when I look at the MAPRED logs, I see that all the work is done by

Re: udf function to calculate median

2011-02-10 Thread Jerome Boulon
Yes, the percentile function will give you the median information. /Jerome. On 2/10/11 10:16 AM, "Anurag Phadke" wrote: >Does Hive have any UDF function to calculate median for a given column? > >-anurag >
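As a sanity check on why a percentile function answers the median question, the following Python sketch computes a percentile with linear interpolation between neighbouring sorted values and compares the 0.5 case against the standard library median. The interpolation rule and the sample data are assumptions chosen for illustration; this is not a description of how Hive's percentile is implemented.

```python
import statistics


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = p * (len(ordered) - 1)   # fractional index into the sorted list
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


odd_samples = [7, 1, 3, 10, 4]
even_samples = [1, 2, 3, 4]

# The 0.5 percentile should agree with the median in both cases.
print(percentile(odd_samples, 0.5), statistics.median(odd_samples))
print(percentile(even_samples, 0.5), statistics.median(even_samples))
```

In Hive the call would presumably pass the column and 0.5 to percentile; the exact signature and supported column types should be confirmed in the UDF manual.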

udf function to calculate median

2011-02-10 Thread Anurag Phadke
Does Hive have any UDF function to calculate median for a given column? -anurag

Re: what char represents NULL value in hive?

2011-02-10 Thread 김영우
Hi LiuLei, You should use '\N' for null in your data files. - Youngwoo 2011/1/21 lei liu > I generate an HDFS file, then I load the file into a Hive table. Some columns don't have a value, and I need to set these columns to NULL. I > want to know what char represents NULL value in hive
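For anyone generating the data file from a script, here is a small Python sketch of writing and reading rows with the '\N' marker mentioned above. The tab delimiter and the helper names `format_row` and `parse_row` are assumptions made for this example; the delimiter should match whatever the table definition declares.

```python
NULL_MARKER = r"\N"  # a backslash followed by a capital N, exactly as written to the file


def format_row(values, delimiter="\t"):
    """Serialize one row, writing the NULL marker for None values."""
    return delimiter.join(NULL_MARKER if v is None else str(v) for v in values)


def parse_row(line, delimiter="\t"):
    """Split one text row and map the NULL marker back to None."""
    fields = line.rstrip("\n").split(delimiter)
    return [None if field == NULL_MARKER else field for field in fields]


line = format_row([1, None, "tim"])
print(repr(line))
print(parse_row(line))
```

Note that an empty field and the marker are different things in this sketch: an empty field parses to an empty string, not to None.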

Re: what char represents NULL value in hive?

2011-02-10 Thread Bennie Schut
At least on trunk it seems that on external tables (perhaps also TextFile?) this works for integer values but not for string values. A string then comes back as an empty string, which you have to find with " where field = '' ", but I would prefer to use " where field is null ". Not sure if

dynamic partition

2011-02-10 Thread Cam Bazz
So basically 2011021002 would be shortened to 20110210 and so on. I am falling apart in two places: a. I cannot query like where partition_name = 2011021002*; I would have to do partition_name = '2011021002' or partition_name = '2011021003', which would require me knowing the part

Re: query returns sometext instead of none

2011-02-10 Thread Ajo Fod
Have you tried constructing the table as a text file? Use the following at the end of the "CREATE TABLE" statement: ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; It might just be that SequenceFile puts in some information even if there is no data. Cheers, Ajo. On Wed, Feb

RE: periodic execution

2011-02-10 Thread Balaji Rajagopalan
I totally agree that the jars not being available in the public Maven repository was a big hurdle to getting this working. I added the dependencies to the pom.xml and manually installed all the dependent packages, which is not what I really wanted to do, but it works. From: Alejandro Abdelnur [ma

Re: [Oozie-users] Re: periodic execution

2011-02-10 Thread Carl Steinbach
Hi Balaji, Just wanted to add that you should put your Hive configuration settings in hive-site.xml. hive-default.xml is intended for default property values and should not be modified by anyone other than Hive developers. Additionally, it's likely that hive-default.xml will be deprecated in a fut

Re: periodic execution

2011-02-10 Thread Alejandro Abdelnur
Hi Balaji, The latest patch of the Hive action does not bundle hive-default.xml (got the same feedback from Carl); you'll be responsible for bundling it in the WF directory until the Hive JARs bundle it. I'll upload the new patch early next week and then ask Oozie to integrate it. Still the problem I h

RE: periodic execution

2011-02-10 Thread Balaji Rajagopalan
Alejandro, I have used your Hive action patch from tucu's forked branch on the Yahoo GitHub and it works fine. When will your patch be available in the master branch of the Yahoo GitHub? Also I have a small suggestion if I may: hive-default.xml is bundled with the oozie-core.jar, instead can we hav