Re: Anything wrong with this query?

2012-07-10 Thread Vijay
When you say LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts The last alias is an alias to a column; not a table. explode will only produce one column per row; it cannot produce multiple columns. I hope that helps. On Tue, Jul 10, 2012 at 8:31 PM, Raihan Jamal wrote: > I have t

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Raihan Jamal
Yup this works. Thanks for the help. *Raihan Jamal* On Tue, Jul 10, 2012 at 4:37 PM, Vijay wrote: > In that case, wouldn't this work: > > SELECT buyer_id, item_id, rank(buyer_id), created_time > FROM ( > SELECT buyer_id, item_id, created_time > FROM testingtable1 > DISTRIBUTE BY

Anything wrong with this query?

2012-07-10 Thread Raihan Jamal
I have this below query, whenever I try to execute it, I always get error, which I am not able to understand. SELECT t1.buyer_id, t1.item_id, t1.created_time, t2.product_id, t2.timestamps FROM TestingTable1 t1 JOIN ( SELECT user_id, prod_and_ts.product_id as

How to Decide How Many Buckets to Use?

2012-07-10 Thread Xun TANG
Hi, We use Hive to store user click logs and dig out interesting trends, doing a lot of big table joins. We mostly partition by date, and inside each partition data is bucketed by some hashed-id. We are wondering how many buckets to use? If I understand correctly, Hive DB stores each bucket in on

FileSystem Closed

2012-07-10 Thread 안의건
Hello. I have a problem with the filesystem closing. The filesystem was closed when the hive query is running. It is 'select' query and the data size is about 1TB. I'm using hadoop-0.20.2 and hive-0.7.1. The error log is telling that tmp file is not deleted, or the tmp path exception is occurred.

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Vijay
In that case, wouldn't this work: SELECT buyer_id, item_id, rank(buyer_id), created_time FROM ( SELECT buyer_id, item_id, created_time FROM testingtable1 DISTRIBUTE BY buyer_id SORT BY buyer_id, created_time desc ) a WHERE rank(buyer_id) < 10; On Tue, Jul 10, 2012 at 4:21 PM, Rai

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Vijay
This is a little tricky but this is how it works: SELECT buyer_id, item_id, rank(item_id), created_time FROM ( SELECT buyer_id, item_id, created_time FROM testingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time desc ) a WHERE rank(item_id) < 10; On T

Re: What's wrong with this query?

2012-07-10 Thread Raihan Jamal
Thanks Vijay, Yes it worked. Can you also take a look into one of my other post subject title *TOP 10.* *Raihan Jamal* On Tue, Jul 10, 2012 at 1:41 PM, Vijay wrote: > to_date(from_unixtime(cast(timestamps as int))) > > On Tue, Jul 10, 2012 at 1:33 PM, Raihan Jamal > wrote: > > I need only

RE: Casting exception while converting from "LazyDouble" to "LazyString"

2012-07-10 Thread Kanna Karanam
Thanks Kulkarni - I think I have the right data types but here are DDL definitions and query. Query: select registration from studenttab10k s right outer join votertab10k v on (s.name = v.name); Table definitions: create external table IF NOT EXISTS

Re: What's wrong with this query?

2012-07-10 Thread Vijay
to_date(from_unixtime(cast(timestamps as int))) On Tue, Jul 10, 2012 at 1:33 PM, Raihan Jamal wrote: > I need only the date not the hours and second, so that is the reason I was > using to_date and from_unxitime() take int as parameter and timestamps is a > string in this case. > > > > > Raihan J

Re: What's wrong with this query?

2012-07-10 Thread Raihan Jamal
I need only the date not the hours and second, so that is the reason I was using to_date and from_unxitime() take int as parameter and timestamps is a string in this case. *Raihan Jamal* On Tue, Jul 10, 2012 at 1:28 PM, Vijay wrote: > You need to use from_unixtime() > > On Tue, Jul 10, 201

Re: What's wrong with this query?

2012-07-10 Thread Vijay
You need to use from_unixtime() On Tue, Jul 10, 2012 at 1:01 PM, Raihan Jamal wrote: > Here timestamps is a string and I always get NULL in the second column when > I try to get date out of the timestamp. Anything wrong I am doing? > > > select A.timestamps, to_date(A.timestamps) from > (select u

Re: Casting exception while converting from "LazyDouble" to "LazyString"

2012-07-10 Thread kulkarni.swar...@gmail.com
Hi Kanna, This might just mean that in your query you are having a STRING type for a field which is actually a DOUBLE. On Tue, Jul 10, 2012 at 3:05 PM, Kanna Karanam wrote: > Has anyone seen this error before? Am I missing anything here? > > ** ** > > 2012-07-10 11:11:02,203 INFO org.apache

Casting exception while converting from "LazyDouble" to "LazyString"

2012-07-10 Thread Kanna Karanam
Has anyone seen this error before? Am I missing anything here? 2012-07-10 11:11:02,203 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201207091248_0107_m_00_0: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing

What's wrong with this query?

2012-07-10 Thread Raihan Jamal
Here timestamps is a string and I always get NULL in the second column when I try to get date out of the timestamp. Anything wrong I am doing? * * * * select A.timestamps, to_date(A.timestamps) from (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from tes

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Raihan Jamal
Still it's not working with the use of my rank UDF. Below is the query I am using Can anyone help me, what changes I need to make in my below sql query? CREATE TABLE IF NOT EXISTS TestingTable1 ( BUYER_ID BIGINT, ITEM_ID BIGINT, CREATED_TIME STRING ) *Find TOP 10 latest data (ITEM_ID, CREATE

Re: FileSystem Closed.

2012-07-10 Thread Harsh J
This appears to be a Hive issue (something probably called FS.close() too early?). Redirecting to the Hive user lists as they can help better with this. On Tue, Jul 10, 2012 at 9:59 PM, 안의건 wrote: > Hello. I have a problem with the filesystem closing. > > The filesystem was closed when the hive q

Re: Find TOP 10 using HiveQL

2012-07-10 Thread Wouter de Bie
You could use TRANSFORM with a simple awk script: TRANSFORM(a, b, c, d) USING "/usr/bin/awk ' {if($1!=c){c=$1; a=0}; if(a<20){print $0; a++}}'" This will create a top 20 for each group. --Wouter de Bie Team Lead Analytics Infrastructure, Spotify wou...@spotify.com (mailto:wou...@spotify.c

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Jasper Knulst
Hi Raihan, You should use 'rank(buyer_id)' in the order by clause on line 9 in stead of the alias 'rk'. I had the same problem, strangely, the alias is not resolved when it is in the order by clause. Other thing, I had some issues when I used this exact same set up for ranking results, that the r

Re: Find TOP 10 using HiveQL

2012-07-10 Thread Raihan Jamal
I am trying that solution. Currently I am running my query to see what result I am getting back with UDF. *Raihan Jamal* On Tue, Jul 10, 2012 at 12:13 AM, Nitin Pawar wrote: > i thought you managed to solve this with rank?? > > > On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal wrote: > >> Pr

Re: Find TOP 10 using HiveQL

2012-07-10 Thread Nitin Pawar
i thought you managed to solve this with rank?? On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal wrote: > Problem with that approach is, with LIMIT 10, If I am putting after desc, > then it will get only 10 rows irrespective of BUYER_ID. But I need > specifically for each BUYER_ID 10 latest rows. >

Re: Find TOP 10 using HiveQL

2012-07-10 Thread Raihan Jamal
Problem with that approach is, with LIMIT 10, If I am putting after desc, then it will get only 10 rows irrespective of BUYER_ID. But I need specifically for each BUYER_ID 10 latest rows. *Raihan Jamal* On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari < abhishektiwari.bt...@gmail.com> wrote

Re: Find TOP 10 using HiveQL

2012-07-10 Thread Abhishek Tiwari
Raihan, Andes suggests you use 'limit' after 'desc' ie Hive with sort your query results in descending order and then chop off any more than first 10 records. However, Hive will still run the full scan on data since it has to find the sorted list of records for you. On Tue, Jul 10, 2012 at 8:36 A

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Nitin Pawar
change this " SELECT buyer_id, item_id, created_time" to " SELECT buyer_id, item_id, created_time, rk" On Tue, Jul 10, 2012 at 12:30 PM, Raihan Jamal wrote: > I was not able to understand, This is my below qiuery that I am using > currently- > > SELECT buyer_id, item_id, created_time > FROM ( >

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Raihan Jamal
I was not able to understand, This is my below qiuery that I am using currently- SELECT buyer_id, item_id, created_time FROM ( SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time FROM testingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time d