org/apache/thrift/TException error

2012-07-19 Thread iwannaplay games
Hi, I have installed Hive, set the environment variables export JAVA_HOME= export HADOOP_HOME= export HIVE_HOME= and ran $ bin/hadoop fs -mkdir /tmp $ bin/hadoop fs -mkdir /user/hive/warehouse $ bin/hadoop fs -chmod g+w /tmp $ bin/hadoop fs -chmod g+w /user/hive/warehouse After this, when I execute hi

Re: Request write access to the Hive wiki

2012-07-19 Thread Carl Steinbach
Granted! :) On Thu, Jul 19, 2012 at 12:56 AM, Lefty Leverenz wrote: > Please grant me write access to the Hive wiki so that I can work on > improving the documentation. > > Thank you. > > – Lefty Leverenz >le...@hortonworks.com > > >

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
I wrote this query a few minutes back: select bid, pid, time from ( select bid, pid, time, rank() over (partition by bid, pid order by time desc) as k from table1 ) as x where k <= 3 order by bid, pid, time desc. Do you think this query will work with my Rank function that I pro

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
Hi Igor, I am new to the HiveQL world and don't know much yet. Currently my Rank UDF looks like this: public final class Rank extends UDF{ private int counter; private String last_key; public int evaluate(final String key){ if ( !key.equalsIgnoreCase(this.las
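The Rank UDF quoted above is cut off mid-line, but the visible part suggests the usual pattern: remember the previous key and a counter, and restart the counter when the key changes. Below is a plain-Java sketch of that logic, assuming that reset behaviour and a 1-based rank (the original may well be 0-based). The Hive `UDF` base class is left out so the sketch runs without Hive on the classpath, and the class name `RankSketch` is hypothetical.

```java
/**
 * Hypothetical plain-Java stand-in for a stateful Hive rank UDF.
 * Assumption: the counter restarts whenever the key differs from the
 * previous call's key (compared case-insensitively); ranks are 1-based.
 */
public final class RankSketch {
    private int counter;
    private String lastKey;

    public int evaluate(final String key) {
        // equalsIgnoreCase(null) is false, so the very first call also resets.
        if (!key.equalsIgnoreCase(lastKey)) {
            counter = 0;
            lastKey = key;
        }
        return ++counter;
    }

    public static void main(String[] args) {
        RankSketch rank = new RankSketch();
        // Rows for key "a" are not contiguous, so the second run of "a" restarts at 1.
        for (String key : new String[] {"a", "a", "a", "b", "b", "a"}) {
            System.out.println(key + " -> " + rank.evaluate(key));
        }
    }
}
```

Because the function only compares each row with the previous one, the final `a` gets rank 1 again. That is why the other replies in this thread insist that rows reach the function grouped (DISTRIBUTE BY) and ordered (SORT BY).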

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread Igor Tatarinov
Actually, never mind. Looks like you need to partition by both bid and pid. In that case, your problem is that rank() has to handle a combined bid+pid key. So first you need to create a combined key, partition by that key and pass it to your rank() function (assuming rank() knows to reset on a new
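Igor's combined-key suggestion can be sketched in plain Java: build one key from bid and pid, then restart the rank whenever that key changes. This assumes rows already arrive grouped by (bid, pid). The class and method names are hypothetical, and the `|` separator is an assumption (any character that cannot occur in either column would do). In HiveQL such a key might be built with something like concat(cast(bid AS STRING), '|', cast(pid AS STRING)), though that expression does not come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of ranking on a combined bid+pid key.
 * Assumption: rows arrive already grouped by (bid, pid).
 */
public final class CombinedKeyRank {
    /** A separator keeps ("1", "23") and ("12", "3") from colliding as "123". */
    static String combinedKey(String bid, String pid) {
        return bid + "|" + pid;
    }

    /** Each row is {bid, pid}; returns a 1-based rank that restarts per combined key. */
    static List<Integer> rank(List<String[]> rows) {
        List<Integer> ranks = new ArrayList<>();
        String lastKey = null;
        int counter = 0;
        for (String[] row : rows) {
            String key = combinedKey(row[0], row[1]);
            if (!key.equals(lastKey)) {
                counter = 0;
                lastKey = key;
            }
            ranks.add(++counter);
        }
        return ranks;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"1", "23"},
            new String[] {"1", "23"},
            new String[] {"12", "3"},
            new String[] {"1", "24"});
        System.out.println(rank(rows)); // prints [1, 2, 1, 1]
    }
}
```

Without the separator, `"1" + "23"` and `"12" + "3"` would both become `"123"`, and the third row would wrongly continue the first group's count.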

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
Thanks, Jasper, for replying. I described my use case in my first email. I have also already written the HiveQL query with the rank function working, but it is not giving me the exact output I expect from the query. On Thu, Jul 19, 2012 at 3:53 PM, Jasper Knulst wrote:

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread Igor Tatarinov
Sorry, just pid needs to be dropped from both DISTRIBUTE and SORT clauses. Your very first query was correct except for the nested subquery part. (You don't need a double-nested subquery.) On Thu, Jul 19, 2012 at 3:48 PM, comptech geeky wrote: > Hi Igor, > > I am not sure what I have to remove fr

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread Jasper Knulst
I am not really aware of your use case. Play around with it. At least the rank function is now properly applied. Maybe remove pid from the DISTRIBUTE and the SORT clauses? Jasper 2012/7/20 comptech geeky > Hi Igor, > > I am not sure what I have to remove from Distribute By as in distribut

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
Hi Igor, I am not sure what I have to remove from DISTRIBUTE BY: it has bid, pid, and you said to remove bid and time, but it doesn't have time. SELECT bid, pid, rank FROM (SELECT bid, pid, rank(bid) rank, time, UNIX_TIMESTAMP(time) FROM

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread Igor Tatarinov
Remove pid, time from DISTRIBUTE BY. On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky wrote: > Modified Query that I wrote and its not working as expected output is. > SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time) > FROM ( > SELECT bid, pid, time > FROM table1 >
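The advice to take time out of DISTRIBUTE BY can be illustrated with a small simulation: when time is part of the distribution key, each row of the same bid and pid is hashed on its own timestamp, so the rows can land on different reducers and a stateful rank never sees them together. This sketch uses `String.hashCode()` modulo a hypothetical reducer count as a stand-in for Hive's partitioner (Hive's actual hash function differs); the three timestamps are the sample rows posted earlier in the thread, and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Simulates which reducer bucket each row goes to for two DISTRIBUTE BY keys.
 * Assumption: String.hashCode() modulo the reducer count stands in for Hive's partitioner.
 */
public final class DistributeKeyDemo {
    static int bucket(String key, int numReducers) {
        return Math.floorMod(key.hashCode(), numReducers);
    }

    /** Number of distinct buckets one (bid, pid) group is spread over. */
    static int bucketsUsed(String bid, String pid, List<String> times,
                           boolean includeTime, int numReducers) {
        Set<Integer> used = new HashSet<>();
        for (String time : times) {
            String key = includeTime ? bid + "|" + pid + "|" + time : bid + "|" + pid;
            used.add(bucket(key, numReducers));
        }
        return used.size();
    }

    public static void main(String[] args) {
        List<String> times = List.of(
            "2012-07-09 21:42:29", "2012-07-09 21:43:29", "2012-07-09 21:40:29");
        int reducers = 8; // hypothetical reducer count
        System.out.println("DISTRIBUTE BY bid, pid:       "
            + bucketsUsed("1345653", "330760137950", times, false, reducers) + " bucket(s)");
        System.out.println("DISTRIBUTE BY bid, pid, time: "
            + bucketsUsed("1345653", "330760137950", times, true, reducers) + " bucket(s)");
    }
}
```

Without time in the key the whole group always maps to exactly one bucket; with time included, the number of buckets depends on how each timestamp hashes, which is exactly what a per-reducer rank cannot tolerate.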

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread Jasper Knulst
Hi, I more or less had the same problem and finally solved it by introducing a second subquery. This guarantees that the rank function is invoked in the reduce phase and that the rank results are properly sorted. I guess something like this: SELECT bid, pid, rank FROM (SELECT bi

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
Can anyone help me with this? I have also tried other options by tweaking the query, but I am not able to get the expected output. On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky wrote: > Modified Query that I wrote and its not working as expected output is. > SELECT bid, pid, rank(bi

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
Here is the modified query I wrote; it is not producing the expected output. SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time) FROM ( SELECT bid, pid, time FROM table1 where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = '2012-07-09' DISTRIBUTE BY bid,pid,ti

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
I modified the query and wrote this: SELECT buyer_id, item_id, rank(buyer_id), created_time, UNIX_TIMESTAMP(created_time) FROM ( SELECT buyer_id, item_id, created_time FROM testingtable1 where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as int))) = '2012-07-09'

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
Can you show me the exact query I need for this particular problem, considering my scenario? It would be a great help, as I am new to HiveQL. I need the TOP 3 for rows where BID and PID match but the timestamps differ. -Raihan Jamal On Thu, Jul 19, 2012 at 1:15 PM, Philip Trom

Re: Something wrong with my query to get TOP 3?

2012-07-19 Thread Philip Tromans
Your rank() is being evaluated map side. Put your distribute by and sort by in an inner query, and then evaluate your rank() in an outer query. Phil. On Jul 19, 2012 9:00 PM, "comptech geeky" wrote: > This is the below data in my Table1 > > > BID PID TIME > --
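Philip's point is that the rank has to run after rows are shuffled and sorted, not during the map phase. The following plain-Java simulation sketches that data flow under stated assumptions: `String.hashCode()` modulo a hypothetical reducer count stands in for DISTRIBUTE BY bid, pid; a per-bucket sort stands in for SORT BY bid, pid, time DESC; and the rank-and-filter loop stands in for the outer query. The first three rows are the sample data posted in the thread, the fourth is hypothetical, and the names `TopThreePerKey`/`topN` are illustrative. This is not how Hive executes the query internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simulates "DISTRIBUTE BY + SORT BY in the inner query, rank in the outer query". */
public final class TopThreePerKey {
    static final class Row {
        final String bid, pid, time;
        Row(String bid, String pid, String time) {
            this.bid = bid; this.pid = pid; this.time = time;
        }
    }

    static List<Row> topN(List<Row> input, int numReducers, int n) {
        // Inner query, DISTRIBUTE BY bid, pid: every row of a group goes to one bucket.
        List<List<Row>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new ArrayList<>());
        }
        for (Row r : input) {
            reducers.get(Math.floorMod((r.bid + "|" + r.pid).hashCode(), numReducers)).add(r);
        }
        // Inner query, SORT BY bid, pid, time DESC within each bucket.
        // "yyyy-MM-dd HH:mm:ss" strings sort chronologically as plain text.
        Comparator<Row> order = Comparator.comparing((Row r) -> r.bid)
            .thenComparing(r -> r.pid)
            .thenComparing(r -> r.time, Comparator.<String>reverseOrder());
        List<Row> out = new ArrayList<>();
        for (List<Row> bucket : reducers) {
            bucket.sort(order);
            // Outer query: rank restarts per (bid, pid); keep rows with rank <= n.
            String lastKey = null;
            int rank = 0;
            for (Row r : bucket) {
                String key = r.bid + "|" + r.pid;
                if (!key.equals(lastKey)) {
                    rank = 0;
                    lastKey = key;
                }
                if (++rank <= n) {
                    out.add(r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("1345653", "330760137950", "2012-07-09 21:42:29"),
            new Row("1345653", "330760137950", "2012-07-09 21:43:29"),
            new Row("1345653", "330760137950", "2012-07-09 21:40:29"),
            new Row("1345653", "330760137950", "2012-07-09 21:39:29")); // hypothetical extra row
        for (Row r : topN(rows, 4, 3)) {
            System.out.println(r.bid + " " + r.pid + " " + r.time);
        }
    }
}
```

Because every row of a (bid, pid) group lands in the same bucket and is sorted newest first before the counter runs, the filter keeps exactly the three most recent timestamps per group; running the counter before that ordering exists is what produces the wrong results described in the thread.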

Something wrong with my query to get TOP 3?

2012-07-19 Thread comptech geeky
Below is the data in my Table1:

BID       PID            TIME
1345653   330760137950   2012-07-09 21:42:29
1345653   330760137950   2012-07-09 21:43:29
1345653   330760137950   2012-07-09 21:40:29

Re: Performance tuning a hive query

2012-07-19 Thread kulkarni.swar...@gmail.com
A couple more to add to the list: Indexing[1] Columnar Storage/RCFile[2] [1] https://cwiki.apache.org/confluence/display/Hive/IndexDev [2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-4.pdf On Thu, Jul 19, 2012 at 8:39 AM, Jan Dolinár wrote: > There are many ways, but beware

Re: Performance tuning a hive query

2012-07-19 Thread Jan Dolinár
There are many ways, but beware that some of them may result in worse performance when used inappropriately. Some of the settings we use to achieve faster queries: hive.map.aggr=true, hive.exec.parallel=true, hive.exec.compress.intermediate=true, mapred.job.reuse.jvm.num.tasks=-1. Structuring the que

Re: Performance tuning a hive query

2012-07-19 Thread Nitin Pawar
It depends on what kind of query. If you are doing joins, there are different kinds of join queries, depending on how you laid out the data and how much data is held in which table. On Thu, Jul 19, 2012 at 6:54 PM, Abhishek wrote: > > Apart from partitions and buckets how to improve of hive q

Performance tuning a hive query

2012-07-19 Thread Abhishek
Apart from partitions and buckets, how can I improve the performance of Hive queries? Regards, Abhi

Re: Unable to start hive

2012-07-19 Thread iwannaplay games
Hi, everything is set. [hduser@master bin]$ echo $HADOOP_HOME /usr/local/hadoop [hduser@master bin]$ echo $PATH usr/local/sqoop/bin:/usr/local/hive/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/hduser/bin:/usr/java/jdk1.6.0_33/bin:/usr/local/hadoop/bin I created two directories: $ bin/ha

Re: Unable to start hive

2012-07-19 Thread Bejoy KS
Hi Prabhjot, Have you set $HADOOP_HOME? If not, try setting it; if it is already set, please verify that it is the correct one. I normally set $HADOOP_HOME and $PATH and have never faced any issues. Regards, Bejoy KS -Original Message- From: iwa

Request write access to the Hive wiki

2012-07-19 Thread Lefty Leverenz
Please grant me write access to the Hive wiki so that I can work on improving the documentation. Thank you. – Lefty Leverenz le...@hortonworks.com