Hi,
I have installed Hive and set the environment variables:
export JAVA_HOME=
export HADOOP_HOME=
export HIVE_HOME=
$ bin/hadoop fs -mkdir /tmp
$ bin/hadoop fs -mkdir /user/hive/warehouse
$ bin/hadoop fs -chmod g+w /tmp
$ bin/hadoop fs -chmod g+w /user/hive/warehouse
After this, when I execute hi
Granted! :)
On Thu, Jul 19, 2012 at 12:56 AM, Lefty Leverenz wrote:
> Please grant me write access to the Hive wiki so that I can work on
> improving the documentation.
>
> Thank you.
>
> – Lefty Leverenz
> le...@hortonworks.com
I wrote this query a few minutes ago:
select bid, pid, time from (
select bid, pid, time, rank() over (partition by bid, pid order by time desc) as k
from table1 ) as x
where k <= 3
order by bid, pid, time desc
Do you think this query will work with my Rank function that I pro
Hi Igor,
I am new to HiveQL and don't know much about it yet. Currently my Rank UDF
looks like this:
public final class Rank extends UDF {
    private int counter;
    private String last_key;
    public int evaluate(final String key) {
        if ( !key.equalsIgnoreCase(this.las
Actually, never mind. Looks like you need to partition by both bid and pid.
In that case, your problem is that rank() has to handle a combined bid+pid
key. So first you need to create a combined key, partition by that key and
pass it to your rank() function (assuming rank() knows to reset on a new
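
As a rough illustration of the combined-key idea, the sketch below mirrors the counter logic of a stateful rank function in plain Java. It deliberately leaves out the Hive UDF base class so that it runs on its own. The class name CombinedKeyRank, the "_" separator, the 1-based ranks, and the second pid value are assumptions made for this example, not details taken from the original UDF.

```java
// Sketch only: plain Java stand-in for a stateful rank function.
// The class name, the "_" separator, and 1-based ranks are assumptions.
public final class CombinedKeyRank {
    private int counter;
    private String lastKey;

    // Returns the position of the current row within its run of identical keys.
    public int evaluate(final String bid, final String pid) {
        final String key = bid + "_" + pid;   // combined bid+pid key
        if (!key.equals(lastKey)) {
            counter = 0;                      // reset whenever a new key starts
            lastKey = key;
        }
        return ++counter;
    }

    public static void main(String[] args) {
        CombinedKeyRank rank = new CombinedKeyRank();
        // Two rows with the same bid+pid, then a row with a made-up pid.
        System.out.println(rank.evaluate("1345653", "330760137950")); // 1
        System.out.println(rank.evaluate("1345653", "330760137950")); // 2
        System.out.println(rank.evaluate("1345653", "110909"));       // 1
    }
}
```

The counter only produces meaningful ranks if all rows for one key arrive one after another, which is why the rows have to be distributed and sorted before the function is applied.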
Thanks, Jasper, for replying. I described my use case in my first
email. I have also already written the HiveQL query with the rank
function working, but it does not give me the output that I am
supposed to get from the query.
On Thu, Jul 19, 2012 at 3:53 PM, Jasper Knulst wrote:
Sorry, just pid needs to be dropped from both DISTRIBUTE and SORT clauses.
Your very first query was correct except for the nested subquery part. (You
don't need a double-nested subquery.)
On Thu, Jul 19, 2012 at 3:48 PM, comptech geeky wrote:
> Hi Igor,
>
> I am not sure what I have to remove fr
I am not really aware of your use case.
Play around with it. At least the rank function is now properly applied.
Maybe remove pid from the DISTRIBUTE and the SORT clauses?
Jasper
2012/7/20 comptech geeky
> Hi Igor,
>
> I am not sure what I have to remove from Distribute By as in distribut
Hi Igor,
I am not sure what I have to remove from DISTRIBUTE BY. It contains bid and
pid, and you said to remove bid and time from DISTRIBUTE BY, but it doesn't
contain time.
SELECT bid, pid, rank FROM
(SELECT bid, pid, rank(bid) rank, time, UNIX_TIMESTAMP(time) FROM
Remove pid,time from DISTRIBUTE BY.
On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky wrote:
> Here is the modified query I wrote; it does not produce the expected output.
>
> SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)
> FROM (
> SELECT bid, pid, time
> FROM table1
Hi,
I had more or less the same problem and finally got it working by introducing
a second subquery. This guarantees that the rank function is invoked
in the reduce phase and that the rank results are properly sorted.
I guess something like this:
SELECT bid, pid, rank FROM
(SELECT bi
Can anyone help me with this? I have also tried other options by tweaking the
query, but I am not able to get my expected output.
On Thu, Jul 19, 2012 at 1:45 PM, comptech geeky wrote:
> Here is the modified query I wrote; it does not produce the expected output.
>
> SELECT bid, pid, rank(bi
Here is the modified query I wrote; it does not produce the expected output.
SELECT bid, pid, rank(bid), time, UNIX_TIMESTAMP(time)
FROM (
SELECT bid, pid, time
FROM table1
where to_date(from_unixtime(cast(UNIX_TIMESTAMP(time) as int))) = '2012-07-09'
DISTRIBUTE BY bid,pid,ti
I modified the query as follows:
SELECT buyer_id, item_id, rank(buyer_id), created_time, UNIX_TIMESTAMP(created_time)
FROM (
SELECT buyer_id, item_id, created_time
FROM testingtable1
where to_date(from_unixtime(cast(UNIX_TIMESTAMP(created_time) as int))) = '2012-07-09'
Can you show me the exact query that I need for this particular
problem, considering my scenario? It would be a great help to me, as I am new
to HiveQL.
I need the top 3 rows for each group of rows whose BID and PID match but whose
timestamps differ.
-Raihan Jamal
On Thu, Jul 19, 2012 at 1:15 PM, Philip Trom
Your rank() is being evaluated map side. Put your distribute by and sort by
in an inner query, and then evaluate your rank() in an outer query.
Phil.
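
To make the "sort in the inner query, rank in the outer query" advice concrete, here is a small plain-Java emulation of the two steps. It is only a sketch of the data flow and not Hive code. The class name TopThreePerKey, the {bid, pid, time} row layout, the cut-off of three rows, and the extra rows marked as made up are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: emulates "sort first, then rank" outside of Hive.
// Row layout {bid, pid, time} and all names here are assumptions.
public final class TopThreePerKey {

    // Inner step: bring rows with the same bid+pid together, newest first.
    public static List<String[]> sortRows(List<String[]> rows) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort((a, b) -> {
            int c = a[0].compareTo(b[0]);            // bid ascending
            if (c == 0) c = a[1].compareTo(b[1]);    // pid ascending
            if (c == 0) c = b[2].compareTo(a[2]);    // time descending
            return c;
        });
        return sorted;
    }

    // Outer step: walk the sorted rows with a resetting counter, keep rank <= 3.
    public static List<String[]> topThree(List<String[]> sortedRows) {
        List<String[]> out = new ArrayList<>();
        String lastKey = null;
        int counter = 0;
        for (String[] r : sortedRows) {
            String key = r[0] + "_" + r[1];
            if (!key.equals(lastKey)) {
                counter = 0;
                lastKey = key;
            }
            if (++counter <= 3) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1345653", "330760137950", "2012-07-09 21:42:29"},
            new String[] {"1345653", "330760137950", "2012-07-09 21:43:29"},
            new String[] {"1345653", "330760137950", "2012-07-09 21:40:29"},
            new String[] {"1345653", "330760137950", "2012-07-09 21:41:29"},  // made-up row
            new String[] {"1345653", "110909", "2012-07-09 19:29:06"});       // made-up row
        for (String[] r : topThree(sortRows(rows))) {
            System.out.println(String.join(" ", r));
        }
    }
}
```

If topThree is called on the unsorted input instead, rows of the same key are no longer guaranteed to be adjacent or ordered by time, so the counter resets at the wrong places. That is the same symptom as a rank function evaluated on the map side before the rows have been distributed and sorted.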
On Jul 19, 2012 9:00 PM, "comptech geeky" wrote:
> This is the data in my Table1:
>
>
> BID PID TIME
> --
This is the data in my Table1:
BID     | PID          | TIME
--------+--------------+--------------------
1345653 | 330760137950 | 2012-07-09 21:42:29
1345653 | 330760137950 | 2012-07-09 21:43:29
1345653 | 330760137950 | 2012-07-09 21:40:29
A couple more to add to the list:
Indexing [1]
Columnar Storage/RCFile [2]
[1] https://cwiki.apache.org/confluence/display/Hive/IndexDev
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-4.pdf
On Thu, Jul 19, 2012 at 8:39 AM, Jan Dolinár wrote:
> There are many ways, but beware
There are many ways, but beware that some of them may result in worse
performance when used inappropriately.
Some of the settings we use to achieve faster queries:
hive.map.aggr=true
hive.exec.parallel=true
hive.exec.compress.intermediate=true
mapred.job.reuse.jvm.num.tasks=-1
Structuring the que
It depends on the kind of query.
If you are doing joins, there are different kinds of join queries,
depending on how you laid out the data and how much data is held in
each table.
On Thu, Jul 19, 2012 at 6:54 PM, Abhishek wrote:
>
> Apart from partitions and buckets how to improve of hive q
Apart from partitions and buckets, how can I improve the performance of Hive queries?
Regards
Abhi
Sent from my iPhone
Hi,
Everything is set.
[hduser@master bin]$ echo $HADOOP_HOME
/usr/local/hadoop
[hduser@master bin]$ echo $PATH
usr/local/sqoop/bin:/usr/local/hive/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/hduser/bin:/usr/java/jdk1.6.0_33/bin:/usr/local/hadoop/bin
created two directories
$ bin/ha
Hi Prabhjot
Have you set $HADOOP_HOME? If not try setting that, if already set please
verify whether it is the correct one.
Normally I do set $HADOOP_HOME and $PATH and have never faced any issues.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: iwa
Please grant me write access to the Hive wiki so that I can work on
improving the documentation.
Thank you.
– Lefty Leverenz
le...@hortonworks.com