Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
Hi Jan, I also have dates in other formats, which is why I was thinking of this approach. How can I make sure this will run on the selected partition only and will not scan the entire table? I will add your suggestion and mark my UDF as deterministic. My simple question here

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
@kulkarni, when I ran EXPLAIN on my query, I got the output below, and I am not sure how to interpret it. Any help on whether my approach is right would be appreciated. hive> EXPLAIN SELECT * FROM PDS_ATTRIBUTE_DATA_REALTIME where dt=yesterdaydate('MMdd', 2) LIMIT 5; OK ABSTRACT S

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Jan Dolinár
Oops, sorry, I made a copy&paste mistake :) The annotation should read @UDFType(deterministic=true) Jan On Tue, Aug 7, 2012 at 7:37 PM, Jan Dolinár wrote: > I'm afraid that the query > > SELECT * FROM REALTIME where dt= yesterdaydate('MMdd') LIMIT 10; > > will scan entire table, because th

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Jan Dolinár
I'm afraid that the query SELECT * FROM REALTIME where dt= yesterdaydate('MMdd') LIMIT 10; will scan the entire table, because the function is evaluated at runtime, so Hive doesn't know what the value is when it decides which files to scan. I am not 100% sure though, you should try it. Also, yo
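
One way to sidestep the runtime-evaluation concern raised above is to compute the date before the query reaches Hive, so the partition filter is a plain string literal. The following is a minimal sketch, not code from the thread: the class and method names are hypothetical, and the "yyyyMMdd" pattern is an assumption inferred from the literal '20120805' used elsewhere in this thread. Whether Hive prunes partitions when the UDF is used directly may depend on the Hive version, so EXPLAIN remains the way to check.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds the Hive query with a literal partition value.
public class YesterdayDate {

    // Returns the date `daysBack` days before `today`, formatted with `pattern`.
    public static String daysBefore(LocalDate today, String pattern, int daysBack) {
        return today.minusDays(daysBack).format(DateTimeFormatter.ofPattern(pattern));
    }

    // Embeds yesterday's date as a string literal, so the value of dt is
    // already known when the query is compiled.
    // Assumption: the dt partition uses the yyyyMMdd layout.
    public static String buildQuery(LocalDate today) {
        String dt = daysBefore(today, "yyyyMMdd", 1);
        return "SELECT * FROM REALTIME WHERE dt = '" + dt + "' LIMIT 10";
    }

    public static void main(String[] args) {
        // Print the query for the current date; it can then be passed to Hive.
        System.out.println(buildQuery(LocalDate.now()));
    }
}
```

The generated string can be submitted like any other query; the trade-off is that the date logic lives in the client rather than in a reusable UDF.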

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread kulkarni.swar...@gmail.com
Have you tried using EXPLAIN[1] on your query? I usually like to use that to get a better understanding of what my query is actually doing, and for debugging at other times. [1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain On Tue, Aug 7, 2012 at 12:20 PM, Raihan Jamal wrote

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
Hi Jan, I figured that out, and it is working fine for me now. The only question I have is this: if I run SELECT * FROM REALTIME where dt= yesterdaydate('MMdd') LIMIT 10; will the above query be evaluated as below? SELECT * FROM REALTIME where dt= '20120806' LIMIT 1

Re: Custom UserDefinedFunction in Hive

2012-08-06 Thread Raihan Jamal
I tested that function from a main method by printing its output, and it works fine. I am trying to get yesterday's date: today's date is Aug 6th, so the query should be for Aug 5th. This works fine for me: SELECT * FROM REALTIME where dt= '20120805' LIMIT 10;

Re: Custom UserDefinedFunction in Hive

2012-08-06 Thread Jan Dolinár
Hi Jamal, check that the function really returns what it should and that your data are really in MMdd format. You can do this with a simple query like this: SELECT dt, yesterdaydate('MMdd') FROM REALTIME LIMIT 1; I don't see anything wrong with the function itself, it works well for me (althou