Then that means I don't need to create that userdefinedfunction right?
On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár <dolik....@gmail.com> wrote: > Hi Jamal, > > date is standard linux/unix tool, see the manual page: > http://linux.die.net/man/1/date. > > The $(...) tells the shell to execute the command and insert it's output > into the string. So in this case it will execute command > date -d -1day +%Y%m%d > which returns yesterday date in the format you need. > > Jan > > > On Tue, Aug 7, 2012 at 8:24 PM, Raihan Jamal <jamalrai...@gmail.com>wrote: > >> Yes it supports -e option, but in your query what is date? >> >> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate >> AS 'com.example.hive.udf.YesterdayDate'; >> SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;" >> >> >> >> *Raihan Jamal* >> >> >> >> On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár <dolik....@gmail.com> wrote: >> >>> By the way, even without hiveconf, you can run hive from shell like this >>> to achieve what you want using shell capabilities: >>> >>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate >>> AS 'com.example.hive.udf.YesterdayDate'; >>> SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;" >>> >>> At least if hive 6.0 supports -e option, I don't have where to check >>> that. >>> >>> Jan >>> >>> >>> On Tue, Aug 7, 2012 at 8:11 PM, Vijay <tec...@gmail.com> wrote: >>> >>>> Given the implementation of the UDF, I don't think hive would be able >>>> to use partition pruning. Especially the version you're using. I'd >>>> really recommend upgrading to a later version that has the hiveconf >>>> support. That can save a lot of trouble rather than trying to get >>>> things working on 0.6 >>>> >>>> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <jamalrai...@gmail.com> >>>> wrote: >>>> > Hi Jan, >>>> > >>>> > I have date in different format also, so that is the reason I was >>>> thinking >>>> > to do by this approach. How can I make sure this will work on the >>>> selected >>>> > partition only and it will not scan the entire table. I will add your >>>> > suggestion in my UDF as deterministic thing. >>>> > >>>> > My simple question here is- How to get the Yesterdays date which I >>>> can use >>>> > on the Date Partition I cannot use hiveconf here as I am working with >>>> Hive >>>> > 0.6 >>>> > >>>> > >>>> > >>>> > >>>> > Raihan Jamal >>>> > >>>> > >>>> > >>>> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <dolik....@gmail.com> >>>> wrote: >>>> >> >>>> >> I'm afraid that he query >>>> >> >>>> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10; >>>> >> >>>> >> will scan entire table, because the functions is evaluated at >>>> runtime, so >>>> >> Hive doesn't know what the value is when it decides which files to >>>> scan. I >>>> >> am not 100% sure though, you should try it. >>>> >> >>>> >> Also, you might want to try to add annotation to your UDF saying >>>> that the >>>> >> function is deterministic: >>>> >> @UDFType(deterministic=false) >>>> >> >>>> >> I think Hive might be clever enough to evaluate it early enough to >>>> use the >>>> >> partition pruning correctly, since it operates on constant >>>> expression. But >>>> >> again, I'm not really sure, maybe someone with deeper knowledge of >>>> Hive >>>> >> optimizations will tell us more. It is actually quite interesting >>>> question. >>>> >> >>>> >> Another way to help Hive with the optimizations might be to skip >>>> passing >>>> >> the format string argument, if you have all dates in same format, >>>> you can >>>> >> call the function just like 'yesterdaydate()' and hardcode the >>>> format in the >>>> >> function. >>>> >> >>>> >> Jan >>>> >> >>>> >> >>>> >> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <jamalrai...@gmail.com> >>>> >> wrote: >>>> >>> >>>> >>> Hi Jan, >>>> >>> >>>> >>> >>>> >>> >>>> >>> I figured that out, it is working fine for me now. The only >>>> question I >>>> >>> have is, if I am doing like this- >>>> >>> >>>> >>> >>>> >>> >>>> >>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10; >>>> >>> >>>> >>> >>>> >>> >>>> >>> Then the above query will be evaluated as below right? >>>> >>> >>>> >>> >>>> >>> >>>> >>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10; >>>> >>> >>>> >>> >>>> >>> >>>> >>> So that means it will look for data in the corresponding dt >>>> partition >>>> >>> (20120806) only right as above table is partitioned on dt column ? >>>> And it >>>> >>> will not scan the whole table right? >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> Raihan Jamal >>>> >>> >>>> >>> >>>> >>> >>>> >>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <dolik....@gmail.com> >>>> wrote: >>>> >>>> >>>> >>>> Hi Jamal, >>>> >>>> >>>> >>>> Check if the function really returns what it should and that your >>>> data >>>> >>>> are really in yyyyMMdd format. You can do this by simple query >>>> like this: >>>> >>>> >>>> >>>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1; >>>> >>>> >>>> >>>> I don't see anything wrong with the function itself, it works well >>>> for >>>> >>>> me (although I tested it in hive 0.7.1). The only thing I would >>>> change about >>>> >>>> it would be to optimize it by calling 'new' only at the time of >>>> construction >>>> >>>> and reusing the object when the function is called, but that >>>> should not >>>> >>>> affect the functionality at all. >>>> >>>> >>>> >>>> Best regards, >>>> >>>> Jan >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal < >>>> jamalrai...@gmail.com> >>>> >>>> wrote: >>>> >>>>> >>>> >>>>> Problem >>>> >>>>> >>>> >>>>> I created the below UserDefinedFunction to get the yesterday's >>>> day in >>>> >>>>> the format I wanted as I will be passing the format into this >>>> below method >>>> >>>>> from the query. >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> public final class YesterdayDate extends UDF { >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> public String evaluate(final String format) { >>>> >>>>> >>>> >>>>> DateFormat dateFormat = new >>>> >>>>> SimpleDateFormat(format); >>>> >>>>> >>>> >>>>> Calendar cal = >>>> Calendar.getInstance(); >>>> >>>>> >>>> >>>>> cal.add(Calendar.DATE, -1); >>>> >>>>> >>>> >>>>> return >>>> >>>>> dateFormat.format(cal.getTime()).toString(); >>>> >>>>> >>>> >>>>> } >>>> >>>>> >>>> >>>>> } >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> So whenever I try to run the query like below by adding the jar to >>>> >>>>> classpath and creating the temporary function yesterdaydate, I >>>> always get >>>> >>>>> zero result back- >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> hive> create temporary function yesterdaydate as >>>> >>>>> 'com.example.hive.udf.YesterdayDate'; >>>> >>>>> >>>> >>>>> OK >>>> >>>>> >>>> >>>>> Time taken: 0.512 seconds >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Below is the query I am running- >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') >>>> LIMIT >>>> >>>>> 10; >>>> >>>>> >>>> >>>>> OK >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> And I always get zero result back but the data is there in that >>>> table >>>> >>>>> for Aug 5th. >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> What wrong I am doing? Any suggestions will be appreciated. >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> NOTE:- As I am working with Hive 0.6 so it doesn’t support >>>> variable >>>> >>>>> substitution thing, so I cannot use hiveconf here and the above >>>> table has >>>> >>>>> been partitioned on dt(date) column. >>>> >>>> >>>> >>>> >>>> >>> >>>> >> >>>> > >>>> >>> >>> >> >