Let me try that and I will update on this thread.
*Raihan Jamal* On Tue, Aug 7, 2012 at 11:39 AM, Techy Teck <comptechge...@gmail.com> wrote: > Then that means I don't need to create that userdefinedfunction right? > > > > On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár <dolik....@gmail.com> wrote: > >> Hi Jamal, >> >> date is standard linux/unix tool, see the manual page: >> http://linux.die.net/man/1/date. >> >> The $(...) tells the shell to execute the command and insert it's output >> into the string. So in this case it will execute command >> date -d -1day +%Y%m%d >> which returns yesterday date in the format you need. >> >> Jan >> >> >> On Tue, Aug 7, 2012 at 8:24 PM, Raihan Jamal <jamalrai...@gmail.com>wrote: >> >>> Yes it supports -e option, but in your query what is date? >>> >>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate >>> AS 'com.example.hive.udf.YesterdayDate'; >>> SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;" >>> >>> >>> >>> *Raihan Jamal* >>> >>> >>> >>> On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár <dolik....@gmail.com>wrote: >>> >>>> By the way, even without hiveconf, you can run hive from shell like >>>> this to achieve what you want using shell capabilities: >>>> >>>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate >>>> AS 'com.example.hive.udf.YesterdayDate'; >>>> SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;" >>>> >>>> At least if hive 6.0 supports -e option, I don't have where to check >>>> that. >>>> >>>> Jan >>>> >>>> >>>> On Tue, Aug 7, 2012 at 8:11 PM, Vijay <tec...@gmail.com> wrote: >>>> >>>>> Given the implementation of the UDF, I don't think hive would be able >>>>> to use partition pruning. Especially the version you're using. I'd >>>>> really recommend upgrading to a later version that has the hiveconf >>>>> support. That can save a lot of trouble rather than trying to get >>>>> things working on 0.6 >>>>> >>>>> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <jamalrai...@gmail.com> >>>>> wrote: >>>>> > Hi Jan, >>>>> > >>>>> > I have date in different format also, so that is the reason I was >>>>> thinking >>>>> > to do by this approach. How can I make sure this will work on the >>>>> selected >>>>> > partition only and it will not scan the entire table. I will add your >>>>> > suggestion in my UDF as deterministic thing. >>>>> > >>>>> > My simple question here is- How to get the Yesterdays date which I >>>>> can use >>>>> > on the Date Partition I cannot use hiveconf here as I am working >>>>> with Hive >>>>> > 0.6 >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > Raihan Jamal >>>>> > >>>>> > >>>>> > >>>>> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <dolik....@gmail.com> >>>>> wrote: >>>>> >> >>>>> >> I'm afraid that he query >>>>> >> >>>>> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10; >>>>> >> >>>>> >> will scan entire table, because the functions is evaluated at >>>>> runtime, so >>>>> >> Hive doesn't know what the value is when it decides which files to >>>>> scan. I >>>>> >> am not 100% sure though, you should try it. >>>>> >> >>>>> >> Also, you might want to try to add annotation to your UDF saying >>>>> that the >>>>> >> function is deterministic: >>>>> >> @UDFType(deterministic=false) >>>>> >> >>>>> >> I think Hive might be clever enough to evaluate it early enough to >>>>> use the >>>>> >> partition pruning correctly, since it operates on constant >>>>> expression. But >>>>> >> again, I'm not really sure, maybe someone with deeper knowledge of >>>>> Hive >>>>> >> optimizations will tell us more. It is actually quite interesting >>>>> question. >>>>> >> >>>>> >> Another way to help Hive with the optimizations might be to skip >>>>> passing >>>>> >> the format string argument, if you have all dates in same format, >>>>> you can >>>>> >> call the function just like 'yesterdaydate()' and hardcode the >>>>> format in the >>>>> >> function. >>>>> >> >>>>> >> Jan >>>>> >> >>>>> >> >>>>> >> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <jamalrai...@gmail.com >>>>> > >>>>> >> wrote: >>>>> >>> >>>>> >>> Hi Jan, >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> I figured that out, it is working fine for me now. The only >>>>> question I >>>>> >>> have is, if I am doing like this- >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT >>>>> 10; >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> Then the above query will be evaluated as below right? >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10; >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> So that means it will look for data in the corresponding dt >>>>> partition >>>>> >>> (20120806) only right as above table is partitioned on dt column ? >>>>> And it >>>>> >>> will not scan the whole table right? >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> Raihan Jamal >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <dolik....@gmail.com> >>>>> wrote: >>>>> >>>> >>>>> >>>> Hi Jamal, >>>>> >>>> >>>>> >>>> Check if the function really returns what it should and that your >>>>> data >>>>> >>>> are really in yyyyMMdd format. You can do this by simple query >>>>> like this: >>>>> >>>> >>>>> >>>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1; >>>>> >>>> >>>>> >>>> I don't see anything wrong with the function itself, it works >>>>> well for >>>>> >>>> me (although I tested it in hive 0.7.1). The only thing I would >>>>> change about >>>>> >>>> it would be to optimize it by calling 'new' only at the time of >>>>> construction >>>>> >>>> and reusing the object when the function is called, but that >>>>> should not >>>>> >>>> affect the functionality at all. >>>>> >>>> >>>>> >>>> Best regards, >>>>> >>>> Jan >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal < >>>>> jamalrai...@gmail.com> >>>>> >>>> wrote: >>>>> >>>>> >>>>> >>>>> Problem >>>>> >>>>> >>>>> >>>>> I created the below UserDefinedFunction to get the yesterday's >>>>> day in >>>>> >>>>> the format I wanted as I will be passing the format into this >>>>> below method >>>>> >>>>> from the query. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> public final class YesterdayDate extends UDF { >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> public String evaluate(final String format) { >>>>> >>>>> >>>>> >>>>> DateFormat dateFormat = new >>>>> >>>>> SimpleDateFormat(format); >>>>> >>>>> >>>>> >>>>> Calendar cal = >>>>> Calendar.getInstance(); >>>>> >>>>> >>>>> >>>>> cal.add(Calendar.DATE, -1); >>>>> >>>>> >>>>> >>>>> return >>>>> >>>>> dateFormat.format(cal.getTime()).toString(); >>>>> >>>>> >>>>> >>>>> } >>>>> >>>>> >>>>> >>>>> } >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> So whenever I try to run the query like below by adding the jar >>>>> to >>>>> >>>>> classpath and creating the temporary function yesterdaydate, I >>>>> always get >>>>> >>>>> zero result back- >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> hive> create temporary function yesterdaydate as >>>>> >>>>> 'com.example.hive.udf.YesterdayDate'; >>>>> >>>>> >>>>> >>>>> OK >>>>> >>>>> >>>>> >>>>> Time taken: 0.512 seconds >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Below is the query I am running- >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') >>>>> LIMIT >>>>> >>>>> 10; >>>>> >>>>> >>>>> >>>>> OK >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> And I always get zero result back but the data is there in that >>>>> table >>>>> >>>>> for Aug 5th. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> What wrong I am doing? Any suggestions will be appreciated. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> NOTE:- As I am working with Hive 0.6 so it doesn’t support >>>>> variable >>>>> >>>>> substitution thing, so I cannot use hiveconf here and the above >>>>> table has >>>>> >>>>> been partitioned on dt(date) column. >>>>> >>>> >>>>> >>>> >>>>> >>> >>>>> >> >>>>> > >>>>> >>>> >>>> >>> >> >