Let me try that and I will update on this thread.


*Raihan Jamal*



On Tue, Aug 7, 2012 at 11:39 AM, Techy Teck <comptechge...@gmail.com> wrote:

> Then that means I don't need to create that userdefinedfunction right?
>
>
>
> On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár <dolik....@gmail.com> wrote:
>
>> Hi Jamal,
>>
>> date is standard linux/unix tool, see the manual page:
>> http://linux.die.net/man/1/date.
>>
>> The $(...) tells the shell to execute the command and insert it's output
>> into the string. So in this case it will execute command
>> date -d -1day +%Y%m%d
>> which returns yesterday date in the format you need.
>>
>> Jan
>>
>>
>> On Tue, Aug 7, 2012 at 8:24 PM, Raihan Jamal <jamalrai...@gmail.com>wrote:
>>
>>> Yes it supports -e option, but in your query what is date?
>>>
>>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
>>> AS 'com.example.hive.udf.YesterdayDate';
>>> SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;"
>>>
>>>
>>>
>>> *Raihan Jamal*
>>>
>>>
>>>
>>> On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár <dolik....@gmail.com>wrote:
>>>
>>>> By the way, even without hiveconf, you can run hive from shell like
>>>> this to achieve what you want using shell capabilities:
>>>>
>>>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
>>>> AS 'com.example.hive.udf.YesterdayDate';
>>>> SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;"
>>>>
>>>> At least if hive 6.0 supports -e option, I don't have where to check
>>>> that.
>>>>
>>>> Jan
>>>>
>>>>
>>>> On Tue, Aug 7, 2012 at 8:11 PM, Vijay <tec...@gmail.com> wrote:
>>>>
>>>>> Given the implementation of the UDF, I don't think hive would be able
>>>>> to use partition pruning. Especially the version you're using. I'd
>>>>> really recommend upgrading to a later version that has the hiveconf
>>>>> support. That can save a lot of trouble rather than trying to get
>>>>> things working on 0.6
>>>>>
>>>>> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <jamalrai...@gmail.com>
>>>>> wrote:
>>>>> > Hi Jan,
>>>>> >
>>>>> > I have date in different format also, so that is the reason I was
>>>>> thinking
>>>>> > to do by this approach. How can I make sure this will work on the
>>>>> selected
>>>>> > partition only and it will not scan the entire table. I will add your
>>>>> > suggestion in my UDF as deterministic thing.
>>>>> >
>>>>> > My simple question here is- How to get the Yesterdays date which I
>>>>> can use
>>>>> > on the Date Partition I cannot use hiveconf here as I am working
>>>>> with Hive
>>>>> > 0.6
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > Raihan Jamal
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <dolik....@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> I'm afraid that  he query
>>>>> >>
>>>>> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>>>> >>
>>>>> >> will scan entire table, because the functions is evaluated at
>>>>> runtime, so
>>>>> >> Hive doesn't know what the value is when it decides which files to
>>>>> scan. I
>>>>> >> am not 100% sure though, you should try it.
>>>>> >>
>>>>> >> Also, you might want to try to add annotation to your UDF saying
>>>>> that the
>>>>> >> function is deterministic:
>>>>> >> @UDFType(deterministic=false)
>>>>> >>
>>>>> >> I think Hive might be clever enough to evaluate it early enough to
>>>>> use the
>>>>> >> partition pruning correctly, since it operates on constant
>>>>> expression. But
>>>>> >> again, I'm not really sure, maybe someone with deeper knowledge of
>>>>> Hive
>>>>> >> optimizations will tell us more. It is actually quite interesting
>>>>> question.
>>>>> >>
>>>>> >> Another way to help Hive with the optimizations might be to skip
>>>>> passing
>>>>> >> the format string argument, if you have all dates in same format,
>>>>> you can
>>>>> >> call the function just like 'yesterdaydate()' and hardcode the
>>>>> format in the
>>>>> >> function.
>>>>> >>
>>>>> >> Jan
>>>>> >>
>>>>> >>
>>>>> >> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <jamalrai...@gmail.com
>>>>> >
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Hi Jan,
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> I figured that out, it is working fine for me now. The only
>>>>> question I
>>>>> >>> have is, if I am doing like this-
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT
>>>>> 10;
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> Then the above query will be evaluated as below right?
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10;
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> So that means it will look for data in the corresponding dt
>>>>> partition
>>>>> >>> (20120806) only right as above table is partitioned on dt column ?
>>>>> And it
>>>>> >>> will not scan the whole table right?
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> Raihan Jamal
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <dolik....@gmail.com>
>>>>> wrote:
>>>>> >>>>
>>>>> >>>> Hi Jamal,
>>>>> >>>>
>>>>> >>>> Check if the function really returns what it should and that your
>>>>> data
>>>>> >>>> are really in yyyyMMdd format. You can do this by simple query
>>>>> like this:
>>>>> >>>>
>>>>> >>>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
>>>>> >>>>
>>>>> >>>> I don't see anything wrong with the function itself, it works
>>>>> well for
>>>>> >>>> me (although I tested it in hive 0.7.1). The only thing I would
>>>>> change about
>>>>> >>>> it would be to optimize it by calling 'new' only at the time of
>>>>> construction
>>>>> >>>> and reusing the object when the function is called, but that
>>>>> should not
>>>>> >>>> affect the functionality at all.
>>>>> >>>>
>>>>> >>>> Best regards,
>>>>> >>>> Jan
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal <
>>>>> jamalrai...@gmail.com>
>>>>> >>>> wrote:
>>>>> >>>>>
>>>>> >>>>> Problem
>>>>> >>>>>
>>>>> >>>>> I created the below UserDefinedFunction to get the yesterday's
>>>>> day in
>>>>> >>>>> the format I wanted as I will be passing the format into this
>>>>> below method
>>>>> >>>>> from the query.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> public final class YesterdayDate extends UDF {
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>                 public String evaluate(final String format) {
>>>>> >>>>>
>>>>> >>>>>                                 DateFormat dateFormat = new
>>>>> >>>>> SimpleDateFormat(format);
>>>>> >>>>>
>>>>> >>>>>                                 Calendar cal =
>>>>> Calendar.getInstance();
>>>>> >>>>>
>>>>> >>>>>                                 cal.add(Calendar.DATE, -1);
>>>>> >>>>>
>>>>> >>>>>                                 return
>>>>> >>>>> dateFormat.format(cal.getTime()).toString();
>>>>> >>>>>
>>>>> >>>>>                 }
>>>>> >>>>>
>>>>> >>>>> }
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> So whenever I try to run the query like below by adding the jar
>>>>> to
>>>>> >>>>> classpath and creating the temporary function yesterdaydate, I
>>>>> always get
>>>>> >>>>> zero result back-
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> hive> create temporary function yesterdaydate as
>>>>> >>>>> 'com.example.hive.udf.YesterdayDate';
>>>>> >>>>>
>>>>> >>>>> OK
>>>>> >>>>>
>>>>> >>>>> Time taken: 0.512 seconds
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> Below is the query I am running-
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd')
>>>>> LIMIT
>>>>> >>>>> 10;
>>>>> >>>>>
>>>>> >>>>> OK
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> And I always get zero result back but the data is there in that
>>>>> table
>>>>> >>>>> for Aug 5th.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> What wrong I am doing? Any suggestions will be appreciated.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> NOTE:- As I am working with Hive 0.6 so it doesn’t support
>>>>> variable
>>>>> >>>>> substitution thing, so I cannot use hiveconf here and the above
>>>>> table has
>>>>> >>>>> been partitioned on dt(date) column.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>

Reply via email to