Then that means I don't need to create that userdefinedfunction right?


On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár <dolik....@gmail.com> wrote:

> Hi Jamal,
>
> date is standard linux/unix tool, see the manual page:
> http://linux.die.net/man/1/date.
>
> The $(...) tells the shell to execute the command and insert it's output
> into the string. So in this case it will execute command
> date -d -1day +%Y%m%d
> which returns yesterday date in the format you need.
>
> Jan
>
>
> On Tue, Aug 7, 2012 at 8:24 PM, Raihan Jamal <jamalrai...@gmail.com>wrote:
>
>> Yes it supports -e option, but in your query what is date?
>>
>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
>> AS 'com.example.hive.udf.YesterdayDate';
>> SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;"
>>
>>
>>
>> *Raihan Jamal*
>>
>>
>>
>> On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár <dolik....@gmail.com> wrote:
>>
>>> By the way, even without hiveconf, you can run hive from shell like this
>>> to achieve what you want using shell capabilities:
>>>
>>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
>>> AS 'com.example.hive.udf.YesterdayDate';
>>> SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;"
>>>
>>> At least if hive 6.0 supports -e option, I don't have where to check
>>> that.
>>>
>>> Jan
>>>
>>>
>>> On Tue, Aug 7, 2012 at 8:11 PM, Vijay <tec...@gmail.com> wrote:
>>>
>>>> Given the implementation of the UDF, I don't think hive would be able
>>>> to use partition pruning. Especially the version you're using. I'd
>>>> really recommend upgrading to a later version that has the hiveconf
>>>> support. That can save a lot of trouble rather than trying to get
>>>> things working on 0.6
>>>>
>>>> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <jamalrai...@gmail.com>
>>>> wrote:
>>>> > Hi Jan,
>>>> >
>>>> > I have date in different format also, so that is the reason I was
>>>> thinking
>>>> > to do by this approach. How can I make sure this will work on the
>>>> selected
>>>> > partition only and it will not scan the entire table. I will add your
>>>> > suggestion in my UDF as deterministic thing.
>>>> >
>>>> > My simple question here is- How to get the Yesterdays date which I
>>>> can use
>>>> > on the Date Partition I cannot use hiveconf here as I am working with
>>>> Hive
>>>> > 0.6
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > Raihan Jamal
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <dolik....@gmail.com>
>>>> wrote:
>>>> >>
>>>> >> I'm afraid that  he query
>>>> >>
>>>> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>>> >>
>>>> >> will scan entire table, because the functions is evaluated at
>>>> runtime, so
>>>> >> Hive doesn't know what the value is when it decides which files to
>>>> scan. I
>>>> >> am not 100% sure though, you should try it.
>>>> >>
>>>> >> Also, you might want to try to add annotation to your UDF saying
>>>> that the
>>>> >> function is deterministic:
>>>> >> @UDFType(deterministic=false)
>>>> >>
>>>> >> I think Hive might be clever enough to evaluate it early enough to
>>>> use the
>>>> >> partition pruning correctly, since it operates on constant
>>>> expression. But
>>>> >> again, I'm not really sure, maybe someone with deeper knowledge of
>>>> Hive
>>>> >> optimizations will tell us more. It is actually quite interesting
>>>> question.
>>>> >>
>>>> >> Another way to help Hive with the optimizations might be to skip
>>>> passing
>>>> >> the format string argument, if you have all dates in same format,
>>>> you can
>>>> >> call the function just like 'yesterdaydate()' and hardcode the
>>>> format in the
>>>> >> function.
>>>> >>
>>>> >> Jan
>>>> >>
>>>> >>
>>>> >> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <jamalrai...@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hi Jan,
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> I figured that out, it is working fine for me now. The only
>>>> question I
>>>> >>> have is, if I am doing like this-
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> Then the above query will be evaluated as below right?
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10;
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> So that means it will look for data in the corresponding dt
>>>> partition
>>>> >>> (20120806) only right as above table is partitioned on dt column ?
>>>> And it
>>>> >>> will not scan the whole table right?
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> Raihan Jamal
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <dolik....@gmail.com>
>>>> wrote:
>>>> >>>>
>>>> >>>> Hi Jamal,
>>>> >>>>
>>>> >>>> Check if the function really returns what it should and that your
>>>> data
>>>> >>>> are really in yyyyMMdd format. You can do this by simple query
>>>> like this:
>>>> >>>>
>>>> >>>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
>>>> >>>>
>>>> >>>> I don't see anything wrong with the function itself, it works well
>>>> for
>>>> >>>> me (although I tested it in hive 0.7.1). The only thing I would
>>>> change about
>>>> >>>> it would be to optimize it by calling 'new' only at the time of
>>>> construction
>>>> >>>> and reusing the object when the function is called, but that
>>>> should not
>>>> >>>> affect the functionality at all.
>>>> >>>>
>>>> >>>> Best regards,
>>>> >>>> Jan
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal <
>>>> jamalrai...@gmail.com>
>>>> >>>> wrote:
>>>> >>>>>
>>>> >>>>> Problem
>>>> >>>>>
>>>> >>>>> I created the below UserDefinedFunction to get the yesterday's
>>>> day in
>>>> >>>>> the format I wanted as I will be passing the format into this
>>>> below method
>>>> >>>>> from the query.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> public final class YesterdayDate extends UDF {
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>                 public String evaluate(final String format) {
>>>> >>>>>
>>>> >>>>>                                 DateFormat dateFormat = new
>>>> >>>>> SimpleDateFormat(format);
>>>> >>>>>
>>>> >>>>>                                 Calendar cal =
>>>> Calendar.getInstance();
>>>> >>>>>
>>>> >>>>>                                 cal.add(Calendar.DATE, -1);
>>>> >>>>>
>>>> >>>>>                                 return
>>>> >>>>> dateFormat.format(cal.getTime()).toString();
>>>> >>>>>
>>>> >>>>>                 }
>>>> >>>>>
>>>> >>>>> }
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> So whenever I try to run the query like below by adding the jar to
>>>> >>>>> classpath and creating the temporary function yesterdaydate, I
>>>> always get
>>>> >>>>> zero result back-
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> hive> create temporary function yesterdaydate as
>>>> >>>>> 'com.example.hive.udf.YesterdayDate';
>>>> >>>>>
>>>> >>>>> OK
>>>> >>>>>
>>>> >>>>> Time taken: 0.512 seconds
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> Below is the query I am running-
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd')
>>>> LIMIT
>>>> >>>>> 10;
>>>> >>>>>
>>>> >>>>> OK
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> And I always get zero result back but the data is there in that
>>>> table
>>>> >>>>> for Aug 5th.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> What wrong I am doing? Any suggestions will be appreciated.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> NOTE:- As I am working with Hive 0.6 so it doesn’t support
>>>> variable
>>>> >>>>> substitution thing, so I cannot use hiveconf here and the above
>>>> table has
>>>> >>>>> been partitioned on dt(date) column.
>>>> >>>>
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>

Reply via email to