"I guess do some kind of group by and store it in intermediate file and run
another select on it?"

Yes, that is my recommendation.
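
For what it's worth, the two steps can probably be folded into one statement: put the group by in a FROM-clause subquery and join it back to the table, which avoids the subselect in the where clause. Below is a rough sketch, assuming the column names from the thread (a_id, path, t_timestamp). The sample rows are made up, and Python's sqlite3 is only a stand-in so the query can be run without a cluster. The string in QUERY is the part meant for Hive, and it has not been tested there.

```python
import sqlite3

# SQLite stands in for Hive here only so the query can be run locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_data (a_id INTEGER, path TEXT, t_timestamp INTEGER)")
conn.executemany(
    "INSERT INTO web_data VALUES (?, ?, ?)",
    # Hypothetical sample rows, not data from the thread.
    [(1, "/home", 10), (1, "/cart", 20), (2, "/search", 5), (2, "/item", 7)],
)

# Inner query: group by a_id to get min(t_timestamp) per a_id.
# Outer query: join back to web_data to pick up the rows at that timestamp.
QUERY = """
SELECT t.a_id, t.path
FROM web_data t
JOIN (
    SELECT a_id, MIN(t_timestamp) AS min_ts
    FROM web_data
    GROUP BY a_id
) m
ON t.a_id = m.a_id AND t.t_timestamp = m.min_ts
"""

rows = sorted(conn.execute(QUERY).fetchall())
print(rows)  # [(1, '/home'), (2, '/search')]
```

If two rows for the same a_id share the minimum timestamp, both come back. To approximate the IGNORE NULLS part of the original query, adding "WHERE path IS NOT NULL" to the inner select should be enough.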

On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

>
>
> On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> Hive is not SQL 92 compliant or whatever.
>>
>> https://cwiki.apache.org/Hive/languagemanual.html
>>
>> In particular, you cannot do subselects inside the IN or the WHERE
>> clause. Hive usually has other formulations, like LEFT SEMI JOIN, that
>> make things like 'in' and 'not in' possible.
>>
> Thanks. But what I am looking for is to select only those rows that have
> the min(t_timestamp) for a given a_id. What would be the best way? I guess
> do some kind of group by and store it in intermediate file and run another
> select on it?
>
>
>> Edward
>>  On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia <mohitanch...@gmail.com>
>> wrote:
>> > I am now trying to do it this way, but it doesn't work in Hive. I think I am
>> > missing something here. Can someone please help?
>> >
>> > select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2
>> > where t2.t_timestamp = t1.t_timestamp)
>> >
>> > I get:
>> >
>> >
>> > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min'
>> > '(' in expression specification
>> >
>> >
>> >
>> > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <mohitanch...@gmail.com>
>> > wrote:
>> >>
>> >> I am new to Hive. I have several SQL queries from an RDBMS database that I
>> >> need to convert to Hive. What's the best reference for HiveQL? For now I am
>> >> trying to figure out how to do this in Hive:
>> >>
>> >> Select  distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY
>> >> A_ID ORDER BY t_timestamp) From WEB_DATA
>> >>
>> >> Any help would be appreciated.
>> >
>> >
>> >
>>
>
>
