On Thu, May 24, 2012 at 3:10 PM, Gesli, Nicole <nicole.ge...@memorylane.com> wrote:

> Try this:
>
> SELECT a.a_id, b.path
> FROM ( SELECT a_id, MIN(t_timestamp) t_timestamp
>        FROM web_data
>        GROUP BY a_id
>      ) a JOIN
>      web_data b ON ( b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp )
>

Awesome that works. I didn't realize I could do it this way. Thanks for
your help and others as well.

> -Nicole
>
> From: Roberto Sanabria <robe...@stumbleupon.com>
> Reply-To: <user@hive.apache.org>
> Date: Thu, 24 May 2012 15:06:29 -0700
> To: <user@hive.apache.org>
> Subject: Re: SQL help
>
> "I guess do some kind of group by and store it in intermediate file and
> run another select on it?"
>
> Yes, that is my recommendation.
>
> On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
>> On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>> Hive is not SQL 92 compliant or whatever.
>>>
>>> https://cwiki.apache.org/Hive/languagemanual.html
>>>
>>> in particular you can not do subselects inside the in or the where
>>> clause. Hive usually have other formulations like left semi join that
>>> makes things 'like in' and 'not in' possible.
>>>
>> Thanks. But what I am looking for is to select only those rows that are
>> of min(t_timestamp) for a given a_id. What would be the best way? I guess
>> do some kind of group by and store it in intermediate file and run another
>> select on it?
>>
>>> Edward
>>>
>>> On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia <mohitanch...@gmail.com>
>>> wrote:
>>> > I am now trying to do it this way but doesn't work in hive. I think I am
>>> > missing something here, can someone please help?
>>> >
>>> > select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2
>>> > where t2.t_timestamp = t1.t_timestamp)
>>> >
>>> > I get:
>>> >
>>> > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min'
>>> > '(' in expression specification
>>> >
>>> > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <mohitanch...@gmail.com>
>>> > wrote:
>>> >>
>>> >> I am new to Hive. I have several SQL from RDBMS database that I need to
>>> >> convert to hive. What's the best reference for HIVEQL? For now I am trying
>>> >> to figure out how to do this in hive:
>>> >>
>>> >> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY
>>> >> A_ID ORDER BY t_timestamp) From WEB_DATA
>>> >>
>>> >> Any help would be appreciated.
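
For anyone who wants to check the logic of the join Nicole suggested without a Hive cluster, here is a rough sketch that runs the same query against an in-memory SQLite table from Python. Everything outside the query is an assumption made for illustration: the sample rows, the column types, and the helper name first_paths are hypothetical, SQLite only stands in for Hive, and the ORDER BY was added so the output order is predictable.

```python
import sqlite3

# Hypothetical sample rows: (a_id, t_timestamp, path).
ROWS = [
    (1, 10, "/home"),
    (1, 20, "/cart"),
    (2, 5, "/search"),
    (2, 7, "/item"),
]

# The join from the thread; only the ORDER BY is an addition.
QUERY = """
SELECT a.a_id, b.path
FROM (
    SELECT a_id, MIN(t_timestamp) t_timestamp
    FROM web_data
    GROUP BY a_id
) a
JOIN web_data b
  ON (b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp)
ORDER BY a.a_id
"""


def first_paths(rows):
    """Load rows into a throwaway table and return (a_id, path) for the
    earliest t_timestamp of each a_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE web_data (a_id INTEGER, t_timestamp INTEGER, path TEXT)"
    )
    conn.executemany("INSERT INTO web_data VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(first_paths(ROWS))
```

Two things to keep in mind with this approach: if two rows for the same a_id share the minimum t_timestamp, the join returns both of them, and a NULL path on the earliest row comes back as NULL, so it is not an exact match for First_Value(path IGNORE NULLS) in the original question.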