"I guess do some kind of group by and store it in intermediate file and run another select on it?"
Yes, that is my recommendation.

On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> Hive is not SQL 92 compliant or whatever.
>>
>> https://cwiki.apache.org/Hive/languagemanual.html
>>
>> in particular you can not do subselects inside the in or the where
>> clause. Hive usually have other formulations like left semi join that
>> makes things 'like in' and 'not in' possible.
>
> Thanks. But what I am looking for is to select only those rows that are
> of min(t_timestamp) for a given a_id. What would be the best way? I guess
> do some kind of group by and store it in intermediate file and run another
> select on it?
>
>> Edward
>>
>> On Thu, May 24, 2012 at 5:13 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>> > I am now trying to do it this way but doesn't work in hive. I think I am
>> > missing something here, can someone please help?
>> >
>> > select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2
>> > where t2.t_timestamp = t1.t_timestamp)
>> >
>> > I get:
>> >
>> > FAILED: Parse Error: line 1:69 cannot recognize input near 'select' 'min'
>> > '(' in expression specification
>> >
>> > On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>> >>
>> >> I am new to Hive. I have several SQL from RDBMS database that I need to
>> >> convert to hive. What's the best reference for HIVEQL? For now I am trying
>> >> to figure out how to do this in hive:
>> >>
>> >> Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY
>> >> A_ID ORDER BY t_timestamp) From WEB_DATA
>> >>
>> >> Any help would be appreciated.
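
For reference, the "group by, then run another select" approach recommended in this thread might look roughly like the HiveQL sketch below. It reuses the table and column names from the quoted messages (web_data, a_id, t_timestamp, path) and assumes a FROM-clause subquery stands in for the intermediate file. The aliases t1, t2 and min_ts are made up for illustration. Treat it as a sketch under those assumptions, not as the posters' actual solution.

```sql
-- Step 1 (inner query): find the earliest timestamp per a_id,
-- considering only rows with a non-null path, to approximate
-- the IGNORE NULLS part of the original First_Value() query.
-- Step 2 (outer query): join back to web_data to fetch the path
-- that belongs to that earliest timestamp.
SELECT t1.a_id, t1.path
FROM web_data t1
JOIN (
  SELECT a_id, MIN(t_timestamp) AS min_ts
  FROM web_data
  WHERE path IS NOT NULL
  GROUP BY a_id
) t2
ON (t1.a_id = t2.a_id AND t1.t_timestamp = t2.min_ts)
WHERE t1.path IS NOT NULL;
```

If two rows for the same a_id share the minimum t_timestamp, this returns both, so a further de-duplication step may be needed depending on the data. The inner query could also be written out to its own table first and joined in a second statement, which is closer to the intermediate-file idea as originally phrased.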