What is hive doing wrong?

2012-05-24 Thread Edward Capriolo
Over the past few months I have seen Spark thrown around a couple of times. https://github.com/mesos/spark You know what strikes me as odd? Not once has Spark EVER been mentioned on this mailing list (to my knowledge)! This is something similar to HadoopDB. I mean it is open source and all so no

Re: SQL help

2012-05-24 Thread Mohit Anchlia
On Thu, May 24, 2012 at 3:10 PM, Gesli, Nicole wrote: > Try this: > > SELECT a.a_id, b.path > FROM ( SELECT a_id, MIN(t_timestamp) t_timestamp > FROM web_data > GROUP BY a_id > ) a JOIN > web_data b ON ( b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp ) > Awesome t

Re: SQL help

2012-05-24 Thread Gesli, Nicole
Try this: SELECT a.a_id, b.path FROM ( SELECT a_id, MIN(t_timestamp) t_timestamp FROM web_data GROUP BY a_id ) a JOIN web_data b ON ( b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp ) -Nicole
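To see what the GROUP BY plus self-join rewrite above returns, here is a minimal sketch that runs the same query text against an in-memory SQLite table. SQLite is only a stand-in for Hive here (an assumption made for the sake of a runnable example), and the sample rows and the `first_paths` helper are hypothetical; only the `web_data` table and its `a_id`, `path`, and `t_timestamp` columns come from the thread.

```python
import sqlite3

# Hypothetical sample rows: (a_id, path, t_timestamp).
ROWS = [
    ("u1", "/home", 100),
    ("u1", "/cart", 200),
    ("u2", "/search", 150),
    ("u2", "/item", 120),
]

# The query as posted in the thread, only reflowed onto several lines.
FIRST_PATH_QUERY = """
SELECT a.a_id, b.path
FROM ( SELECT a_id, MIN(t_timestamp) t_timestamp
       FROM web_data
       GROUP BY a_id
     ) a JOIN
     web_data b ON ( b.a_id = a.a_id AND b.t_timestamp = a.t_timestamp )
"""


def first_paths(rows):
    """Load rows into an in-memory table and return (a_id, path) pairs, sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE web_data (a_id TEXT, path TEXT, t_timestamp INTEGER)"
        )
        conn.executemany("INSERT INTO web_data VALUES (?, ?, ?)", rows)
        return sorted(conn.execute(FIRST_PATH_QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for a_id, path in first_paths(ROWS):
        print(a_id, path)
```

Two caveats when comparing this with the First_Value(... IGNORE NULLS) query from the original question: if two rows for the same a_id share the minimum t_timestamp, the join returns both of them, and a NULL path on the earliest row is returned as-is rather than skipped. Filtering on path IS NOT NULL in both the inner query and the join is one possible adjustment.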

Re: SQL help

2012-05-24 Thread Roberto Sanabria
"I guess do some kind of group by and store it in intermediate file and run another select on it?" Yes, that is my recommendation. On Thu, May 24, 2012 at 2:57 PM, Mohit Anchlia wrote: > > > On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo wrote: > >> Hive is not SQL 92 compliant or whatever. >>

Re: SQL help

2012-05-24 Thread Mohit Anchlia
On Thu, May 24, 2012 at 2:19 PM, Edward Capriolo wrote: > Hive is not SQL-92 compliant or whatever. > > https://cwiki.apache.org/Hive/languagemanual.html > > In particular, you cannot do subselects inside the IN or the WHERE > clause. Hive usually has other formulations, like LEFT SEMI JOIN, that >

Re: SQL help

2012-05-24 Thread Edward Capriolo
Hive is not SQL-92 compliant or whatever. https://cwiki.apache.org/Hive/languagemanual.html In particular, you cannot do subselects inside the IN or the WHERE clause. Hive usually has other formulations, like LEFT SEMI JOIN, that make things like 'in' and 'not in' possible. Edward On Thu, May 24

Re: SQL help

2012-05-24 Thread Roberto Sanabria
You can try using WHERE a_id IN (subquery), but I don't think Hive supports subqueries in WHERE clauses. You would have to turn this into a join statement. On Thu, May 24, 2012 at 2:13 PM, Mohit Anchlia wrote: > I am now trying to do it this way but it doesn't work in Hive. I think I am > missing so

Re: SQL help

2012-05-24 Thread Mohit Anchlia
I am now trying to do it this way, but it doesn't work in Hive. I think I am missing something here; can someone please help? select a_id from web_data t1 where a_id = (select min(a_id) from web_data t2 where t2.t_timestamp = t1.t_timestamp) I get: FAILED: Parse Error: line 1:69 cannot recognize in

Re: SQL help

2012-05-24 Thread Ashish Thusoo
Hi Mohit, Hive does not support window functions afaik. The following link might be useful if you can bring that in... https://github.com/hbutani/SQLWindowing/wiki Not sure if this is being brought into trunk at some point... Ashish On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia wrote: > I a

SQL help

2012-05-24 Thread Mohit Anchlia
I am new to Hive. I have several SQL queries from an RDBMS that I need to convert to Hive. What's the best reference for HiveQL? For now I am trying to figure out how to do this in Hive: Select distinct A_ID, First_Value(path IGNORE NULLS) over(PARTITION BY A_ID ORDER BY t_timestamp) From WEB_DATA