my query had a bug in it ... it had on ON (it was repeated twice). try this
SELECT h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, c.clicks, 0) FROM selection_daily_hits h left outer join selection_daily_uniques u ON (h.sel_sid = u.sel_sid AND h.date_day = u.date_day AND u.date_day = '20110211') left outer join selection_daily_clicks c ON (h.sel_sid = c.sel_sid AND h.date_day = c.date_day AND c.date_day = '20110211') WHERE h.date_day = '20110211'; On Wed, Feb 23, 2011 at 10:51 PM, Cam Bazz <camb...@gmail.com> wrote: > Hello, > > Here are the table descriptions. they only have the identifier, hits, > unqiques and date_day which is the partition > > hive> describe selection_daily_hits; > OK > sel_sid int > hits int > date_day string > > hive> describe selection_daily_uniques; > OK > sel_sid int > uniques int > date_day string > > hive> describe selection_daily_clicks; > OK > sel_sid int > clicks int > date_day string > > i tried to query you have, which unfortunately returns: > > FAILED: Parse Error: line 1:272 cannot recognize input 'ON' in > expression specification > > In the query I wrote, I thought the on clause covered all three joins, > but I from your query how i should do it from your query, except, it > will return parse error...? but why? > > best regards, > c,b, > > > On Wed, Feb 23, 2011 at 10:31 PM, Viral Bajaria <viral.baja...@gmail.com> > wrote: > > > > I have a few questions as follows: > > 1) what's the schema of all 3 tables ? Do these tables only have > (sel_sid, > > date_day) as the columns along with the facts that they represent or do > they > > have more columns besides those 2 columns ? > > 2) why do you do a left outer join without an ON clause, not too sure if > > MySql syntax is like that but I normally don't prefer to do a JOIN > without > > specifying anything in the ON clause. I just don't trust the result set > and > > the query is not really readable. > > How about trying this query: > > SELECT > > h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, c.clicks, 0) > > FROM > > selection_daily_hits h > > left outer join selection_daily_uniques u ON (h.sel_sid = u.sel_sid > AND > > h.date_day = u.date_day AND u.date_day = '20110211') > > left outer join selection_daily_clicks c on ON (h.sel_sid = c.sel_sid > AND > > h.date_day = c.date_day AND c.date_day = '20110211') > > WHERE h.date_day = '20110211'; > > some notes about the query: > > - I restrict the uniques and clicks to 20110211 in the JOIN clause > because > > in hive 0.5.0 if you put them in the WHERE clause the partitions don't > get > > trimmed and it scans the entire table before limiting the data. If your > data > > is not partitioned please go ahead and remove that restriction. > > - I join on the date_day columns to make sure the data is correct if the > > tables are not partitioned or the query plan causes table scans because > > there are chances you can see the same sel_sid on different days (this is > an > > assumption) > > -Viral > > On Wed, Feb 23, 2011 at 3:16 AM, Cam Bazz <camb...@gmail.com> wrote: > >> > >> Hello, > >> > >> I have three tables, one that counts hits, the other unique visits, > >> and the other clicks on that page: > >> > >> The query below will fail to produce correct results: (number of > >> uniques is wrong, always set to 8, same number for all) > >> > >> select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, > >> c.clicks, 0) from selection_daily_hits h left outer join > >> selection_daily_uniques u left outer join selection_daily_clicks c on > >> (h.sel_sid = u.sel_sid and h.sel_sid = c.sel_sid and h.date_day = > >> '20110211' and u.date_day = '20110211' and c.date_day = '20110211'); > >> > >> where the query below will work and provide correct results > >> > >> select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, > >> c.clicks, 0) from selection_daily_hits h left outer join > >> selection_daily_uniques u left outer join selection_daily_clicks c on > >> (h.sel_sid = u.sel_sid and c.sel_sid = h.sel_sid and h.date_day = > >> '20110211' and u.date_day = '20110211' and c.date_day = '20110211'); > >> > >> the only difference is, on the non working query I have h.sel_sid = > >> c.sel_sid and in the working query I have c.sel_sid = h.sel_sid > >> > >> notice that while the first and second table will always have the same > >> number keys, the third table might not have some keys, hence those > >> lines are converted to 0. > >> > >> Best Regards, > >> -C.B. > > > > >