my query had a bug in it ... it had on ON (it was repeated twice). try this

SELECT
   h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, c.clicks, 0)
FROM
   selection_daily_hits h
   left outer join selection_daily_uniques u ON (h.sel_sid = u.sel_sid AND
h.date_day = u.date_day AND u.date_day = '20110211')
   left outer join selection_daily_clicks c ON (h.sel_sid = c.sel_sid AND
h.date_day = c.date_day AND c.date_day = '20110211')
WHERE h.date_day = '20110211';

On Wed, Feb 23, 2011 at 10:51 PM, Cam Bazz <camb...@gmail.com> wrote:

> Hello,
>
> Here are the table descriptions. they only have the identifier, hits,
> unqiques and date_day which is the partition
>
> hive> describe selection_daily_hits;
> OK
> sel_sid int
> hits    int
> date_day        string
>
> hive> describe selection_daily_uniques;
> OK
> sel_sid int
> uniques int
> date_day        string
>
> hive> describe selection_daily_clicks;
> OK
> sel_sid int
> clicks  int
> date_day        string
>
> i tried to query you have, which unfortunately returns:
>
> FAILED: Parse Error: line 1:272 cannot recognize input 'ON' in
> expression specification
>
> In the query I wrote, I thought the on clause covered all three joins,
> but I from your query how i should do it from your query, except, it
> will return parse error...? but why?
>
> best regards,
> c,b,
>
>
> On Wed, Feb 23, 2011 at 10:31 PM, Viral Bajaria <viral.baja...@gmail.com>
> wrote:
> >
> > I have a few questions as follows:
> > 1) what's the schema of all 3 tables ? Do these tables only have
>  (sel_sid,
> > date_day) as the columns along with the facts that they represent or do
> they
> > have more columns besides those 2 columns ?
> > 2) why do you do a left outer join without an ON clause, not too sure if
> > MySql syntax is like that but I normally don't prefer to do a JOIN
> without
> > specifying anything in the ON clause. I just don't trust the result set
> and
> > the query is not really readable.
> > How about trying this query:
> > SELECT
> >    h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, c.clicks, 0)
> > FROM
> >    selection_daily_hits h
> >    left outer join selection_daily_uniques u ON (h.sel_sid = u.sel_sid
> AND
> > h.date_day = u.date_day AND u.date_day = '20110211')
> >    left outer join selection_daily_clicks c on ON (h.sel_sid = c.sel_sid
> AND
> > h.date_day = c.date_day AND c.date_day = '20110211')
> > WHERE h.date_day = '20110211';
> > some notes about the query:
> > - I restrict the uniques and clicks to 20110211 in the JOIN clause
> because
> > in hive 0.5.0 if you put them in the WHERE clause the partitions don't
> get
> > trimmed and it scans the entire table before limiting the data. If your
> data
> > is not partitioned please go ahead and remove that restriction.
> > - I join on the date_day columns to make sure the data is correct if the
> > tables are not partitioned or the query plan causes table scans because
> > there are chances you can see the same sel_sid on different days (this is
> an
> > assumption)
> > -Viral
> > On Wed, Feb 23, 2011 at 3:16 AM, Cam Bazz <camb...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I have three tables, one that counts hits, the other unique visits,
> >> and the other clicks on that page:
> >>
> >> The query below will fail to produce correct results: (number of
> >> uniques is wrong, always set to 8, same number for all)
> >>
> >> select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null,
> >> c.clicks, 0) from selection_daily_hits h left outer join
> >> selection_daily_uniques u left outer join selection_daily_clicks c on
> >> (h.sel_sid = u.sel_sid and h.sel_sid = c.sel_sid and h.date_day =
> >> '20110211' and u.date_day = '20110211' and c.date_day = '20110211');
> >>
> >> where the query below will work and provide correct results
> >>
> >> select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null,
> >> c.clicks, 0) from selection_daily_hits h left outer join
> >> selection_daily_uniques u left outer join selection_daily_clicks c on
> >> (h.sel_sid = u.sel_sid and c.sel_sid = h.sel_sid and h.date_day =
> >> '20110211' and u.date_day = '20110211' and c.date_day = '20110211');
> >>
> >> the only difference is, on the non working query I have h.sel_sid =
> >> c.sel_sid and in the working query I have c.sel_sid = h.sel_sid
> >>
> >> notice that while the first and second table will always have the same
> >> number keys, the third table might not have some keys, hence those
> >> lines are converted to 0.
> >>
> >> Best Regards,
> >> -C.B.
> >
> >
>

Reply via email to