Souvik,
Let me begin by saying that simplifying the problem goes a long way in
helping us answer your question. You did it really nicely here, so thank
you for doing that.

Why don't you use INNER JOIN instead of LEFT SEMI JOIN? You can express the
same query as INNER JOIN with no restrictions on what can be included in
the SELECT clause. LEFT SEMI JOIN just implements an "exists" like query
efficiently. If you want columns from the right table in your select list,
just do the regular (aka inefficient way) inner join.

Of course, you can optimize inner joins as map joins, sorted merge join or
sorted merge bucketed joins depending on your use case.

Mark

On Thu, Dec 27, 2012 at 3:16 PM, Souvik Banerjee
<souvikbaner...@gmail.com>wrote:

> Hi,
>
> I am struggling with a problem described below.
> Any help how to resolve this problem is highly appreciated.
>
> I have got few tables the structure is over simplified for the sake of
> describing the nature of the problem that I am facing.
>
> *Table 1*
> * *
>
>
>
> ID
>
> Value
>
> 1
>
> V11
>
> 2
>
> V12
>
> 3
>
> V13
>
>
>
> *Table 2*
> * *
>
>
>
> ID
>
> Value
>
> 1
>
> V21
>
> 2
>
> V22
>
> 3
>
> V23
>
>
>
> *Table 3*
> * *
>
>
>
> ID_1
>
> ID_2
>
> Other_Column1
>
> 1
>
> 3
>
> C1
>
> 2
>
> 1
>
> C2
>
> 3
>
> 2
>
> C3
>
>
>
> I want to formulate a table which would look like
>
> ID_1
>
> ID_1_Value
>
> ID_2
>
> ID_2_value
>
> Other_Column1
>
> 1
>
> V11
>
> 3
>
> V23
>
> C1
>
> 2
>
> V12
>
> 1
>
> V21
>
> C2
>
> 3
>
> V13
>
> 2
>
> V22
>
> C3
>
>
>
> I am facing problem with this.
> I tried to LEFT SEMI JOIN in Hive.
>
> I tried to do it in two steps (For 3 tables)
> First step I wanted to do a LEFT SEMI JOIN with TABLE 1 and TABLE 3.
> But the problem is that in LEFT SEMI JOIN you can not have columns from
> the right table in the select clause. So after join my new table simply
> looks like TABLE 3.
>
> Can you help me how I can achieve this is HIVE.
>
> Thanks and regards,
> Souvik.
>

Reply via email to