How about:
select
a.pid,
b.pid
tab1 a
left join
tab2 b
on (a.pid=b.pid);
union all -- Need 'union all' to remove dups for the case of when (a.pid=b.pid)
and (substr(a.pid,1,27)=b.pid)
select
a.pid,
b.pid
tab1 a
left join
tab2 b
on (substr(a.pid,1,27)=b.pid);
---
Mark E. Sunder
Hi Bertrand,
Thanks for your quick reply, got it now.
Thanks.
On Fri, Jul 27, 2012 at 12:15 AM, Bertrand Dechoux wrote:
> A join is implemented for most cases with a group by.
>
> Rows in your table a and your table b will be grouped by something let's
> say the value of your colum id.
> So for
A join is implemented for most cases with a group by.
Rows in your table a and your table b will be grouped by something let's
say the value of your colum id.
So for each group doing a join is a trivial operation. The simple way is to
get all values, separate them somehow to know which are from th
Thanks Bertrand,
You said it's hadoop problem, is it means that if I change to use MapReduce
(java MR or streaming), it still can't achieve the purpose?
PS: I'm not very familiar with java MR and streaming:) but I have to find
a way to implement it.
On Thu, Jul 26, 2012 at 11:19 PM, Bertrand De
That's a problem which is hadoop related and not really hive related.
The solution is to use only equal (as you know it). For that, you should
first extract your real identifier for a, which can be a.pid or a part of
it.
I assume that you can know it in advance which one will be used.
Bertrand
O
Hi all,
I have problem when using left join with hive 0.7.1.
I have a query below:
select
a.pid,
b.pid
tab1 a
left join
tab2 b
on (a.pid=b.pid or substr(a.pid,1,27)=b.pid);
But hive don't support "OR" in left join.
Table a is huge, and table b has 4 rows now(will increase).
Is there