Hello Tamir,

I think the better and simpler way of doing this is through Pig.

http://wiki.apache.org/pig/PigOverview 

Pig provides an SQL-like interface over Hadoop and supports the kind
of operation you need to do with your data quite easily.
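
To make the logic concrete, here is a rough sketch of the range lookup in
plain Python (not Pig or Hadoop code). It assumes the search file fits in
memory and that the ip ranges do not overlap; the function name range_join
is only illustrative. In Pig, one way to express the same thing would be a
CROSS followed by a FILTER on the range condition, though that may be
costly on large inputs.

```python
from bisect import bisect_right


def range_join(main_rows, search_rows):
    """Join each (ip, a, b) row to the (from_ip, to_ip, d, e) range containing ip."""
    # Sort ranges by from_ip so they can be binary-searched.
    # Assumption: ranges do not overlap.
    ranges = sorted(search_rows)
    starts = [r[0] for r in ranges]
    joined = []
    for ip, a, b in main_rows:
        # Index of the last range whose from_ip <= ip.
        i = bisect_right(starts, ip) - 1
        if i >= 0 and ip <= ranges[i][1]:
            _, _, d, e = ranges[i]
            joined.append((ip, a, b, d, e))
    return joined


# Sample data taken from the question below.
main_rows = [(80, "zz", "yy"), (125, "vv", "bb")]
search_rows = [
    (52, 75, "xxx", "yyy"),
    (78, 98, "aaa", "bbb"),
    (99, 115, "xxx", "ddd"),
    (125, 130, "hhh", "aaa"),
    (150, 162, "qqq", "sss"),
]

for row in range_join(main_rows, search_rows):
    print(row)
```

Each lookup is a binary search over the sorted range starts, so the ranges
never have to be expanded into single ips.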


Thanks,

---
Peeyush

On Tue, 2009-03-24 at 13:33 +0200, Tamir Kamara wrote:

> Hi,
> 
> We need to implement a Join with a between operator instead of an equal.
> What we are trying to do is search a file for a key where the key falls
> between two fields in the search file like this:
> 
> main file (ip, a, b):
> (80, zz, yy)
> (125, vv, bb)
> 
> search file (from-ip, to-ip, d, e):
> (52, 75, xxx, yyy)
> (78, 98, aaa, bbb)
> (99, 115, xxx, ddd)
> (125, 130, hhh, aaa)
> (150, 162, qqq, sss)
> 
> the outcome should be in the form (ip, a, b, d, e):
> (80, zz, yy, aaa, bbb)
> (125, vv, bb, hhh, aaa)
> 
> We could convert the ip ranges in the search file to single record ips and
> then do a regular join, but the number of single ips is huge and this is
> probably not a good way.
> What would be a good course for doing this in Hadoop?
> 
> 
> Thanks,
> Tamir
