In addition to other suggestions, you could also take a look at building a Cascading job with a custom Joiner class.
- John

On Tue, Mar 24, 2009 at 7:33 AM, Tamir Kamara <tamirkam...@gmail.com> wrote:
> Hi,
>
> We need to implement a Join with a between operator instead of an equal.
> What we are trying to do is search a file for a key where the key falls
> between two fields in the search file like this:
>
> main file (ip, a, b):
> (80, zz, yy)
> (125, vv, bb)
>
> search file (from-ip, to-ip, d, e):
> (52, 75, xxx, yyy)
> (78, 98, aaa, bbb)
> (99, 115, xxx, ddd)
> (125, 130, hhh, aaa)
> (150, 162, qqq, sss)
>
> the outcome should be in the form (ip, a, b, d, e):
> (80, zz, yy, aaa, bbb)
> (125, vv, bb, eee, hhh)
>
> We could convert the ip ranges in the search file to single record ips and
> then do a regular join, but the number of single ips is huge and this is
> probably not a good way.
> What would be a good course for doing this in hadoop ?
>
>
> Thanks,
> Tamir
>
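
For reference, the "between" lookup described in the question can be sketched in plain Java without expanding the ranges into single IPs. This is only an illustration of the matching logic, not Cascading or Hadoop API code. The class and method names (RangeJoin, Range, index, lookup, join) are hypothetical, and the sketch assumes that the ranges do not overlap, that both bounds are inclusive, and that the search file is small enough to hold in memory on each task.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Hadoop or Cascading.
final class RangeJoin {

    // One row of the search file: (from-ip, to-ip, d, e).
    static final class Range {
        final long from;
        final long to;
        final String d;
        final String e;

        Range(long from, long to, String d, String e) {
            this.from = from;
            this.to = to;
            this.d = d;
            this.e = e;
        }
    }

    // Index the ranges by their lower bound.
    // Assumption: ranges do not overlap, so at most one range can match a key.
    static TreeMap<Long, Range> index(List<Range> ranges) {
        TreeMap<Long, Range> byFrom = new TreeMap<>();
        for (Range r : ranges) {
            byFrom.put(r.from, r);
        }
        return byFrom;
    }

    // Return the range containing ip (bounds inclusive), or null if none does.
    static Range lookup(TreeMap<Long, Range> byFrom, long ip) {
        // floorEntry finds the range with the greatest from-ip <= ip.
        Map.Entry<Long, Range> entry = byFrom.floorEntry(ip);
        if (entry == null) {
            return null;
        }
        Range r = entry.getValue();
        return ip <= r.to ? r : null;
    }

    // Join one main-file row (ip, a, b) into "(ip, a, b, d, e)", or null if unmatched.
    static String join(TreeMap<Long, Range> byFrom, long ip, String a, String b) {
        Range r = lookup(byFrom, ip);
        if (r == null) {
            return null;
        }
        return "(" + ip + ", " + a + ", " + b + ", " + r.d + ", " + r.e + ")";
    }

    public static void main(String[] args) {
        // Sample data copied from the question.
        TreeMap<Long, Range> byFrom = index(Arrays.asList(
                new Range(52, 75, "xxx", "yyy"),
                new Range(78, 98, "aaa", "bbb"),
                new Range(99, 115, "xxx", "ddd"),
                new Range(125, 130, "hhh", "aaa"),
                new Range(150, 162, "qqq", "sss")));
        System.out.println(join(byFrom, 80, "zz", "yy"));
        System.out.println(join(byFrom, 125, "vv", "bb"));
    }
}
```

Each lookup costs O(log n) in the number of ranges, so the main file can be streamed through a map-only pass if the index fits in memory. With the sample data, ip 80 matches (78, 98, aaa, bbb) and ip 125 matches (125, 130, hhh, aaa). If the ranges can overlap, or the search file is too large to hold in memory, a different approach would be needed.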