Are your IPRanges all on nice, even CIDR-format ranges? E.g. 192.168.0.0/16 or 10.0.0.0/8?
If the range is always an even subnet mask and not split across subnets, I'd recommend flatMapping the ipToUrl RDD to (IPRange, String) and then joining the two RDDs. The expansion would be at most 32x if all your ranges can be expressed in CIDR notation, and in practice would be much smaller than that (typically you don't need anything bigger than a /8, and often nothing smaller than a /24). Hopefully you can use your knowledge of the IP ranges to make this feasible.

Otherwise, you could additionally flatMap the ipRangeToZip RDD out to a list of CIDR blocks and do the join then, but at that point the cartesian product starts to work against you at scale.

Andrew

On Tue, Apr 15, 2014 at 1:07 AM, Roger Hoover <roger.hoo...@gmail.com> wrote:

> Hi,
>
> I'm trying to figure out how to join two RDDs with different key types and
> appreciate any suggestions.
>
> Say I have two RDDs:
> ipToUrl of type (IP, String)
> ipRangeToZip of type (IPRange, String)
>
> How can I join/cogroup these two RDDs together to produce a new RDD of
> type (IP, (String, String)) where IP is the key and the values are the urls
> and zipcodes?
>
> Say I have a method on the IPRange class called matches(ip: IP), I want
> the joined records to match when ipRange.matches(ip).
>
> Thanks,
>
> Roger
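
For what it's worth, here is a rough sketch of the flatMap-then-join idea from the reply, written in plain Python with only the standard library so the key expansion is easy to see. It is not Spark code: the function names (covering_cidrs, join_by_cidr), the sample data, and the /8 to /24 prefix bounds are all assumptions made for illustration. In Spark, the inner expansion loop would presumably become a flatMap on ipToUrl and the dictionary lookup would become a join on the CIDR key.

```python
import ipaddress
from collections import defaultdict


def covering_cidrs(ip, min_prefix=8, max_prefix=24):
    """Return every CIDR block from /min_prefix to /max_prefix that contains ip.

    The prefix bounds are assumed; widen them if your ranges need it.
    """
    addr = ipaddress.ip_address(ip)
    return [
        # strict=False masks off the host bits, e.g. 192.168.1.5/16 -> 192.168.0.0/16
        ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        for prefix in range(min_prefix, max_prefix + 1)
    ]


def join_by_cidr(ip_to_url, range_to_zip, min_prefix=8, max_prefix=24):
    """Join (ip, url) pairs with (cidr, zip) pairs, yielding (ip, (url, zip))."""
    # Index the range side by its CIDR block, which serves as the join key.
    # ip_network() raises ValueError if a range is not aligned on its mask.
    zips_by_block = defaultdict(list)
    for cidr, zipcode in range_to_zip:
        zips_by_block[ipaddress.ip_network(cidr)].append(zipcode)

    joined = []
    for ip, url in ip_to_url:
        # Expansion step: one candidate key per prefix length for each IP.
        for block in covering_cidrs(ip, min_prefix, max_prefix):
            for zipcode in zips_by_block.get(block, []):
                joined.append((ip, (url, zipcode)))
    return joined


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the thread.
    ip_to_url = [
        ("192.168.1.5", "http://a.example"),
        ("10.1.2.3", "http://b.example"),
        ("8.8.8.8", "http://c.example"),
    ]
    range_to_zip = [("192.168.0.0/16", "94105"), ("10.0.0.0/8", "10001")]
    for record in join_by_cidr(ip_to_url, range_to_zip):
        print(record)
```

With the assumed /8 to /24 bounds each IP expands to 17 candidate keys rather than 32, which is the kind of saving the reply has in mind. An IP that falls in none of the ranges simply drops out, as it would in an inner join.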