I thought about your problem over the weekend. Unfortunately the algorithm that you describe does not fit "regular" equi-join semantics, but I think it could be "fitted" with a more complex dataflow.
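In code, the idea is roughly: give both inputs a coarse bucket key, equi-join on that key, then filter on the real range predicate. The following is only a minimal sketch in plain Python, not the actual dataflow API. The names `BUCKET_WIDTH`, `range_buckets`, `number_bucket` and `range_join` are made up for illustration, and the bucket width of 10 is just the value used in the example.

```python
from collections import defaultdict

BUCKET_WIDTH = 10  # assumed interval width, purely illustrative


def range_buckets(start, end, width=BUCKET_WIDTH):
    """FlatMap side: emit (bucket, (start, end)) once per bucket the range overlaps."""
    return [(b, (start, end)) for b in range(start // width, end // width + 1)]


def number_bucket(number, width=BUCKET_WIDTH):
    """Map side: emit (bucket, number); every number falls into exactly one bucket."""
    return (number // width, number)


def range_join(one, two, width=BUCKET_WIDTH):
    """Equi-join on the coarse bucket key, then filter on the real range predicate."""
    # Group the numbers by bucket (this stands in for the join's hash side).
    by_bucket = defaultdict(list)
    for n in two:
        bucket, value = number_bucket(n, width)
        by_bucket[bucket].append(value)

    result = []
    for start, end in one:
        for bucket, rng in range_buckets(start, end, width):
            for n in by_bucket.get(bucket, []):
                # Filter step: the actual "one.start <= two.number <= one.end" check.
                if start <= n <= end:
                    result.append((rng, n))
    return result


one = [(3, 6), (5, 7)]
two = [1, 2, 3, 4, 5, 6, 7]
matches = range_join(one, two)
```

For the example data this pairs {3,6} with 3, 4, 5, 6 and {5,7} with 5, 6, 7. Because each number lands in exactly one bucket, a range that spans several buckets still produces each match only once.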
To fit the join into such a dataflow, I would partition the (active) domain of the two datasets into fine-grained intervals (for the sake of the example, let's say of width 10). You can prepare a "coarse-grained" join key on the inputs using an "x / 10" (integer division) (Flat)Map. A range in One that spans several intervals has to be emitted once per interval it overlaps, which is why that side needs a FlatMap rather than a Map:

One: (0, {3,6}), (0, {5,7})
Two: (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7)

Upon that you can do a regular join on the "coarse-grained" key (the first component of the tuples), and follow that with a filter that evaluates the actual "one.start <= two.number <= one.end" predicate.

Regards,
Alex

2015-04-24 20:55 GMT+02:00 Kirschnick, Johannes <johannes.kirschn...@tu-berlin.de>:

> Hi
>
> I have a small problem with doing a custom join, that I would need some
> help with. Maybe I'm also approaching the problem wrong.
>
> So basically I have two datasets.
> The simplified example: The first one has a start and end value. The
> second dataset is just a list of ordered numbers and some value (value is
> ignored in the example)
>
> Example
> One = {3,6},{5,7}
> Two = 1,2,3,4,5,6,7
>
> What I need is a sort of custom join, that brings to the first dataset all
> elements from the second that are within the range.
> Something like .. join where one.start <= two.number <= one.end
>
> So {3,6} from one would only need to "see" 3,4,5
>
> Joining does not work out of the box here as the key is sort of "dynamic"
> depending on the value of one.
>
> I can just use a map for the first dataset and broadcast the second into
> the mapper which can then select the required elements - but my assumption
> is that the second dataset might actually be very large as well, but the
> qualifying join "numbers" from two will actually be small.
>
> Is there something I could do in this particular case?
>
> Thanks a lot
> Johannes
>