Hi,

thanks for dedicating some of your weekend time to this problem.
Your solution looks quite neat, thanks ..  

For your information, what I'm trying to do is multiply matrix blocks (dataset 
one) with individual elements of a vector (dataset two, of row id and some 
value).
In this case "one" needs to see all the qualifying entries from "two" to 
correctly multiply, ideally as a group for streamlined multiplication.

Johannes

-----Ursprüngliche Nachricht-----
Von: Alexander Alexandrov [mailto:alexander.s.alexand...@gmail.com] 
Gesendet: Sonntag, 26. April 2015 23:22
An: dev@flink.apache.org
Betreff: Re: Join with a custom predicate

I thought about your problem over the weekend. Unfortunately the algorithm that 
you describe does not fit "regular" equi-join semantics, but I think it could 
be "fitted" with a more complex dataflow.

To achieve that, I would partition the (active) domain of the two datasets on 
fine-granular intervals (for the sake of the example, let's say 10.

You can prepare a "coarse-grained" join key on the inputs using a "x % 10"
(Flat)Map:

One: (0, {3,6}), (0, {5,7})
Two: (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7)

Upon that you can do a regular join on the "coarse-grained" key (in the first 
component of the tuples), and follow that with a filter that evaluates the 
actual "one.start <= two.number <= one.end" predicate.

Regards,
Alex


2015-04-24 20:55 GMT+02:00 Kirschnick, Johannes <
johannes.kirschn...@tu-berlin.de>:

> Hi
> I have a small problem with doing a custom join, that I would need 
> some help with. Maybe I'm also approaching the problem wrong.
> So basically I have two dataset.
> The simplified example: The first one has a start and end value. The 
> second dataset is just a list of ordered numbers and some value (value 
> is ignored in the example) Example One = {3,6},{5,7} Two = 
> 1,2,3,4,5,6,7 What I need is a sort of custom join, that brings to the 
> first dataset all elements from the second that are within the range.
> Something like .. join where one.start <= two.number <= one.end So 
> {3,6} from one would only need to "see" 3,4,5 Joining does not work 
> out of the box here as the key is sort of "dynamic"
> depending on the value of one.
> I can just use a map for the first dataset and broadcast the second 
> into the mapper which can then select the required elements - but my 
> assumption is that the second dataset might actually be very large as 
> well, but the qualifying join "numbers" from two will actually be small.
> Is there something I could do in this particular case?
> Thanks a lot
> Johannes
>

Reply via email to