Hello,

I was wondering how to join large data sets on inequalities.

Let's say I have a data set whose “keys” are two timestamps (start time & end 
time of validity) and whose value is a label:
        final DataSet<Tuple3<Long, Long, String>> historical = …;

I also have events, with an event name and a timestamp:
        final DataSet<Tuple2<String, Long>> events = …;

I want to join my events with my historical data to get the “active” label for 
the time of the event.
The simple way is to use a cross product followed by a filter:

events.cross(historical).filter((crossedRow) -> {
            return (crossedRow.f0.f1 >= crossedRow.f1.f0)
                && (crossedRow.f0.f1 <= crossedRow.f1.f1);
        })

But that’s not efficient with two big data sets, since the cross product 
compares every event with every historical row.
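
To make the condition I am after concrete, here is a minimal plain-Java sketch of what the cross + filter computes, written as a nested loop so the quadratic cost is visible. It does not use Flink at all: the classes `Validity` and `Event` and the method names are hypothetical stand-ins for my `Tuple3<Long, Long, String>` and `Tuple2<String, Long>` rows, and both interval bounds are assumed to be inclusive, as in the filter.

```java
import java.util.ArrayList;
import java.util.List;

public class IntervalMatch {

    /** Hypothetical stand-in for Tuple3<Long, Long, String>: validity interval plus label. */
    public static final class Validity {
        final long start;
        final long end;
        final String label;

        public Validity(long start, long end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }
    }

    /** Hypothetical stand-in for Tuple2<String, Long>: event name plus timestamp. */
    public static final class Event {
        final String name;
        final long timestamp;

        public Event(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    /** Same test as the filter: start <= timestamp <= end, both bounds inclusive. */
    public static boolean isActive(Validity v, Event e) {
        return e.timestamp >= v.start && e.timestamp <= v.end;
    }

    /**
     * Nested-loop equivalent of cross + filter: every event is compared with
     * every historical row. Returns one "eventName:label" string per match.
     */
    public static List<String> activeLabels(List<Event> events, List<Validity> historical) {
        List<String> matches = new ArrayList<>();
        for (Event e : events) {
            for (Validity v : historical) {
                if (isActive(v, e)) {
                    matches.add(e.name + ":" + v.label);
                }
            }
        }
        return matches;
    }
}
```

With n events and m historical rows this performs n × m comparisons, which is the part I would like to avoid.
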

How would you code that?

Greetings,
Arnaud




