Hello,

I need to apply a non-distributed processing step to a DataSet of timestamped events, one
day at a time, in a Flink batch job.
My algorithm is:

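// wholeSet is too big to fit in RAM with a single collect(), so we cut it into one piece per day
DataSet<Event> wholeSet = [Select WholeSet]; // Event and getDay() are placeholder names
for (int day = 1; day <= 31; day++) {
    final int d = day; // the filter lambda needs an effectively final copy of the loop variable
    List<Event> dayData = wholeSet.filter(e -> e.getDay() == d).collect();
    applyComplexNonDistributedTreatment(dayData);
}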
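To make the cost of this loop concrete, here is a plain-Java model of it with no Flink involved (`Event`, `filterDay` and `runLoop` are hypothetical names). Each day's filter walks the entire set, so 31 days mean 31 full passes. In Flink's DataSet API each `collect()` also triggers a separate job execution, so, as far as I understand it, the whole plan, including the [Select WholeSet] source, is re-run on every iteration.

```java
import java.util.ArrayList;
import java.util.List;

public class PerDayLoopSketch {
    // Hypothetical event type: only the day of the month matters here.
    record Event(int day, String payload) {}

    // Stand-in for wholeSet.filter(day).collect(): it walks the WHOLE set on every call.
    static List<Event> filterDay(List<Event> wholeSet, int day, int[] scans) {
        scans[0]++;
        List<Event> out = new ArrayList<>();
        for (Event e : wholeSet) {
            if (e.day() == day) {
                out.add(e);
            }
        }
        return out;
    }

    // Runs the per-day loop; returns {full scans of wholeSet, events handed to the treatment}.
    static int[] runLoop(List<Event> wholeSet) {
        int[] scans = {0};
        int processed = 0;
        for (int day = 1; day <= 31; day++) {
            List<Event> dayData = filterDay(wholeSet, day, scans);
            processed += dayData.size(); // placeholder for applyComplexNonDistributedTreatment(dayData)
        }
        return new int[] {scans[0], processed};
    }

    public static void main(String[] args) {
        List<Event> wholeSet = List.of(new Event(1, "a"), new Event(1, "b"), new Event(2, "c"));
        int[] r = runLoop(wholeSet);
        System.out.println("scans=" + r[0] + " processed=" + r[1]); // prints "scans=31 processed=3"
    }
}
```

Even when only day 1 has data, the number of full passes stays at 31.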

Even though each day's data easily fits in RAM (I ran a test where only the
first day has any data), I quickly get an OOM in a task manager at some point in the
loop, so I suspect that "wholeSet" is being kept in RAM several times.

Two questions:

1) Is there a better way to handle this, so that the "select wholeset" is
executed only once?

2) If the "select wholeset" has to be executed at each iteration, how can I
completely release the previous set so that I don't get an OOM?
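One possible direction for question 1, sketched in plain Java under the assumption that every event can be keyed by its day (`Event` and `processByDay` are hypothetical names): read the set once, bucket it by day, and hand each bucket to the treatment. This in-memory version does not solve the RAM problem by itself. The idea would be to let Flink do the bucketing, for example with `groupBy(...)` followed by `reduceGroup(...)`, so that each `GroupReduceFunction` call receives one day's events and only that day has to be materialized inside a task. I have not verified that this keeps memory bounded.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.stream.Collectors;

public class GroupOnceSketch {
    // Hypothetical event type keyed by day of the month.
    record Event(int day, String payload) {}

    // Reads wholeSet once, buckets events by day (sorted by day via TreeMap),
    // then hands each day's list to the treatment.
    static Map<Integer, List<Event>> processByDay(List<Event> wholeSet, Consumer<List<Event>> treatment) {
        Map<Integer, List<Event>> byDay = wholeSet.stream()
                .collect(Collectors.groupingBy(Event::day, TreeMap::new, Collectors.toList()));
        byDay.values().forEach(treatment);
        return byDay;
    }

    public static void main(String[] args) {
        List<Event> wholeSet = List.of(new Event(2, "c"), new Event(1, "a"), new Event(1, "b"));
        // Prints "1: 2" then "2: 1".
        processByDay(wholeSet, dayData -> System.out.println(dayData.get(0).day() + ": " + dayData.size()));
    }
}
```

Unlike the 1..31 loop, days without events are never passed to the treatment; if empty days matter, they would have to be added explicitly.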

Thanks,
Arnaud
