You may want to use a Bloom filter for this, but make sure that you understand how it works: it can return false positives, so a key may occasionally be treated as already seen even though it was not.
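A minimal sketch of the idea, in plain Java with no Spark or third-party dependencies (class name, sizes, and the hash-mixing scheme are all illustrative choices, not from the thread; a production job would more likely use a tested implementation such as Guava's BloomFilter):

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: remember the keys processed in the previous
// batch, then test membership cheaply in the next batch. It can report a
// false positive, but never a false negative.
public class KeyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    public KeyBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Derive several bit positions per key by mixing the key's hashCode
    // with a per-function seed (a simple double-hashing stand-in).
    private int position(String key, int seed) {
        int h = key.hashCode() * 31 + seed * 0x9E3779B9;
        return Math.floorMod(h, size);
    }

    public void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    public boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false; // definitely never added
            }
        }
        return true; // probably added (false positives possible)
    }

    public static void main(String[] args) {
        KeyBloomFilter seen = new KeyBloomFilter(1 << 16, 3);
        seen.add("key-1");
        seen.add("key-2");
        System.out.println(seen.mightContain("key-1")); // true
        System.out.println(seen.mightContain("key-99"));
    }
}
```

The filter itself is just a bit array, so it is small enough to serialize at the end of one batch and broadcast to the executors at the start of the next.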
> On 08 Dec 2015, at 09:44, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>
> I'm running a Spark batch job in cluster mode every hour, and it runs for 15
> minutes. I have certain unique keys in the dataset. I don't want to process
> those keys during the next hour's batch.
>
> Thanks,
>
>> On Tue, Dec 8, 2015 at 1:42 PM, Fengdong Yu <fengdo...@everstring.com> wrote:
>> Can you detail your question? What do your previous batch and the
>> current batch look like?
>>
>>> On Dec 8, 2015, at 3:52 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm running Java over Spark in cluster mode. I want to apply a filter on a
>>> JavaRDD based on some values from a previous batch. If I store those values
>>> in MapDB, is it possible to apply the filter during the current batch?
>>>
>>> Thanks,
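The filter step the original question describes can be sketched as below. A plain List stands in for the JavaRDD so the example is self-contained; the class and method names are illustrative. With Spark you would load the previous batch's keys from the store (e.g. MapDB) into a Set, broadcast it, and call `rdd.filter(record -> !seenKeys.value().contains(keyOf(record)))`:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the per-batch filter, assuming the previous batch's keys were
// persisted (e.g. in MapDB) and reloaded as a Set<String> at job start.
public class BatchKeyFilter {

    // Keep only the records whose key was NOT seen in the previous batch.
    public static List<String> dropSeenKeys(List<String> currentBatch,
                                            Set<String> seenKeys) {
        return currentBatch.stream()
                .filter(key -> !seenKeys.contains(key))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> seen = Set.of("k1", "k2");           // from the last run
        List<String> batch = List.of("k1", "k3", "k2", "k4");
        System.out.println(dropSeenKeys(batch, seen));   // [k3, k4]
    }
}
```

The key design point is that the lookup structure must be small enough to ship to every executor; if the set of seen keys grows too large for a broadcast, that is exactly where the Bloom filter suggested above trades a little accuracy for a fixed, small memory footprint.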