You may want to use a Bloom filter for this, but make sure that you understand 
how it works.
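For illustration, here is a minimal pure-Java Bloom filter sketch (the class and method names are hypothetical, not the Guava `BloomFilter` API). The caveat to understand: a Bloom filter can return false positives, so a key that was never processed may occasionally be reported as already seen and get skipped.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch (hypothetical names, stdlib only).
// mightContain(): false => definitely never added; true => probably added.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits
    private final int numHashes;  // hash functions per key

    public SimpleBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Double hashing: derive numHashes indexes from two base hashes.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = h1 >>> 16 | h1 << 16;  // a second, cheap hash
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    public boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter seen = new SimpleBloomFilter(1 << 16, 3);
        seen.add("key-1");
        seen.add("key-2");
        System.out.println(seen.mightContain("key-1")); // true
    }
}
```

In the hourly job you would populate the filter with the previous batch's keys and ship it to the executors (a `BitSet` is serializable, so it can go in a broadcast variable); a production job would more likely use Guava's `BloomFilter` with an error rate sized for the key count.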

> On 08 Dec 2015, at 09:44, Ramkumar V <ramkumar.c...@gmail.com> wrote:
> 
> I'm running a Spark batch job in cluster mode every hour, and each run takes 
> about 15 minutes. I have certain unique keys in the dataset, and I don't want 
> to process those keys again during the next hour's batch.
> 
> Thanks,
> 
>  
> 
> 
>> On Tue, Dec 8, 2015 at 1:42 PM, Fengdong Yu <fengdo...@everstring.com> wrote:
>> Can you detail your question? What do your previous batch and the current 
>> batch look like?
>> 
>> 
>> 
>> 
>> 
>>> On Dec 8, 2015, at 3:52 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm running Java over Spark in cluster mode. I want to apply a filter on a 
>>> JavaRDD based on some values from a previous batch. If I store those values 
>>> in MapDB, is it possible to apply the filter during the current batch?
>>> 
>>> Thanks,
>>> 
>>>  
>>> 
> 
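Independent of where the previous batch's keys are stored, the filtering step itself is a simple set-membership predicate. Here is a plain-Java sketch of that logic (class and method names are hypothetical, and there is no Spark dependency here); in the actual job you would load the keys from MapDB, broadcast the set with `JavaSparkContext.broadcast(...)`, and pass the same predicate to `JavaRDD.filter(...)`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: keep only records whose key was NOT seen in the previous batch.
public class PreviousBatchFilter {
    public static List<String> filterNewKeys(List<String> currentBatch,
                                             Set<String> previousKeys) {
        return currentBatch.stream()
                .filter(key -> !previousKeys.contains(key))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> previousKeys = new HashSet<>(Arrays.asList("k1", "k2"));
        List<String> current = Arrays.asList("k1", "k3", "k4");
        System.out.println(filterNewKeys(current, previousKeys)); // [k3, k4]
    }
}
```

If the set of previous keys is too large to broadcast comfortably, that is where the Bloom filter suggested above comes in: it trades exactness (occasional false positives) for a much smaller memory footprint.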
