Re: Spark streaming Processing time keeps increasing

2015-07-19 Thread N B
Hi TD, Yay! Thanks for the help. That solved our issue of ever-increasing processing time. I added filter functions to all our reduceByKeyAndWindow() operations and now it's been stable for over 2 days already! :-). One small piece of feedback about the API though. The one that accepts the filter function

Re: Spark streaming Processing time keeps increasing

2015-07-17 Thread N B
Hi TD, Thanks for the response. I do believe I understand the concept and the need for the filter function now. I made the requisite code changes and am keeping it running overnight to see the effect. Hopefully this should fix our issue. However, there was one place where I encountered a follow

Re: Spark streaming Processing time keeps increasing

2015-07-17 Thread Tathagata Das
Responses inline. On Thu, Jul 16, 2015 at 9:27 PM, N B wrote: > Hi TD, > > Yes, we do have the invertible function provided. However, I am not sure I > understood how to use the filterFunction. Is there an example somewhere > showing its usage? > > The header comment on the function says : > > *

Re: Spark streaming Processing time keeps increasing

2015-07-16 Thread N B
Hi TD, Yes, we do have the invertible function provided. However, I am not sure I understood how to use the filterFunction. Is there an example somewhere showing its usage? The header comment on the function says : * @param filterFunc function to filter expired key-value pairs; *

Re: Spark streaming Processing time keeps increasing

2015-07-16 Thread Tathagata Das
Make sure you provide the filterFunction with the invertible reduceByKeyAndWindow. Otherwise none of the keys will get removed, and the key space will continue to increase. This is what is leading to the lag. So use the filtering function to filter out the keys that are not needed any more. On Thu, J
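
A minimal sketch of the effect described above, written as a plain-Python model rather than Spark code. It assumes a per-key count as the reduce, subtraction as the inverse reduce, and a window measured in batches; the names `windowed_counts` and `keep` are hypothetical and are not part of the Spark API. It only illustrates why state keeps growing when expired keys are never filtered out.

```python
from collections import deque


def windowed_counts(batches, window_len, keep=None):
    """Model an incremental (invertible) windowed reduce over per-batch counts.

    batches    -- list of dicts mapping key -> count for one batch
    window_len -- window size, in number of batches (assumption of this model)
    keep       -- optional predicate (key, value) -> bool; plays the role of
                  the filter function discussed in the thread
    Returns one snapshot of the running state per batch.
    """
    state = {}
    window = deque()
    snapshots = []
    for batch in batches:
        window.append(batch)
        # Reduce step: fold the new batch into the running state.
        for key, n in batch.items():
            state[key] = state.get(key, 0) + n
        # Inverse-reduce step: subtract the batch that left the window.
        if len(window) > window_len:
            expired = window.popleft()
            for key, n in expired.items():
                state[key] = state[key] - n
        # Without a filter, keys whose count fell to 0 stay in the state forever.
        if keep is not None:
            state = {k: v for k, v in state.items() if keep(k, v)}
        snapshots.append(dict(state))
    return snapshots


# Each batch introduces a key that never appears again.
batches = [{"a": 1}, {"b": 1}, {"c": 1}, {"d": 1}]

unfiltered = windowed_counts(batches, window_len=1)
filtered = windowed_counts(batches, window_len=1, keep=lambda k, v: v > 0)

print(len(unfiltered[-1]), len(filtered[-1]))
```

In this model the unfiltered state retains every key ever seen, including those whose count has dropped to zero, so the work per batch grows with the total key space. The filtered run keeps only keys still present in the window.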

Re: Spark streaming Processing time keeps increasing

2015-07-16 Thread N B
Thanks Akhil. For doing reduceByKeyAndWindow, one has to have checkpointing enabled. So, yes, we do have it enabled. But not the Write Ahead Log, because we don't have a need for recovery and we do not recover the process state on restart. I don't know if IO Wait fully explains the increasing processing

Re: Spark streaming Processing time keeps increasing

2015-07-16 Thread Akhil Das
What is your data volume? Do you have checkpointing/WAL enabled? In that case make sure you have SSD disks, as this behavior is mainly due to the IO wait. Thanks Best Regards On Thu, Jul 16, 2015 at 8:43 AM, N B wrote: > Hello, > > We have a Spark streaming application and the problem t

Spark streaming Processing time keeps increasing

2015-07-15 Thread N B
Hello, We have a Spark streaming application and the problem that we are encountering is that the batch processing time keeps on increasing and eventually causes the application to start lagging. I am hoping that someone here can point me to any underlying cause of why this might happen. The batc