Re: submissionTime vs batchTime, DirectKafka

2016-03-10 Thread Sachin Aggarwal
Hi, can this be considered a lag in the processing of events? Should we report this as a delay?

On Thu, Mar 10, 2016 at 10:51 AM, Mario Ds Briggs wrote:
> Look at org.apache.spark.streaming.scheduler.JobGenerator
>
> it has a RecurringTimer (timer) that will simply post 'JobGenerate' events to a E

Re: submissionTime vs batchTime, DirectKafka

2016-03-09 Thread Mario Ds Briggs
Look at org.apache.spark.streaming.scheduler.JobGenerator. It has a RecurringTimer (timer) that simply posts 'JobGenerate' events to an EventLoop at the batchInterval time. This EventLoop's thread then picks up these events and uses the streamingContext.graph to generate a Job (InputDStream's
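The timer-plus-event-loop arrangement described above can be imitated in a few lines of plain Python. This is only a rough sketch, not Spark's JobGenerator: the names `run_event_loop`, `events`, and `handled` are hypothetical, and a `queue.Queue` with one worker thread stands in for the EventLoop. It illustrates that an event is always picked up some time after the batch time it was posted for.

```python
import queue
import threading
import time


def run_event_loop(batch_interval_s, num_batches):
    """Post one event per batch interval and record when a worker thread picks it up."""
    events = queue.Queue()
    handled = []  # (batch_time, pickup_time) pairs, in seconds on the monotonic clock

    def loop():
        # Stand-in for the event loop thread: handle events one at a time.
        while True:
            batch_time = events.get()
            if batch_time is None:  # sentinel: stop the loop
                break
            handled.append((batch_time, time.monotonic()))

    worker = threading.Thread(target=loop)
    worker.start()

    start = time.monotonic()
    for k in range(1, num_batches + 1):
        # Stand-in for the recurring timer: wait until the next batch boundary,
        # then post an event for it.
        batch_time = start + k * batch_interval_s
        time.sleep(max(0.0, batch_time - time.monotonic()))
        events.put(batch_time)

    events.put(None)
    worker.join()
    return handled


for batch_time, pickup_time in run_event_loop(0.05, 3):
    print(f"pickup lag: {(pickup_time - batch_time) * 1000:.2f} ms")
```

The printed lag varies from run to run and from machine to machine, which is why no expected output is shown.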

Re: submissionTime vs batchTime, DirectKafka

2016-03-09 Thread Sachin Aggarwal
Hi Cody, let me try once again to explain with an example. In the BatchInfo class of Spark, "scheduling delay" is defined as *def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)* I am dumping the BatchInfo object in my LatencyListener, which extends StreamingListener. batchTime
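To make the distinction in this message concrete, here is a small Python sketch of the arithmetic. `scheduling_delay` mirrors the quoted Scala definition, with `None` standing in for an empty `Option`. `submission_lag` is a hypothetical helper, not a field of BatchInfo, that measures the gap between batchTime and submissionTime. The timestamps are made up for illustration.

```python
from typing import Optional


def scheduling_delay(processing_start_time: Optional[int],
                     submission_time: int) -> Optional[int]:
    """Mirror of processingStartTime.map(_ - submissionTime); None until processing starts."""
    if processing_start_time is None:
        return None
    return processing_start_time - submission_time


def submission_lag(batch_time: int, submission_time: int) -> int:
    """Hypothetical helper (not part of BatchInfo): submissionTime minus batchTime."""
    return submission_time - batch_time


# Made-up timestamps in milliseconds, for illustration only.
batch_time = 1_000
submission_time = 1_020
processing_start_time = 1_025

print(scheduling_delay(processing_start_time, submission_time))  # 5
print(submission_lag(batch_time, submission_time))               # 20
print(scheduling_delay(None, submission_time))                   # None
```

With these numbers the scheduling delay is 5 ms even though the batch was submitted 20 ms after its batch time, so the second gap does not show up in `schedulingDelay`.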

Re: submissionTime vs batchTime, DirectKafka

2016-03-09 Thread Cody Koeninger
I'm really not sure what you're asking.

On Wed, Mar 9, 2016 at 12:43 PM, Sachin Aggarwal wrote:
> where are we capturing this delay?
> I am aware of scheduling delay which is defined as processing time-submission time not the batch create time
>
> On Wed, Mar 9, 2016 at 10:46 PM, Cody Koeninger

Re: submissionTime vs batchTime, DirectKafka

2016-03-09 Thread Sachin Aggarwal
Where are we capturing this delay? I am aware of scheduling delay, which is defined as processing time minus submission time, not the batch creation time.

On Wed, Mar 9, 2016 at 10:46 PM, Cody Koeninger wrote:
> Spark streaming by default will not start processing a batch until the current batch is finis

Re: submissionTime vs batchTime, DirectKafka

2016-03-09 Thread Cody Koeninger
Spark Streaming by default will not start processing a batch until the current batch is finished. So if your processing time is larger than your batch time, delays will build up.

On Wed, Mar 9, 2016 at 11:09 AM, Sachin Aggarwal wrote:
> Hi All,
>
> we have batchTime and submissionTime.
>
> @para
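The build-up described in this reply can be shown with a toy model that involves no Spark at all. The sketch below assumes batches are generated at a fixed interval, processed strictly one at a time, and that every batch takes the same processing time. The function name and the numbers are illustrative only.

```python
def simulate_start_delays(batch_interval_ms, processing_time_ms, num_batches):
    """Return, per batch, how long it waits past its batch time before processing starts."""
    delays = []
    previous_finish = 0
    for k in range(num_batches):
        batch_time = k * batch_interval_ms
        # A batch cannot start before it exists, nor before the previous one finishes.
        start = max(batch_time, previous_finish)
        delays.append(start - batch_time)
        previous_finish = start + processing_time_ms
    return delays


# Processing slower than the batch interval: the delay grows by 100 ms per batch.
print(simulate_start_delays(300, 400, 4))  # [0, 100, 200, 300]

# Processing faster than the batch interval: no delay accumulates.
print(simulate_start_delays(300, 200, 4))  # [0, 0, 0, 0]
```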

submissionTime vs batchTime, DirectKafka

2016-03-09 Thread Sachin Aggarwal
Hi All,

we have batchTime and submissionTime.

@param batchTime Time of the batch
@param submissionTime Clock time of when jobs of this batch was submitted to the streaming scheduler queue

1) We are seeing a difference between batchTime and submissionTime for small batches (300 ms) even in minute