Hi, can this be considered a lag in the processing of events? Should we report this as a delay?
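The timestamps quoted in the thread below make the size of this gap concrete. A quick check (plain arithmetic on the values from the BatchInfo dump):

```python
# Timestamps (ms since epoch) from the BatchInfo dump quoted below
batch_time = 1457424695400
submission_time = 1457425630780

gap_ms = submission_time - batch_time
print(gap_ms)                 # 935380 ms
print(gap_ms / 60000.0)       # ~15.6 minutes between batch creation and submission
```

So the batch was submitted to the scheduler roughly 15.6 minutes after its nominal batch time, which is far too large to be explained by housekeeping events alone for a healthy job.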
On Thu, Mar 10, 2016 at 10:51 AM, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:

> Look at org.apache.spark.streaming.scheduler.JobGenerator.
>
> It has a RecurringTimer (timer) that simply posts JobGenerate events to an EventLoop at the batch interval. The EventLoop's thread then picks up these events and uses streamingContext.graph to generate a Job (the InputDStream's compute method). batchInfo.submissionTime is the time recorded after this generateJob completes. The Job is then sent to org.apache.spark.streaming.scheduler.JobScheduler, which has a thread pool to execute the Job.
>
> JobGenerate events are not the only events posted to the JobGenerator's eventLoop. Other events such as DoCheckpoint, ClearCheckpointData, and ClearMetadata are also posted, and all of these events are serviced by the EventLoop's single thread. So, for instance, if DoCheckpoint, ClearCheckpointData, and ClearMetadata events are queued before your nth JobGenerate event, there will be a time difference between the batchTime and the submissionTime for that nth batch.
>
> thanks
> Mario
>
> On Thu, Mar 10, 2016 at 10:29 AM, Sachin Aggarwal <different.sac...@gmail.com> wrote:
>
>> Hi Cody,
>>
>> Let me try once again to explain with an example. In Spark's BatchInfo class, "scheduling delay" is defined as:
>>
>>     def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)
>>
>> I am dumping the BatchInfo object in my LatencyListener, which extends StreamingListener:
>>
>>     batchTime      = 1457424695400 ms
>>     submissionTime = 1457425630780 ms
>>     difference     = 935380 ms
>>
>> Can this be considered a lag in the processing of events? What is a possible explanation for this lag?
>>
>> On Thu, Mar 10, 2016 at 12:22 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>>> I'm really not sure what you're asking.
>>>
>>> On Wed, Mar 9, 2016 at 12:43 PM, Sachin Aggarwal <different.sac...@gmail.com> wrote:
>>>
>>>> Where are we capturing this delay? I am aware of scheduling delay, which is defined as processing time minus submission time, not the batch create time.
>>>>
>>>> On Wed, Mar 9, 2016 at 10:46 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>
>>>>> Spark Streaming by default will not start processing a batch until the current batch is finished. So if your processing time is larger than your batch time, delays will build up.
>>>>>
>>>>> On Wed, Mar 9, 2016 at 11:09 AM, Sachin Aggarwal <different.sac...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> We have batchTime and submissionTime:
>>>>>>
>>>>>>     @param batchTime Time of the batch
>>>>>>     @param submissionTime Clock time of when jobs of this batch were submitted to the streaming scheduler queue
>>>>>>
>>>>>> 1) We are seeing a difference between batchTime and submissionTime, even in minutes, for small batches (300 ms) with direct Kafka. We see this only when the processing time is more than the batch interval. How can we explain this delay?
>>>>>>
>>>>>> 2) In one case the batch processing time is more than the batch interval. Will Spark fetch the next batch's data from Kafka in parallel while processing the current batch, or will it wait for the current batch to finish first?
>>>>>>
>>>>>> I would be thankful if you could give me some pointers.
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> Thanks & Regards
>>>>>> Sachin Aggarwal
>>>>>> 7760502772

--
Thanks & Regards
Sachin Aggarwal
7760502772
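Mario's explanation of the single-threaded EventLoop can be illustrated with a toy model. This is my own sketch, not Spark's actual code: event names and service times are made up, but it shows how housekeeping events queued ahead of a JobGenerate event push that batch's submission time past its batch time.

```python
import time

# Toy single-threaded event loop (illustrative model, NOT Spark's
# implementation): the JobGenerator services all events on one thread,
# so events queued before a JobGenerate event delay its handling.
SERVICE_TIME = {"DoCheckpoint": 0.05, "ClearCheckpointData": 0.05,
                "ClearMetadata": 0.05, "JobGenerate": 0.01}  # seconds, assumed

# Housekeeping events happen to be queued ahead of our batch's event
pending = ["DoCheckpoint", "ClearCheckpointData", "ClearMetadata", "JobGenerate"]

batch_time = time.monotonic()          # when the timer fired for this batch
submission_time = None

for event in pending:                  # one thread, strictly in order
    time.sleep(SERVICE_TIME[event])
    if event == "JobGenerate":
        submission_time = time.monotonic()  # recorded only after job generation

print(f"batchTime -> submissionTime gap: {(submission_time - batch_time) * 1000:.0f} ms")
```

With these assumed service times the gap is on the order of 160 ms; in a real job this mechanism explains small gaps, while minutes-long gaps point to a processing backlog instead.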
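Cody's point that delays build up when processing time exceeds the batch interval can be seen with a back-of-envelope model (again my own sketch, not Spark code), using the 300 ms interval from the thread and an assumed 500 ms processing time:

```python
# Sequential-processing model: batch n is created at n * interval, but
# by default cannot start until the previous n batches have finished.
batch_interval_ms = 300     # from the thread
processing_time_ms = 500    # assumed, > interval, so a backlog builds

def start_delay_ms(n):
    # Once backlogged, the previous n batches finish at n * processing_time,
    # while batch n was created at n * batch_interval.
    return max(0, n * processing_time_ms - n * batch_interval_ms)

for n in (1, 10, 100, 1000):
    print(f"batch {n}: delayed {start_delay_ms(n)} ms")
```

The delay grows linearly, by (processing_time - interval) per batch, so after a few thousand batches the batchTime-to-submissionTime gap reaches minutes, matching the observation in the thread.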