So the system has gone from 7 msgs at 4.961 secs (median) to 106 msgs at 4.761 secs (median). I think there's evidence that setup costs are quite high in this case, and that increasing the batch interval is helping.
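
For reference, a minimal sketch of the per-partition trick alluded to further down the thread: hold the Cassandra session and prepared statements in a per-JVM singleton so they are built once per executor, not once per partition per batch. This assumes the DataStax Java driver; the contact point, keyspace, table, and record shape are placeholders, since the actual job code never appears in the thread.

    import com.datastax.driver.core.{Cluster, PreparedStatement, Session}

    // Built lazily once per executor JVM and reused across partitions and
    // batches, instead of being set up and torn down every batch interval.
    object CassandraWriter {
      private var session: Session = _
      private var insert: PreparedStatement = _

      private def ensureInitialised(): Unit = synchronized {
        if (session == null) {
          val cluster = Cluster.builder()
            .addContactPoint("cassandra-host")            // placeholder host
            .build()
          session = cluster.connect("my_keyspace")        // placeholder keyspace
          insert = session.prepare(
            "INSERT INTO events (id, body) VALUES (?, ?)") // placeholder table
        }
      }

      def write(records: Iterator[(String, String)]): Unit = {
        ensureInitialised()
        records.foreach { case (id, body) =>
          session.execute(insert.bind(id, body))
        }
      }
    }

    // Usage, given a DStream[(String, String)] of messages:
    //   messages.foreachRDD(rdd => rdd.foreachPartition(CassandraWriter.write))

With a 2-second interval, that setup would otherwise run for every partition of every batch, which by itself can account for multi-second processing times.
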
On Thu, Jan 22, 2015 at 4:12 PM, Sudipta Banerjee <
asudipta.baner...@gmail.com> wrote:

> Hi Ashic Mahtab,
>
> Are Cassandra and ZooKeeper installed as part of the YARN architecture,
> or in a separate layer alongside Apache Spark?
>
> Thanks and Regards,
> Sudipta
>
> On Thu, Jan 22, 2015 at 8:13 PM, Ashic Mahtab <as...@live.com> wrote:
>
>> Hi Guys,
>> So I changed the interval to 15 seconds. There are obviously a lot more
>> messages per batch, but (I think) it looks a lot healthier. Can you see
>> any major warning signs? I think that with 2-second intervals, the
>> per-partition setup/teardown was what was causing the delays.
>>
>> Streaming
>>
>> - Started at: Thu Jan 22 13:23:12 GMT 2015
>> - Time since start: 1 hour 17 minutes 16 seconds
>> - Network receivers: 2
>> - Batch interval: 15 seconds
>> - Processed batches: 309
>> - Waiting batches: 0
>>
>> Statistics over last 100 processed batches
>>
>> Receiver Statistics (records in last batch as of 2015/01/22 14:40:29)
>>
>> Receiver       Status  Location            Last batch  Min rate  Median rate  Max rate  Last error
>>                                            [records]   [rec/s]   [rec/s]      [rec/s]
>> RmqReceiver-0  ACTIVE  VDCAPP53.foo.local  2.6 K       29        106          295       -
>> RmqReceiver-1  ACTIVE  VDCAPP50.bar.local  2.6 K       29        107          291       -
>>
>> Batch Processing Statistics
>>
>> Metric            Last batch  Minimum     25th pctile  Median      75th pctile  Maximum
>> Processing Time   4 s 812 ms  4 s 698 ms  4 s 738 ms   4 s 761 ms  4 s 788 ms   5 s 802 ms
>> Scheduling Delay  2 ms        0 ms        3 ms         3 ms        4 ms         9 ms
>> Total Delay       4 s 814 ms  4 s 701 ms  4 s 739 ms   4 s 764 ms  4 s 792 ms   5 s 809 ms
>>
>> Regards,
>> Ashic.
>> ------------------------------
>> From: as...@live.com
>> To: gerard.m...@gmail.com
>> CC: user@spark.apache.org
>> Subject: RE: Are these numbers abnormal for spark streaming?
>> Date: Thu, 22 Jan 2015 12:32:05 +0000
>>
>> Hi Gerard,
>> Thanks for the response.
>>
>> The messages get deserialised from msgpack format, and one of the
>> strings is deserialised to JSON. Certain fields are checked to decide
>> whether further processing is required. If so, the message goes through
>> a series of in-memory filters to check whether more processing is
>> needed; only then does the "heavy" work start. That consists of a few
>> db queries, and potentially updates to the db plus a message on a
>> message queue. The majority of messages don't need processing. The
>> messages needing processing at peak are about three every other second.
>>
>> One possible thing that might be happening is the session
>> initialisation and prepared statement initialisation for each
>> partition. I can resort to some tricks, but I think I'll try increasing
>> the batch interval to 15 seconds. I'll report back with findings.
>>
>> Thanks,
>> Ashic.
>>
>> ------------------------------
>> From: gerard.m...@gmail.com
>> Date: Thu, 22 Jan 2015 12:30:08 +0100
>> Subject: Re: Are these numbers abnormal for spark streaming?
>> To: tathagata.das1...@gmail.com
>> CC: as...@live.com; t...@databricks.com; user@spark.apache.org
>>
>> and post the code (if possible).
>> In a nutshell, your processing time > batch interval, resulting in an
>> ever-increasing delay that will end up in a crash.
>> 3 secs to process 14 messages looks like a lot. Curious what the job
>> logic is.
>>
>> -kr, Gerard.
>>
>> On Thu, Jan 22, 2015 at 12:15 PM, Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>> This is not normal.
It's a huge scheduling delay!! Can you tell me more
>> about the application?
>> - cluster setup, number of receivers, what the computation is, etc.
>>
>> On Thu, Jan 22, 2015 at 3:11 AM, Ashic Mahtab <as...@live.com> wrote:
>>
>> Hate to do this...but...erm...bump? Would really appreciate input from
>> others using Streaming. Or at least some docs that would tell me
>> whether these numbers are expected or not.
>>
>> ------------------------------
>> From: as...@live.com
>> To: user@spark.apache.org
>> Subject: Are these numbers abnormal for spark streaming?
>> Date: Wed, 21 Jan 2015 11:26:31 +0000
>>
>> Hi Guys,
>> I've got Spark Streaming set up for a low-data-rate system (using
>> Spark's features for analysis, rather than for high throughput).
>> Messages are coming in throughout the day, at around 1-20 per second
>> (finger-in-the-air estimate...not analysed yet). In the Spark Streaming
>> UI for the application, I'm getting the following after 17 hours.
>>
>> Streaming
>>
>> - Started at: Tue Jan 20 16:58:43 GMT 2015
>> - Time since start: 18 hours 24 minutes 34 seconds
>> - Network receivers: 2
>> - Batch interval: 2 seconds
>> - Processed batches: 16482
>> - Waiting batches: 1
>>
>> Statistics over last 100 processed batches
>>
>> Receiver Statistics (records in last batch as of 2015/01/21 11:23:18)
>>
>> Receiver       Status  Location  Last batch  Min rate  Median rate  Max rate  Last error
>>                                  [records]   [rec/s]   [rec/s]      [rec/s]
>> RmqReceiver-0  ACTIVE  FOOOO     14          4         7            27        -
>> RmqReceiver-1  ACTIVE  BAAAAR    12          4         7            26        -
>>
>> Batch Processing Statistics
>>
>> Metric            Last batch    Minimum        25th pctile    Median         75th pctile    Maximum
>> Processing Time   3 s 994 ms    157 ms         4 s 16 ms      4 s 961 ms     5 s 3 ms       5 s 171 ms
>> Scheduling Delay  9 h 15 m 4 s  9 h 10 m 54 s  9 h 11 m 56 s  9 h 12 m 57 s  9 h 14 m 5 s   9 h 15 m 4 s
>> Total Delay       9 h 15 m 8 s  9 h 10 m 58 s  9 h 12 m       9 h 13 m 2 s   9 h 14 m 10 s  9 h 15 m 8 s
>>
>> Are these "normal"? I was wondering what the scheduling delay and total
>> delay terms mean, and whether it's normal for them to be 9 hours.
>>
>> I've got a standalone Spark master and 4 Spark nodes. The streaming app
>> has been given 4 cores, and it's using 1 core per worker node. The
>> streaming app is submitted from a 5th machine, and that machine runs
>> nothing but the driver. The worker nodes are running alongside
>> Cassandra (and reading from and writing to it).
>>
>> Any insights would be appreciated.
>>
>> Regards,
>> Ashic.
>
> --
> Sudipta Banerjee
> Consultant, Business Analytics and Cloud Based Architecture
> Call me +919019578099
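
A back-of-the-envelope footnote on Gerard's point above (processing time > batch interval means an ever-growing backlog), using only the medians reported in the UI output; REPL-style Scala, with the numbers taken from the thread:

    // At the original 2-second interval, a new batch arrives every 2 s,
    // but the median batch takes ~5 s to process.
    val batchIntervalSec    = 2.0
    val medianProcessingSec = 4.961

    // So each batch leaves roughly 3 s more work queued than before, and
    // the scheduling delay grows without bound instead of settling.
    val growthPerBatchSec = medianProcessingSec - batchIntervalSec  // ~2.96 s

    // Over the 16,482 processed batches this bounds the backlog at roughly
    // 13.5 hours; the observed ~9 h 15 m is lower because early batches ran
    // far faster (minimum processing time was 157 ms).
    val upperBoundHours = growthPerBatchSec * 16482 / 3600  // ~13.6 h

At the 15-second interval, by contrast, the median processing time (4.761 s) sits comfortably under the batch interval, which is why the scheduling delay collapses to milliseconds.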