One more question: while processing the exact same batch, I noticed that giving more CPUs to the worker does not decrease the duration of the batch. I tried this with 4 and 8 CPUs. With only 1 CPU the duration did increase, but apart from that the values were pretty similar whether I was using 4, 6, or 8 CPUs.
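(As an aside, instead of timing inside foreachRDD I have been experimenting with a StreamingListener to get per-batch throughput directly from the batch metadata. This is only a minimal sketch — it assumes a Spark version whose BatchInfo exposes numRecords and processingDelay, and the class/variable names are my own:)

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Prints records/sec for every completed batch.
// Assumes BatchInfo.numRecords and BatchInfo.processingDelay are available.
class ThroughputListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    // processingDelay is an Option[Long] in milliseconds; skip empty/zero values
    for (procMs <- info.processingDelay if procMs > 0) {
      val recordsPerSec = info.numRecords * 1000.0 / procMs
      println(f"Batch ${info.batchTime}: ${info.numRecords} records in $procMs ms " +
        f"=> $recordsPerSec%.1f records/sec")
    }
  }
}

// Register before ssc.start():
// ssc.addStreamingListener(new ThroughputListener)
```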
On Thu, Feb 26, 2015 at 5:35 PM, Saiph Kappa <saiph.ka...@gmail.com> wrote:

> By setting spark.eventLog.enabled to true it is possible to see the
> application UI after the application has finished its execution; however,
> the Streaming tab is no longer visible.
>
> For measuring the duration of batches in the code I am doing something
> like this:
>
>     wordCharValues.foreachRDD(rdd => {
>       val startTick = System.currentTimeMillis()
>       val result = rdd.take(1)
>       val timeDiff = System.currentTimeMillis() - startTick
>     })
>
> But my question is: is it possible to see the rate/throughput
> (records/sec) when I have a stream that processes log files appearing in a
> folder?
>
> On Thu, Feb 26, 2015 at 1:36 AM, Tathagata Das <t...@databricks.com> wrote:
>
>> Yes. # tuples processed in a batch = sum of all the tuples received by
>> all the receivers.
>>
>> In the screen shot, there was a batch with 69.9K records, and there was
>> a batch which took 1 s 473 ms. These two batches may or may not be the
>> same batch.
>>
>> TD
>>
>> On Wed, Feb 25, 2015 at 10:11 AM, Josh J <joshjd...@gmail.com> wrote:
>>
>>> If I'm using the Kafka receiver, can I assume the number of records
>>> processed in the batch is the sum of the number of records processed
>>> by the Kafka receivers?
>>>
>>> So in the attached screen shot, the max rate of tuples processed in a
>>> batch is 42.7K + 27.2K = 69.9K tuples, with a max processing time of
>>> 1 second 473 ms?
>>>
>>> On Wed, Feb 25, 2015 at 8:48 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>>
>>>> By throughput you mean the number of events processed, etc.?
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> The Streaming tab already has these statistics.
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Wed, Feb 25, 2015 at 9:59 PM, Josh J <joshjd...@gmail.com> wrote:
>>>>
>>>>> On Wed, Feb 25, 2015 at 7:54 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>>>>
>>>>>> For Spark Streaming applications, there is already a tab called
>>>>>> "Streaming" which displays the basic statistics.
>>>>>
>>>>> Would I just need to extend this tab to add the throughput?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org