One more question: while processing the exact same batch I noticed that
giving more CPUs to the worker does not decrease the batch duration.
I tried this with 4 and 8 CPUs. With only 1 CPU the duration increased,
but apart from that the values were pretty similar, whether I was using
4, 6, or 8 CPUs.
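On the records/sec question from the thread below: since the Streaming tab is gone once the app finishes, one low-tech option is to log a per-batch rate from foreachRDD yourself. This is only a sketch (untested here): it assumes `wordCharValues` is the DStream from the snippet quoted below, and the element type is illustrative. Note that `count()` triggers its own job, so the measurement adds some overhead to each batch.

```scala
import org.apache.spark.streaming.dstream.DStream

// Sketch: log an approximate per-batch throughput (records/sec).
// `wordCharValues` and its (String, Int) element type are assumptions
// taken from the quoted snippet, not a confirmed signature.
def logThroughput(wordCharValues: DStream[(String, Int)]): Unit = {
  wordCharValues.foreachRDD { rdd =>
    val startTick = System.currentTimeMillis()
    val count = rdd.count()  // forces a job; counts records in this batch
    val elapsedMs = math.max(1L, System.currentTimeMillis() - startTick)
    val recsPerSec = count * 1000.0 / elapsedMs
    println(s"batch: $count records in $elapsedMs ms (~$recsPerSec records/sec)")
  }
}
```

This measures the time to count the batch, not the full batch processing time the Streaming tab reports, so treat it as a rough indicator only.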

On Thu, Feb 26, 2015 at 5:35 PM, Saiph Kappa <saiph.ka...@gmail.com> wrote:

> By setting spark.eventLog.enabled to true it is possible to see the
> application UI after the application has finished its execution, however
> the Streaming tab is no longer visible.
>
> For measuring the duration of batches in the code I am doing something
> like this:
> wordCharValues.foreachRDD(rdd => {
>     val startTick = System.currentTimeMillis()
>     val result = rdd.take(1)
>     val timeDiff = System.currentTimeMillis() - startTick
> })
>
> But my question is: is it possible to see the rate/throughput
> (records/sec) when I have a stream that processes log files appearing in a
> folder?
>
>
>
> On Thu, Feb 26, 2015 at 1:36 AM, Tathagata Das <t...@databricks.com>
> wrote:
>
>> Yes. # tuples processed in a batch = sum of all the tuples received by
>> all the receivers.
>>
>> In the screenshot, there was a batch with 69.9K records, and there was a
>> batch that took 1 s 473 ms. These may or may not be the same batch.
>>
>> TD
>>
>> On Wed, Feb 25, 2015 at 10:11 AM, Josh J <joshjd...@gmail.com> wrote:
>>
>>> If I'm using the Kafka receiver, can I assume the number of records
>>> processed in the batch is the sum of the records received by each Kafka
>>> receiver?
>>>
>>> So in the screen shot attached the max rate of tuples processed in a
>>> batch is 42.7K + 27.2K = 69.9K tuples processed in a batch with a max
>>> processing time of 1 second 473 ms?
>>>
>>> On Wed, Feb 25, 2015 at 8:48 AM, Akhil Das <ak...@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> By throughput, do you mean the number of events processed, etc.?
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> The Streaming tab already has these statistics.
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Wed, Feb 25, 2015 at 9:59 PM, Josh J <joshjd...@gmail.com> wrote:
>>>>
>>>>>
>>>>> On Wed, Feb 25, 2015 at 7:54 AM, Akhil Das <ak...@sigmoidanalytics.com
>>>>> > wrote:
>>>>>
>>>>>> For SparkStreaming applications, there is already a tab called
>>>>>> "Streaming" which displays the basic statistics.
>>>>>
>>>>>
>>>>> Would I just need to extend this tab to add the throughput?
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
