Hi,

(What timing. I just reviewed the ContextCleaner yesterday!)

In ALS they trigger the cleanup of shuffle map stages themselves, so if I
understood the issue correctly, the streaming part could do it too.
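
For reference, this is roughly the knob ALS exposes for it, as far as I
understand (a minimal sketch, not your job's code; the checkpoint directory
is a placeholder):

    import org.apache.spark.ml.recommendation.ALS

    // `spark` is an existing SparkSession (assumed). ALS checkpoints its
    // intermediate RDDs every N iterations, cutting lineage so the
    // ContextCleaner can reclaim the shuffle state behind them.
    spark.sparkContext.setCheckpointDir("/tmp/als-checkpoints") // placeholder path
    val als = new ALS()
      .setCheckpointInterval(10) // checkpoint every 10 iterations
      .setMaxIter(20)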

Jacek

On 19 Dec 2016 11:35 p.m., "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
wrote:

> Hey Prashant. Thanks for your code. I did some investigation and it
> turned out that the ContextCleaner is too slow and its "referenceQueue" keeps
> growing. My hunch is that cleaning broadcasts is very slow since it's a
> blocking call.
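>
> One quick experiment (just a sketch to test that hypothesis, not a
> production recommendation) would be to make the cleaner non-blocking and
> see whether the referenceQueue still grows:
>
>     import org.apache.spark.sql.SparkSession
>
>     // Hypothetical test setup: if blocking broadcast removal is the
>     // bottleneck, a non-blocking cleaner should stop the queue growth.
>     val spark = SparkSession.builder()
>       .config("spark.cleaner.referenceTracking.blocking", "false")
>       .getOrCreate()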
>
> On Mon, Dec 19, 2016 at 12:50 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Hey, Prashant. Could you track the GC roots of the byte arrays in the heap?
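>>
>> Something along these lines should do it (stock JDK tooling; the pid and
>> file name are placeholders):
>>
>>     jmap -dump:live,format=b,file=heap.hprof <spark-driver-pid>
>>
>> Then open the dump in a heap analyzer such as Eclipse MAT and look at
>> "Path to GC Roots" for the dominant byte[] instances.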
>>
>> On Sat, Dec 17, 2016 at 10:04 PM, Prashant Sharma <scrapco...@gmail.com>
>> wrote:
>>
>>> Furthermore, I ran the same thing with 26 GB of memory, which works out
>>> to about 1.3 GB per thread. My jmap
>>> <https://github.com/ScrapCodes/KafkaProducer/blob/master/data/26GB/t11_jmap-histo>
>>> results and jstat
>>> <https://github.com/ScrapCodes/KafkaProducer/blob/master/data/26GB/t11_jstat>
>>> results, collected after running the job for more than 11 hours, again show
>>> a memory constraint: the same gradual slowdown, though a bit more gradual
>>> since the memory is considerably larger than in the previous runs.
>>>
>>>
>>>
>>>
>>> This situation sounds like a memory leak, doesn't it? The byte array
>>> objects occupy more than 13 GB and are not being garbage collected.
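>>>
>>> (One way to check that they truly survive GC: compare a plain class
>>> histogram with a live one, which forces a full GC first. The pid is a
>>> placeholder.)
>>>
>>>     jmap -histo <spark-driver-pid> > histo_all.txt         # all objects, reachable or not
>>>     jmap -histo:live <spark-driver-pid> > histo_live.txt   # forces a full GC, live objects only
>>>     jstat -gcutil <spark-driver-pid> 5000                  # GC utilization sampled every 5 s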
>>>
>>> --Prashant
>>>
>>>
>>> On Sun, Dec 18, 2016 at 8:49 AM, Prashant Sharma <scrapco...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> The goal of my benchmark is to reach an end-to-end latency below 100 ms
>>>> and sustain it over time, by consuming from a Kafka topic and writing back
>>>> to another Kafka topic using Spark. Since the job does no aggregation and
>>>> does constant-time processing on each message, this seemed like an
>>>> achievable target. But then there are some surprising and interesting
>>>> patterns to observe.
>>>>
>>>> Basically, the setup has four components:
>>>> 1) Kafka.
>>>> 2) A long-running Kafka producer, rate-limited to 1000 msgs/sec, with each
>>>> message about 1 KB (a sketch follows the list).
>>>> 3) A Spark job subscribed to the `test` topic, writing out to another
>>>> topic, `output`.
>>>> 4) A Kafka consumer, reading from the `output` topic.
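>>>>
>>>> The producer logic (2) is essentially the following. This is a minimal
>>>> sketch rather than the actual code in my repo, and it assumes Guava's
>>>> RateLimiter for the 1000 msgs/sec cap; broker address is a placeholder.
>>>>
>>>>     import java.util.Properties
>>>>     import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
>>>>     import com.google.common.util.concurrent.RateLimiter
>>>>
>>>>     val props = new Properties()
>>>>     props.put("bootstrap.servers", "localhost:9092") // placeholder broker
>>>>     props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
>>>>     props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
>>>>
>>>>     val producer = new KafkaProducer[String, String](props)
>>>>     val limiter  = RateLimiter.create(1000.0) // 1000 msgs/sec
>>>>     val padding  = "x" * 1000                 // pad each message to roughly 1 KB
>>>>
>>>>     while (true) {
>>>>       limiter.acquire()
>>>>       // Embed the send timestamp so the consumer can compute end-to-end latency.
>>>>       val value = s"${System.currentTimeMillis()},$padding"
>>>>       producer.send(new ProducerRecord[String, String]("test", value))
>>>>     }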
>>>>
>>>> How was the latency measured?
>>>>
>>>> While sending messages from the Kafka producer, each message is embedded
>>>> with the timestamp at which it is pushed to the Kafka `test` topic. Spark
>>>> receives each message and writes it out to the `output` topic as is. When
>>>> these messages arrive at the Kafka consumer, their embedded timestamp is
>>>> subtracted from the arrival time at the consumer, and a scatter plot of
>>>> these latencies is attached.
>>>>
>>>> The scatter plots sample only 10 minutes of data received during the
>>>> initial one hour, and then again 10 minutes of data received after 2 hours
>>>> of running.
>>>>
>>>>
>>>>
>>>> These plots indicate a significant increase in latency: the later scatter
>>>> plot shows that almost all the messages were received with a delay larger
>>>> than 2 seconds, whereas the first plot shows that most messages arrived
>>>> with less than 100 ms latency. The two samples were taken approximately
>>>> 2 hours apart.
>>>>
>>>> After running the test for 24 hours, the jstat
>>>> <https://raw.githubusercontent.com/ScrapCodes/KafkaProducer/master/data/jstat_output.txt>
>>>> and jmap
>>>> <https://raw.githubusercontent.com/ScrapCodes/KafkaProducer/master/data/jmap_output.txt>
>>>> output for the job indicates a possible memory constraint. To be clear, the
>>>> job was run with local[20] and 5 GB of memory (spark.driver.memory). The
>>>> job is straightforward and located here:
>>>> https://github.com/ScrapCodes/KafkaProducer/blob/master/src/main/scala/com/github/scrapcodes/kafka/SparkSQLKafkaConsumer.scala
>>>>
>>>>
>>>> What is causing the gradual slowdown? I need help in diagnosing the
>>>> problem.
>>>>
>>>> Thanks,
>>>>
>>>> --Prashant
>>>>
>>>>
>>>
>>
>
