Cool, we are still sticking with 1.3.1, will upgrade to 1.5 ASAP. Thanks
for the tips, Tathagata!

On Wed, Sep 23, 2015 at 10:40 AM Tathagata Das <t...@databricks.com> wrote:

> A lot of these imbalances were solved in Spark 1.5. Could you give that a
> spin?
>
> https://issues.apache.org/jira/browse/SPARK-8882
>
> On Tue, Sep 22, 2015 at 12:17 AM, SLiZn Liu <sliznmail...@gmail.com>
> wrote:
>
>> Hi spark users,
>>
>> In our Spark Streaming app with Kafka integration on Mesos, we initialized
>> 3 receivers to receive 3 Kafka partitions, but we have observed an
>> imbalance in record receiving rates: with spark.streaming.receiver.maxRate
>> set to 120, sometimes one receiver runs very close to the limit while the
>> other two only receive roughly fifty records per second.
>>
>> This may be caused by a previous receiver failure, where one receiver's
>> receiving rate dropped to 0. We restarted the Spark Streaming app, and the
>> imbalance began. We suspect that the partition served by the failed
>> receiver got jammed, and the other two receivers cannot take up its data.
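>> As a configuration-style sketch only (not from this mail; names like
>> "zkQuorum:2181", "my-group" and "my-topic" are placeholders), this is
>> roughly how three receivers plus union might be wired up under the
>> Spark 1.3 receiver-based Kafka API, with an explicit repartition added
>> so that downstream tasks are shuffled across the cluster instead of
>> staying on the nodes holding the receivers' blocks:

```scala
// Sketch under stated assumptions: Spark 1.3 receiver-based Kafka API,
// 3 Kafka partitions, 3 nodes x 64 cores. Placeholder endpoint/topic names.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ThreeReceiverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("three-receiver-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10 s batch interval

    // One receiver per Kafka partition, as described in the mail.
    val streams = (1 to 3).map { _ =>
      KafkaUtils.createStream(ssc, "zkQuorum:2181", "my-group",
                              Map("my-topic" -> 1))
    }

    // union merges the three DStreams into one; repartition then forces a
    // shuffle so tasks spread over all 3 * 64 cores rather than piling up
    // on the receiver nodes.
    val unified = ssc.union(streams).repartition(3 * 64)

    unified.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```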
>>
>> The 3-node cluster tends to run slowly: nearly all tasks are scheduled on
>> the node that had the previous receiver failure (I used union to combine
>> the 3 receivers' DStreams, so I expected the combined DStream to be well
>> distributed across all nodes), batches cannot be guaranteed to finish
>> within one batch interval, stages pile up, and the digested log shows the
>> following:
>>
>> ...
>> 5728.399: [GC (Allocation Failure) [PSYoungGen: 6954678K->17088K(6961152K)] 
>> 7074614K->138108K(20942336K), 0.0203877 secs] [Times: user=0.20 sys=0.00, 
>> real=0.02 secs]
>>
>> ...
>> 5/09/22 13:33:35 ERROR TaskSchedulerImpl: Ignoring update with state 
>> FINISHED for TID 77219 because its task set is gone (this is likely the 
>> result of
>> receiving duplicate task finished status updates)
>>
>> ...
>>
>> These two types of log lines were printed during the execution of some
>> (but not all) stages.
>>
>> My configurations:
>> # of cores on each node: 64
>> # of nodes: 3
>> batch time is set to 10 seconds
>>
>> spark.streaming.receiver.maxRate        120
>> spark.streaming.blockInterval           160  // chosen so the batch time 
>> divides into roughly one block per core: 10s * 1000 / 64 cores ≈ 156 ms, 
>> rounded up to 160, to max out all the nodes
>> spark.storage.memoryFraction            0.1  // this one doesn't seem to 
>> work, since the young gen / old gen ratio is nearly 0.3 instead of 0.1
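>> The blockInterval arithmetic in the comment above can be checked with a
>> small plain-Scala sketch (no Spark needed; the "one block per core per
>> batch" heuristic and the helper names are this sketch's assumptions, not
>> Spark API):

```scala
// Plain-Scala check of the blockInterval heuristic from the mail:
// aim for roughly one block per core per batch, i.e.
// blockInterval ≈ batch duration / total cores.
object BlockIntervalCheck {
  // Suggested block interval in milliseconds (integer division).
  def suggestedBlockIntervalMs(batchMs: Int, totalCores: Int): Int =
    batchMs / totalCores

  // Blocks (and hence tasks per receiver stream) produced per batch.
  def blocksPerBatch(batchMs: Int, blockIntervalMs: Int): Double =
    batchMs.toDouble / blockIntervalMs

  def main(args: Array[String]): Unit = {
    println(suggestedBlockIntervalMs(10000, 64)) // 156
    println(blocksPerBatch(10000, 160))          // 62.5
  }
}
```

With the mail's rounded value of 160 ms, each batch yields 62.5 blocks per
receiver, slightly under the 64 cores per node.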
>>
>> Anyone got an idea? I appreciate your patience.
>>
>> BR,
>> Todd Leo
>>
>
>
