I see! As I said in the other thread, no one reported these issues until
now! A good and not-too-hard fix is to add the ability to limit the data
rate at which the receivers ingest data. I have opened a JIRA.
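In the meantime, a minimal sketch of what such a throttle could look like,
assuming a configuration property named spark.streaming.receiver.maxRate
(the name Spark later used for this; the value is a placeholder, not a
recommendation):

    // Sketch: cap each receiver's ingestion rate (records per second).
    // The property name spark.streaming.receiver.maxRate is an assumption
    // in the context of this thread; 10000 is a placeholder value.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("RateLimitedIngest")
      .set("spark.streaming.receiver.maxRate", "10000")

    val ssc = new StreamingContext(conf, Seconds(2))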

TD


On Thu, Mar 27, 2014 at 3:28 PM, Evgeny Shishkin <itparan...@gmail.com> wrote:

>
> On 28 Mar 2014, at 01:13, Tathagata Das <tathagata.das1...@gmail.com>
> wrote:
>
> It seems the Spark worker is not configured correctly. Either the worker
> has not been given enough memory, or the share of memory allocated to RDD
> storage needs to be fixed. If configured correctly, the Spark workers
> should not get OOMs.
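> A minimal sketch of what "configured correctly" could mean here
> (spark.executor.memory and spark.storage.memoryFraction are standard Spark
> settings; the values are placeholders, not recommendations):
>
>     // Sketch: give each worker JVM enough heap and bound how much of it
>     // the block store may use for cached RDDs. Values are placeholders.
>     import org.apache.spark.SparkConf
>
>     val conf = new SparkConf()
>       .set("spark.executor.memory", "4g")          // heap per executor
>       .set("spark.storage.memoryFraction", "0.5")  // fraction of heap for RDD storage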
>
>
>
> Yes, it is easy to start with the latest offsets, get a steady
> configuration, and everything is nice.
>
> Then your machine fails. And you stop receiving anything from Kafka.
>
> Then you notice this and restart your app, hoping it will continue from
> the offsets in ZooKeeper.
> BUT NO
> YOUR DEFAULT STREAM CONSUMERS JUST ERASED THE OFFSETS FROM ZOOKEEPER
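>
> For context, the receiver in question is created roughly like this;
> group.id is what ties the consumer to the offsets stored in ZooKeeper, and
> auto.offset.reset decides where a group with no committed offsets starts
> (these are stock Kafka 0.8 high-level consumer properties; the topic,
> group, and ZooKeeper address below are made up):
>
>     // Sketch of Spark Streaming's receiver-based Kafka input, which wraps
>     // Kafka's high-level consumer. Topic/group/ZK names are made up.
>     import kafka.serializer.StringDecoder
>     import org.apache.spark.SparkConf
>     import org.apache.spark.storage.StorageLevel
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>     import org.apache.spark.streaming.kafka.KafkaUtils
>
>     val ssc = new StreamingContext(
>       new SparkConf().setAppName("KafkaIngest"), Seconds(2))
>
>     val kafkaParams = Map(
>       "zookeeper.connect" -> "zk1:2181",
>       "group.id"          -> "my-consumer-group",
>       "auto.offset.reset" -> "smallest"  // where a fresh group starts reading
>     )
>
>     val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
>       ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER)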
>
> After we fixed the offset erasure, we start from some offsets in the past.
> And within a batch duration we can't limit how many messages we pull from
> Kafka.
> AND HERE WE OOM
>
> And it's just a pain. Complete pain.
>
> And remember, only some machines consume. Usually two or three, because of
> the broken high-level consumer in Kafka. (A common workaround is sketched
> below.)
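>
> That workaround (not a fix) is to create several input streams, one
> receiver each, and union them so ingestion spreads over more machines.
> A minimal sketch, reusing ssc and kafkaParams from the sketch above;
> numStreams is made up:
>
>     // Sketch: several receivers instead of one, unioned into one DStream.
>     val numStreams = 4
>     val streams = (1 to numStreams).map { _ =>
>       KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
>         ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER)
>     }
>     val unified = ssc.union(streams)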
>
