I see! As I said in the other thread, no one reported these issues until now! A good and not-too-hard fix is to add the ability to limit the data rate at which the receivers receive. I have opened a JIRA.
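To illustrate, here is a minimal sketch of how such a rate limit might be exposed to applications. The configuration key spark.streaming.receiver.maxRate is an assumption; the actual name and semantics would be decided in the JIRA:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object RateLimitedApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("RateLimitedStreaming")
          // Hypothetical key: cap each receiver at 10,000 records/sec so
          // that replaying a large Kafka backlog cannot overwhelm worker
          // memory during a single batch.
          .set("spark.streaming.receiver.maxRate", "10000")

        val ssc = new StreamingContext(conf, Seconds(2))
        // ... create input DStreams and processing logic as usual ...
        ssc.start()
        ssc.awaitTermination()
      }
    }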
TD

On Thu, Mar 27, 2014 at 3:28 PM, Evgeny Shishkin <itparan...@gmail.com> wrote:

> On 28 Mar 2014, at 01:13, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>
>> Seems like the configuration of the Spark worker is not right. Either the
>> worker has not been given enough memory or the allocation of memory to
>> RDD storage needs to be fixed. If configured correctly, the Spark workers
>> should not get OOMs.
>
> Yes, it is easy to start with the latest offsets, reach a steady
> configuration, and everything is nice.
>
> Then a machine fails, and you stop receiving anything from Kafka.
>
> Then you notice this and restart your app, hoping it will continue from
> the offsets in ZooKeeper.
> BUT NO.
> YOUR DEFAULT STREAM CONSUMERS JUST ERASED THE OFFSETS FROM ZOOKEEPER.
>
> After we fixed the offset erasure, we start from some offsets in the past,
> and during a batch interval we can't limit how many messages we pull from
> Kafka.
> AND HERE WE OOM.
>
> And it's just a pain. Complete pain.
>
> And remember, only some machines consume, usually two or three, because
> of the broken high-level consumer in Kafka.
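For readers hitting the same issue: the resume-from-ZooKeeper behaviour Evgeny describes is driven by the Kafka high-level consumer properties that Spark's receiver passes through. A minimal sketch, assuming the Kafka 0.8 createStream API; the ZooKeeper quorum, group id, and topic name are illustrative:

    import kafka.serializer.StringDecoder
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaStreamFactory {
      def createKafkaStream(ssc: StreamingContext) = {
        val kafkaParams = Map(
          // Illustrative ZooKeeper quorum.
          "zookeeper.connect" -> "zk1:2181,zk2:2181",
          // A stable group.id is what lets a restarted app resume from the
          // offsets committed to ZooKeeper rather than starting over.
          "group.id" -> "my-streaming-app",
          // Applies only when no committed offset exists: "largest" jumps
          // to the newest messages, "smallest" replays from the oldest
          // retained ones.
          "auto.offset.reset" -> "largest"
        )
        // Two receiver threads on the (illustrative) topic "events".
        KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Map("events" -> 2),
          StorageLevel.MEMORY_AND_DISK_SER)
      }
    }

Note that this still does not bound how many messages a single batch pulls when resuming from old offsets; that is the rate-limiting gap the JIRA above is meant to close.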