On 28 Mar 2014, at 01:11, Scott Clasen <scott.cla...@gmail.com> wrote:

> Evgeniy Shishkin wrote
>> So, at the bottom — kafka input stream just does not work.
> 
> 
> That was the conclusion I was coming to as well.  Are there open tickets
> around fixing this up?
> 

I am not aware of any. Actually, nobody has complained about Spark+Kafka before,
so I assumed it just worked; then we tried to build something on it and
almost failed.

I think it is possible to steal/replicate how Twitter Storm works with
Kafka.
They do manual partition assignment; at the least, this would help to balance the load.
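Roughly what I mean by manual assignment, as a sketch: spread partitions over receiver tasks round-robin, the way Storm's KafkaSpout does. All names here (PartitionAssignment, assign) are made up for the example, not Storm's or Spark's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: round-robin assignment of Kafka partitions to
// consumer tasks. The class and method names are invented for this example.
public class PartitionAssignment {

    // Map each partition id to the task id that should own it.
    // Round-robin keeps the per-task load within one partition of even.
    public static Map<Integer, Integer> assign(int numPartitions, int numTasks) {
        Map<Integer, Integer> owner = new HashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            owner.put(p, p % numTasks);
        }
        return owner;
    }

    public static void main(String[] args) {
        // e.g. 8 partitions spread over 3 receiver tasks
        System.out.println(assign(8, 3));
    }
}
```

With a fixed mapping like this, each task knows exactly which partitions it consumes, instead of depending on the high-level consumer's rebalancing.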

There is another issue:
the StreamingContext creates new RDDs every batch interval, always, even if
the previous computation has not finished.

But with Kafka we could consume the next RDDs later, after we finish the
previous ones.
That would make it much simpler to avoid getting OOM'ed when starting from
the beginning of a topic,
because at the moment we can pull so much data from Kafka during one batch
interval that we run out of memory.

But we just cannot start slowly; there is no way to limit how much to consume during a batch.
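The kind of limit I have in mind would cap the offset range consumed per batch; again just a sketch with invented names (nextUntilOffset, maxPerBatch are assumptions, not an existing Spark or Kafka setting):

```java
// Sketch of per-batch rate limiting: clamp how far the consumer advances
// in one batch interval so a large backlog (e.g. starting from offset 0)
// cannot blow up memory. "maxPerBatch" is an invented knob, not a real
// Spark or Kafka configuration.
public class BatchLimit {

    // Return the end offset for this batch: either the latest available
    // offset or fromOffset + maxPerBatch, whichever comes first.
    public static long nextUntilOffset(long fromOffset, long latestOffset, long maxPerBatch) {
        return Math.min(latestOffset, fromOffset + maxPerBatch);
    }

    public static void main(String[] args) {
        // Backlog of 1,000,000 messages, capped at 10,000 per batch.
        long until = nextUntilOffset(0L, 1_000_000L, 10_000L);
        System.out.println(until); // 10000
    }
}
```

Each batch would then consume [fromOffset, until) and carry `until` forward as the next batch's starting offset, so the backlog drains at a bounded rate.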
 


> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360p3379.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
