The current direct stream handles only the exact set of partitions
specified at startup.  You'd have to restart the job if you changed
partitions.
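
Roughly, the 0.8-style direct stream looks like this (just a sketch,
not a complete job; the broker list and topic name are placeholders):

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val conf = new SparkConf().setAppName("direct-stream-fixed-partitions")
  val ssc = new StreamingContext(conf, Seconds(5))

  // Placeholder broker list and topic.
  val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
  val topics = Set("events")

  // Partition metadata is fetched once, here.  The resulting set of
  // (topic, partition) pairs is fixed for the lifetime of the stream,
  // so partitions added later won't be read without a restart.
  val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)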

https://issues.apache.org/jira/browse/SPARK-12177 tracks the ongoing work
towards using the Kafka 0.10 consumer, which would allow for dynamic
topic partitions.
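
For a sense of what that enables: the 0.10 consumer can subscribe to a
topic pattern and pick up partitions discovered after startup.  A sketch
along those lines (class and package names follow the in-progress
kafka-0-10 integration; brokers, group id and topic pattern are made up,
and ssc is the same StreamingContext as in the sketch above):

  import java.util.regex.Pattern
  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

  // Placeholder consumer config for the new (0.10) consumer.
  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "broker1:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "example-group",
    "auto.offset.reset" -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )

  // SubscribePattern defers topic/partition discovery to the underlying
  // 0.10 consumer, so new partitions matching the pattern can be picked
  // up without restarting the job.
  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.SubscribePattern[String, String](
      Pattern.compile("events.*"), kafkaParams))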

Regarding your multi-DC questions, I'm not really clear on what you're saying.

On Mon, Apr 18, 2016 at 11:18 AM, Erwan ALLAIN <eallain.po...@gmail.com> wrote:
> Hello,
>
> I'm currently designing a solution where 2 distinct Spark clusters (in 2
> datacenters) share the same Kafka cluster (using Kafka rack awareness or
> manual broker assignment).
> The aims are
> - surviving a DC crash: using Kafka resiliency and the consumer group
> mechanism (or something else?)
> - keeping offsets consistent across replicas (vs. MirrorMaker, which does
> not preserve offsets)
>
> I have several questions
>
> 1) Dynamic repartitioning (one or two DCs)
>
> I'm using KafkaDirectStream, which maps one Kafka partition to one Spark
> partition.  Is it possible to handle new or removed partitions?
> In the compute method, it looks like we always use the currentOffsets map
> to query the next batch, so the number of partitions never changes.  Can
> we request metadata at each batch?
>
> 2) Multi-DC Spark
>
> Using the direct approach, one way to achieve this would be
> - to "assign" (Kafka 0.9 term) all topics to both Spark clusters
> - only one cluster actually reads a given partition (check every X
> interval, with a "lock" stored in Cassandra for instance)
>
> => not sure if it works, just an idea
>
> Using a consumer group
> - commit offsets manually at the end of each batch
>
> => Does Spark handle partition rebalancing?
>
> I'd appreciate any ideas! Let me know if it's not clear.
>
> Erwan
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
