part of the checkpointed metadata in the Spark
context.
-adrian
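The per-(topic, partition) offset metadata mentioned above can be pictured with a small toy model. This is an illustration only, not Spark's internal checkpoint format; the field names loosely mirror Spark's `OffsetRange`, and the topic name and offsets are made up:

```python
# Toy model only: what the direct stream checkpoints per batch, namely
# the exact offset span covered for each (topic, partition). Not Spark's
# internal format; names mirror Spark's OffsetRange for readability.
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    topic: str
    partition: int
    from_offset: int
    until_offset: int  # exclusive

# One batch over a 3-partition topic: the checkpoint records exactly
# which offsets each partition covered in that batch.
batch = [
    OffsetRange("events", 0, 1000, 1500),
    OffsetRange("events", 1, 980, 1430),
    OffsetRange("events", 2, 1010, 1505),
]

# Batch size is fully determined by the ranges, so a batch can be
# re-read from Kafka deterministically after a failure.
records_in_batch = sum(r.until_offset - r.from_offset for r in batch)
```

Because each range pins down both endpoints, replaying a batch after recovery reads exactly the same records from Kafka.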
From: Cody Koeninger
Date: Tuesday, September 29, 2015 at 12:49 AM
To: Sourabh Chandak
Cc: Augustus Hong, user@spark.apache.org
Subject: Re: Adding / Removing worker nodes for Spark
If a node fails, the partition / offset range that it was working on will
be scheduled to run on another node. This is generally true of Spark,
regardless of checkpointing.
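The rescheduling described above can be sketched with a toy placement model. This is not Spark's scheduler; node and task names are hypothetical, and each (topic, partition) offset range is treated as one task:

```python
# Toy scheduler sketch (not Spark's scheduler): each (topic, partition)
# offset range is one task. When a node fails, only its tasks are
# reassigned to surviving nodes and re-read from Kafka.
from itertools import cycle

def assign(tasks, nodes):
    """Round-robin a list of tasks over the currently live nodes."""
    live = cycle(nodes)
    return {task: next(live) for task in tasks}

tasks = ["events-0", "events-1", "events-2", "events-3"]
placement = assign(tasks, ["node-a", "node-b"])

# node-b dies mid-batch: collect its orphaned tasks and reassign them
# to the survivors; tasks already on healthy nodes are untouched.
failed = "node-b"
orphaned = [t for t, n in placement.items() if n == failed]
placement.update(assign(orphaned, ["node-a"]))
```

The key point is that the unit of recovery is the offset range, so no completed work on healthy nodes is redone.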
The offset ranges for a given batch are stored in the checkpoint for that
batch. That's relevant if your entire job fails (
I also have the same use case as Augustus, and have some basic questions
about recovery from checkpoint. I have a 10-node Kafka cluster and a
30-node Spark cluster running a streaming job; how is the (topic,
partition) data handled in checkpointing? The scenario I want to
understand is, in case of no
Got it, thank you!
On Mon, Sep 28, 2015 at 11:37 AM, Cody Koeninger wrote:
> Losing worker nodes without stopping is definitely possible. I haven't
> had much success adding workers to a running job, but I also haven't spent
> much time on it.
>
> If you're restarting with the same jar, you sh
Losing worker nodes without stopping is definitely possible. I haven't had
much success adding workers to a running job, but I also haven't spent much
time on it.
If you're restarting with the same jar, you should be able to recover from
checkpoint without losing data (usual caveats apply, e.g. y
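The restart behavior described here follows Spark's `StreamingContext.getOrCreate` pattern: you pass a checkpoint directory plus a function that builds a fresh context, and if the checkpoint exists the context (including offset ranges of unfinished batches) is rebuilt from it instead. Below is a minimal toy sketch of those semantics in plain Python, not Spark's API; the paths, topic names, and offsets are made up:

```python
# Toy illustration of getOrCreate recovery semantics (not Spark's API):
# if a checkpoint exists, rebuild state from it; otherwise create fresh
# state and write an initial checkpoint.
import json
import os
import tempfile

def get_or_create(checkpoint_path, create_fresh):
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            return json.load(f)          # recover: resume from saved offsets
    state = create_fresh()
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)              # first run: write initial checkpoint
    return state

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
fresh = lambda: {"events-0": 0, "events-1": 0}

first = get_or_create(path, fresh)       # no checkpoint yet: fresh state
first["events-0"] = 1500                 # simulate batch progress...
with open(path, "w") as f:
    json.dump(first, f)                  # ...checkpointed after each batch

# A "restart with the same jar": progress comes back from the checkpoint,
# not from the fresh-state function.
recovered = get_or_create(path, fresh)
```

This is also why the "same jar" caveat matters: the real checkpoint serializes the driver's code and configuration, so an incompatible jar can make the saved state unreadable.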