Thanks for all the input!
On Sun, Nov 15, 2020 at 6:59 PM Xintong Song wrote:
> Is there a way to make the new yarn job run only on the new hardware?

I think you can simply decommission the nodes from Yarn, so that new
containers will not be allocated from those nodes. You might also need a
large decommission timeout, upon which all the remaining running containers
on the decommissioned nodes are killed.
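For reference, a minimal sketch of what that could look like, assuming Hadoop
2.8+ graceful decommissioning and yarn.resourcemanager.nodes.exclude-path
pointing at an excludes file (the file path, host names and timeout below are
placeholders, not your actual setup):

import subprocess

# Hypothetical list of old hosts to drain; replace with your node names.
OLD_NODES = ["old-node-1.example.com", "old-node-2.example.com"]

# Add the old hosts to the excludes file referenced by
# yarn.resourcemanager.nodes.exclude-path in yarn-site.xml.
with open("/etc/hadoop/conf/yarn.exclude", "a") as f:
    for node in OLD_NODES:
        f.write(node + "\n")

# Graceful decommission: -g sets the timeout (in seconds) after which any
# containers still running on the excluded nodes are killed.
subprocess.run(["yarn", "rmadmin", "-refreshNodes", "-g", "3600", "-client"],
               check=True)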
Hi,
it seems that YARN has a feature for targeting specific hardware:
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html
In any case, you'll need enough spare resources for some time to be able to
run your job twice for this kind of "zero-downtime handover".
Awesome, thanks! Is there a way to make the new yarn job run only on the new
hardware? Or would the two jobs have to run on intersecting hardware and
then be switched on/off, which means we'll need a buffer of resources
for our orchestration?
Also, good point on recovery. I'll spend some time looking into that.
Hey Rex,
the second approach (spinning up a standby job and then doing a handover)
sounds more promising to implement, without rewriting half of the Flink
codebase ;)
What you need is a tool that orchestrates creating a savepoint, starting a
second job from the savepoint, and then communicating with the two jobs to
coordinate the handover.
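A rough sketch of such a tool against Flink's REST API might look like this
(the JobManager address, job and jar IDs, and the savepoint directory are
placeholders; error handling omitted):

import time
import requests

FLINK = "http://jobmanager:8081"   # placeholder JobManager address
JOB_ID = "<running-job-id>"        # placeholder
JAR_ID = "<uploaded-jar-id>"       # placeholder, from POST /jars/upload

# 1. Trigger a savepoint on the running job without cancelling it.
resp = requests.post(
    f"{FLINK}/jobs/{JOB_ID}/savepoints",
    json={"target-directory": "s3://bucket/savepoints", "cancel-job": False},
)
trigger_id = resp.json()["request-id"]

# 2. Poll until the savepoint completes and grab its location.
while True:
    status = requests.get(f"{FLINK}/jobs/{JOB_ID}/savepoints/{trigger_id}").json()
    if status["status"]["id"] == "COMPLETED":
        savepoint_path = status["operation"]["location"]
        break
    time.sleep(5)

# 3. Start the second job from that savepoint (e.g. on the new hardware),
#    then cancel the old job once the new one has caught up.
requests.post(f"{FLINK}/jars/{JAR_ID}/run", json={"savepointPath": savepoint_path})

Since "cancel-job" is false, the old job keeps running while the new one
catches up, so you only stop it once the new job is healthy.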
Another thought: would it be possible to
* Spin up new core or task nodes.
* Run a new copy of the same job on these new nodes from a savepoint.
* Have the new job *not* write to the sink until the other job is torn down?
This would allow us to be eventually consistent and maintain writes going
through during the handover.
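To illustrate the last point: the "don't write yet" behaviour could be a thin
wrapper around the real sink that holds records back until an external flag
flips. A framework-agnostic sketch (GatedSink and the marker file are made-up
names; in a Flink job this check would live inside your actual SinkFunction,
ideally with the flag cached rather than checked per record):

import os

class GatedSink:
    """Forwards records to the real sink only once an external 'gate'
    (here a marker file; in practice more likely a ZooKeeper node or a
    config-service entry) has been opened."""

    def __init__(self, inner_sink, gate_path="/tmp/handover-gate"):
        self.inner_sink = inner_sink
        self.gate_path = gate_path

    def invoke(self, record):
        if os.path.exists(self.gate_path):  # gate open: this job owns the sink
            self.inner_sink.invoke(record)
        # Otherwise drop the record; the old job is still writing.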
Hello,
I'm trying to find a solution for auto-scaling our Flink EMR cluster with
zero downtime, using RocksDB as state storage and S3 as the backing store.
My current thoughts are like so:
* Scaling an Operator dynamically would require all keyed state to be
available to the set of subtasks for that operator (see the sketch below).
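For context on that point: Flink already partitions keyed state into key
groups (the count is fixed by the max parallelism), and restoring a rescaled
job from a savepoint just reassigns whole key-group ranges to subtasks. A
simplified sketch of the assignment logic (the real version lives in Flink's
KeyGroupRangeAssignment):

# Flink assigns each key to a key group:
#   key_group = murmur_hash(key.hashCode()) % max_parallelism
# and each key group to a subtask:

def operator_index(key_group: int, max_parallelism: int, parallelism: int) -> int:
    # Which subtask owns a given key group at the current parallelism.
    return key_group * parallelism // max_parallelism

# Rescaling from parallelism 2 to 3 only moves whole key groups between
# subtasks; an individual key's state is never split across subtasks.
for kg in range(8):  # max_parallelism = 8
    print(kg, operator_index(kg, 8, 2), operator_index(kg, 8, 3))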