FYI: As part of an Uber internship project, we were working on exactly this
problem. Our approach was a rolling restart of all the containers: we start
a "replica" container for each primary container and let it "catch up"
before we make the switch. Of course, this doesn't guarantee zero downtime,
but it does minimize the time needed to upgrade each such container.
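
To make the idea concrete, here is a rough sketch of the rolling restart
described above. It is a toy model, not the actual POC code: the Container
class, the rolling_upgrade function, and the fixed log-end offsets are all
hypothetical, and a real changelog would keep growing while the replica
catches up.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical stand-in for a stateful container restoring from a changelog."""
    name: str
    version: str
    offset: int = 0  # last changelog offset applied to local state

    def catch_up(self, log_end: int, batch: int = 100) -> int:
        """Replay the changelog in batches up to log_end; return the batch count."""
        batches = 0
        while self.offset < log_end:
            self.offset = min(self.offset + batch, log_end)
            batches += 1
        return batches


def rolling_upgrade(primaries, new_version, log_end):
    """Upgrade one container at a time, switching only once its replica has caught up."""
    upgraded = []
    for primary in primaries:
        # The primary keeps processing while the replica restores its state.
        replica = Container(primary.name, new_version)
        replica.catch_up(log_end[primary.name])
        # Only the switch itself is a window of unavailability for this container.
        if replica.offset != log_end[primary.name]:
            raise RuntimeError(f"replica for {primary.name} has not caught up")
        upgraded.append(replica)
    return upgraded


# Assumed example data: two primaries that have consumed their whole changelog.
primaries = [Container("c0", "v1", 500), Container("c1", "v1", 250)]
log_end = {"c0": 500, "c1": 250}
upgraded = rolling_upgrade(primaries, "v2", log_end)
print([(c.name, c.version, c.offset) for c in upgraded])
```

The point of the sketch is only the ordering: the expensive replay happens
while the old container is still running, so the switch itself stays short.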

The code is still a proof of concept, but we do plan to finish it and make
it available. Let me know if you're interested in trying it out.

FYI: the sticky container deployment will also minimize the time to upgrade
/ deploy, since the majority of the upgrade time is spent by the container
reading the entire changelog (if any). Upgrade / re-deploy will also take a
long time if the checkpoint topic is not log compacted (which is the case in
our environment).
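
As a rough illustration of why compaction matters for restart time, the
sketch below replays a small log before and after keeping only the latest
record per key. It is a simplified model, not Kafka's actual compaction, and
the compact and restore helpers and the sample records are made up for the
example.

```python
def compact(log):
    """Keep only the latest record for each key, in order of that record's position."""
    latest = {}
    for position, (key, value) in enumerate(log):
        latest[key] = (position, value)
    ordered = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (_, value) in ordered]


def restore(log):
    """Replay every record; return the rebuilt state and the number of records read."""
    state = {}
    for key, value in log:
        state[key] = value
    return state, len(log)


# Assumed example data: repeated writes for two keys.
log = [
    ("partition-0", 10),
    ("partition-1", 7),
    ("partition-0", 20),
    ("partition-0", 30),
    ("partition-1", 9),
]

full_state, full_reads = restore(log)
compacted_state, compacted_reads = restore(compact(log))
print(full_reads, compacted_reads, full_state == compacted_state)
```

Both replays end in the same state, but the compacted one reads one record
per key instead of the full history, which is where the restart time goes.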

Thanks,
C

On Wed, Jan 6, 2016 at 9:56 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:

> Hi Samza devs and users
>
> I know this will be tricky in Samza because Samza Kafka consumer is not
> coordinated externally, but do you have any idea how to deploy samza jobs
> with zero downtime?
>
> Thank you
> Best, Jae
>



-- 
Thanks and regards

Chinmay Soman
