Great to hear, Dyana. Thanks for the update.

Cheers,
Till

On Fri, Jun 7, 2019 at 2:48 PM dyana.rose <dyana.r...@salecycle.com> wrote:

> Just wanted to give an update on this.
>
> Our ops team and I independently came to the same conclusion: our
> ZooKeeper quorum was having syncing issues.
>
> After a bit more research, they have updated the initLimit and syncLimit
> in the quorum configs to:
> initLimit=10
> syncLimit=5
>
> After this change we no longer saw any of the issues we were having.
>
> Thanks,
> Dyana
>
> On 2019/05/02 08:43:14, Till Rohrmann <trohrm...@apache.org> wrote:
> > Thanks for the update, Dyana. I'm also not an expert in running one's own
> > ZooKeeper cluster. It might be related to setting up the ZooKeeper cluster
> > properly. Maybe someone else from the community has experience with
> > this. Therefore, I'm cross-posting this thread to the user ML again to
> > have a wider reach.
> >
> > Cheers,
> > Till
> >
> > On Wed, May 1, 2019 at 10:17 AM dyana.rose <dyana.r...@salecycle.com> wrote:
> >
> > > Like all the best problems, I can't get this to reproduce locally.
> > >
> > > Everything has worked as expected. I started up a test job with 5 retained
> > > checkpoints, let it run and watched the nodes in ZooKeeper.
> > >
> > > Then I shut down and restarted the Flink cluster.
> > >
> > > The ephemeral lock nodes in the retained checkpoints transitioned from one
> > > lock id to another without a hitch.
> > >
> > > So that's good.
> > >
> > > As I understand it, if the ZooKeeper cluster is having a sync issue,
> > > ephemeral nodes may not get deleted when the session becomes inactive.
> > > We're new to running our own ZooKeeper, so it may be down to that.
> > >
> >
>
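
For reference, here is a minimal sketch of what the initLimit and syncLimit values quoted above mean in wall-clock terms. ZooKeeper counts both settings in ticks, so the effective time window is the limit multiplied by tickTime. The tickTime=2000 used below is an assumption (the thread does not state it), and parse_cfg and limits_in_ms are hypothetical helper names, not part of ZooKeeper.

```python
# Sketch: convert ZooKeeper quorum limits (counted in ticks) into milliseconds.
# ASSUMPTION: tickTime=2000 ms; the thread does not say what value was in use.
# parse_cfg and limits_in_ms are illustrative helpers, not ZooKeeper APIs.


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def limits_in_ms(settings):
    """Return initLimit and syncLimit as milliseconds (ticks * tickTime)."""
    tick = int(settings["tickTime"])
    return {
        "initLimit": int(settings["initLimit"]) * tick,
        "syncLimit": int(settings["syncLimit"]) * tick,
    }


EXAMPLE_CFG = """
# tickTime is assumed; initLimit and syncLimit are the values from the thread
tickTime=2000
initLimit=10
syncLimit=5
"""

if __name__ == "__main__":
    for name, ms in limits_in_ms(parse_cfg(EXAMPLE_CFG)).items():
        print(f"{name}: {ms} ms")
```

Under that assumed tickTime, initLimit=10 gives followers 20 seconds to connect to and sync with the leader, and syncLimit=5 lets a follower lag the leader by up to 10 seconds before it is dropped. With a different tickTime the windows scale accordingly.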
