Hi Ufuk,

thanks for replying.

Regarding the masters file: yes, I've specified all the masters and verified that they were actually running after running start-cluster.sh. I'll gladly share the logs as soon as I can get hold of them.
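For reference, this is how I read the constraint you describe for the local file system state backend: every entry in the masters file has to point to the same host. Below is a minimal Python sketch of that check. The function name is made up for illustration, and it assumes each entry is either "host" or "host:port" and that lines starting with "#" are comments; it is not part of Flink.

```python
def masters_share_one_host(masters_text: str) -> bool:
    """Return True if every entry in a masters file names the same host.

    Hypothetical helper: assumes one entry per line, written as "host"
    or "host:port", with blank lines and "#" comment lines ignored.
    """
    hosts = set()
    for line in masters_text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # Keep only the host part and compare case-insensitively.
        hosts.add(entry.split(":", 1)[0].lower())
    # Exactly one distinct host means all job managers share a machine.
    return len(hosts) == 1


if __name__ == "__main__":
    same_host = "host-X\nhost-X\nhost-X\n"
    three_hosts = "host-1\nhost-2\nhost-3\n"
    print(masters_share_one_host(same_host))
    print(masters_share_one_host(three_hosts))
```

With three masters on three different machines, as in my setup, this check would fail, which I take to be the point of your question.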
Regarding the state backend: how does using non-distributed storage as the state backend affect the HA features? I assumed it would only mean that the job state couldn't be restored, but that the job itself could still be restarted once the backup job manager took over. Does the lack of a reliable distributed storage service for the state backend mean that the HA features don't work at all?

Again, thank you very much.

On Mon, Feb 15, 2016 at 12:48 PM, Ufuk Celebi <u...@apache.org> wrote:

> Using the local file system as state backend only works if all job
> managers run on the same machine. Is that the case?
>
> Have you specified all job managers in the masters file? With the
> local file system state backend only something like
>
> host-X
> host-X
> host-X
>
> will be a valid masters configuration.
>
> Can you please share the job manager logs of all started job managers?
>
> – Ufuk
>
>
> On Mon, Feb 15, 2016 at 12:35 PM, Stefano Baghino
> <stefano.bagh...@radicalbit.io> wrote:
> > Hello everyone,
> >
> > last week I ran some tests with Apache ZooKeeper to get a grip on Flink
> > HA features. My tests have gone badly so far and I can't work out the
> > reason.
> >
> > My latest tests involved Flink 0.10.2, run as a standalone cluster with 3
> > masters and 4 slaves. The 3 masters are also the ZooKeeper (3.4.6)
> > ensemble. I started ZooKeeper on each machine, tested its availability
> > and then started the Flink cluster. Since there's no reliable distributed
> > filesystem on the cluster, I had to use the local file system as the
> > state backend.
> >
> > I then submitted a very simple streaming job that writes the timestamp to
> > a text file on the local file system each second and then went on to kill
> > the process running the job manager to verify that another job manager
> > takes over. However, the job just stopped. I still have to perform some
> > checks on the handover to the new job manager, but before digging deeper
> > I wanted to ask if my expectation of having the job keep going despite
> > the job manager failure is unreasonable.
> >
> > Thanks in advance.
> >
> > --
> > BR,
> > Stefano Baghino
> >
> > Software Engineer @ Radicalbit
>

--
BR,
Stefano Baghino

Software Engineer @ Radicalbit