I think the root cause may have been that my flow.xml.gz files were out of
sync: I had scp'ed the same file to all servers, but something about the
controller service IDs apparently caused the copies to diverge as soon as
the nodes started up.  I'm not sure why that affected ZooKeeper, but as soon
as I deleted all but one of the flow files, the problem resolved.
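For anyone hitting the same thing, the fix amounts to keeping the flow file on one node and deleting it everywhere else before restarting. The sketch below simulates that against a scratch directory rather than a live cluster; the `conf/flow.xml.gz` path is an assumption based on a default NiFi layout.

```shell
# Illustrative simulation of the fix, run against a scratch directory.
# On a real cluster the file is $NIFI_HOME/conf/flow.xml.gz on each node.
workdir=$(mktemp -d)

# Fake three nodes, each holding a copy of the (diverged) flow file.
for n in node1 node2 node3; do
  mkdir -p "$workdir/$n/conf"
  printf '<flow/>' | gzip > "$workdir/$n/conf/flow.xml.gz"
done

# Keep the flow on node1 only; delete it everywhere else. On restart, the
# nodes with no flow file inherit the cluster's flow instead of fighting
# over mismatched copies.
for n in node2 node3; do
  rm "$workdir/$n/conf/flow.xml.gz"
done

remaining=$(ls "$workdir"/*/conf/flow.xml.gz | wc -l)
echo "flow files remaining: $remaining"
```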

On Fri, Nov 18, 2016 at 2:35 PM, Mark Payne <[email protected]> wrote:

> Joe,
>
> Assuming that you're using an embedded ZooKeeper server, it is not
> surprising that you saw a lot of ERROR-level messages about dropped ZK
> connections. Since you had only 1 of 3 NiFi nodes up, you had only 1 of
> 3 ZK servers, so there was no quorum, and the clients were continually
> trying to connect to servers that were not available. Once the other
> nodes were started, you should be okay.
>
> The odd-looking ports in those log messages are, I believe, the
> ephemeral ports that the client is using for its outgoing connections.
> These should not need to be opened on your VMs (assuming that you're not
> blocking outbound traffic). The last message there indicates that a
> session was established with a timeout of 4000 milliseconds, so I don't
> believe there's any problem with ports being blocked.
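As a side note, the ephemeral range those client ports come from can be inspected directly; this sketch assumes a Linux host (the `/proc` path does not exist elsewhere), which fits the EC2 VMs mentioned in the thread.

```shell
# Ports like 47224/47228 in the ZK log are ephemeral client-side ports and
# typically fall inside this kernel-assigned range. The /proc path is
# Linux-specific, so fall back gracefully on other systems.
if [ -r /proc/sys/net/ipv4/ip_local_port_range ]; then
  range=$(cat /proc/sys/net/ipv4/ip_local_port_range)
else
  range="unknown unknown"   # non-Linux: range file not available
fi
echo "ephemeral port range: $range"
```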
>
> However, once the nodes have all started up, they shouldn't have
> problems connecting to each other. Can you grep your logs for "changed
> from"? NiFi logs at INFO level every time the connection status of a
> node in the cluster changes. This may shed some light on why the nodes
> were not connecting to the cluster.
>
> Thanks
> -Mark
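The grep Mark suggests can be made concrete as below. The log path on a default layout would be $NIFI_HOME/logs/nifi-app.log; the sample log lines here are illustrative stand-ins, not output from a real install.

```shell
# Build a small sample log and pull out the connection-status transitions.
# NiFi emits an INFO line containing "changed from" on each status change.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-11-18 15:07:19,117 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: ip-172-31-33-34.ec2.internal:8443
2016-11-18 15:07:40,002 INFO [Process Cluster Protocol Request-1] o.a.n.c.c.node.NodeClusterCoordinator Status of ip-172-31-33-34.ec2.internal:8443 changed from CONNECTING to DISCONNECTED
EOF

# On a real node: grep "changed from" $NIFI_HOME/logs/nifi-app.log
matches=$(grep -c "changed from" "$log")
grep "changed from" "$log"
```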
>
>
> On Nov 18, 2016, at 12:30 PM, Jeff <[email protected]> wrote:
>
> Joe,
>
> I'm glad you were able to get the nodes to reconnect, but I'm interested to
> know how it got into a state where it couldn't start up previously.  If you
> can reproduce the scenario, and provide the full logs and your NiFi
> configuration, we can investigate what caused it to get into that state.
>
> On Fri, Nov 18, 2016 at 12:17 PM Joe Gresock <[email protected]> wrote:
>
> I waited the 5 minutes of the election process, and then several minutes
> beyond that.
>
> Incidentally, when I cleared the state (except zookeeper/my_id) from all
> the nodes, and deleted the flow.xml.gz from all but one of the nodes, and
> then restarted hte whole cluster, it came back.
>
> On Fri, Nov 18, 2016 at 5:11 PM, Jeff <[email protected]> wrote:
>
> Hello Joe,
>
> Just out of curiosity, how long did you let NiFi run while waiting for
> the
> nodes to connect?
>
> On Fri, Nov 18, 2016 at 10:53 AM Joe Gresock <[email protected]> wrote:
>
> Despite starting up, the nodes now cannot connect to each other, so
> they're
> all listed as Disconnected in the UI.  I see this in the logs:
>
> 2016-11-18 15:50:19,080 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /172.31.33.34:47224
> 2016-11-18 15:50:19,081 INFO [CommitProcessor:2] o.a.zookeeper.server.ZooKeeperServer Established session 0x258781845940bf9 with negotiated timeout 4000 for client /172.31.33.34:47224
> 2016-11-18 15:50:19,185 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /172.31.33.34:47228
> 2016-11-18 15:50:19,186 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /172.31.33.34:47230
> 2016-11-18 15:50:19,187 INFO [CommitProcessor:2] o.a.zookeeper.server.ZooKeeperServer Established session 0x258781845940bfa with negotiated timeout 4000 for client /172.31.33.34:47228
> 2016-11-18 15:50:19,187 INFO [CommitProcessor:2] o.a.zookeeper.server.ZooKeeperServer Established session 0x258781845940bfb with negotiated timeout 4000 for client /172.31.33.34:47230
> 2016-11-18 15:50:19,292 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /172.31.33.34:47234
> 2016-11-18 15:50:19,293 INFO [CommitProcessor:2] o.a.zookeeper.server.ZooKeeperServer Established session 0x258781845940bfc with negotiated timeout 4000 for client /172.31.33.34:47234
>
>
> However, I definitely did not open any ports like 47234 on my NiFi
> VMs.  Is there a certain set of ports that needs to be open between the
> servers?  My understanding was that only 2181, 2888, and 3888 were
> necessary for ZooKeeper.
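Those three ports can be sanity-checked from each node with a quick reachability loop; the host value below is a placeholder for a real ZK server address, and the sketch assumes bash (for its `/dev/tcp` redirection) plus coreutils `timeout`.

```shell
# Check each ZooKeeper port (client, quorum, leader election) on one host.
# 127.0.0.1 is a placeholder; substitute each ZK server's address in turn.
host=127.0.0.1
checked=0
for port in 2181 2888 3888; do
  if timeout 1 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "port $port: reachable"
  else
    echo "port $port: unreachable"
  fi
  checked=$((checked + 1))
done
echo "ports checked: $checked"
```

Ephemeral ports like 47234 are chosen by the client side per connection and never need to be opened for inbound traffic.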
>
> On Fri, Nov 18, 2016 at 3:41 PM, Joe Gresock <[email protected]> wrote:
>
> It appears that if you try to start up just one node in a cluster with
> multiple ZK hosts specified in zookeeper.properties, this error is
> spammed at an incredible rate in your logs.  When I started up all 3
> nodes at once, they didn't get the error.
>
> On Fri, Nov 18, 2016 at 3:18 PM, Joe Gresock <[email protected]> wrote:
>
> I'm upgrading a test 0.x NiFi cluster to 1.x using the latest in master
> as of today.
>
> I was able to successfully start the 3-node cluster once, but then I
> restarted it and got the following error spammed in nifi-app.log.
>
> I'm not sure where to start debugging this, and I'm puzzled why it would
> work once and then start giving me errors on the second restart.  Has
> anyone run into this error?
>
> 2016-11-18 15:07:18,178 INFO [main] org.eclipse.jetty.server.Server Started @83426ms
> 2016-11-18 15:07:18,883 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
> 2016-11-18 15:07:18,889 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 9001
> 2016-11-18 15:07:19,117 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: ip-172-31-33-34.ec2.internal:8443
> 2016-11-18 15:07:25,781 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
> 2016-11-18 15:07:25,782 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
> 2016-11-18 15:07:34,685 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
> 2016-11-18 15:07:34,685 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
> 2016-11-18 15:07:34,696 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
> 2016-11-18 15:07:34,698 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@671a652a Connection State changed to SUSPENDED
>
> 2016-11-18 15:07:34,699 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728) [curator-framework-2.11.0.jar:na]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857) [curator-framework-2.11.0.jar:na]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_111]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_111]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
>
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.    *-Philippians 4:12-13*


