Thanks Nathan,
I changed the protocol.port to 10002 on both servers.
On server 1, I now just see endless copies of the second error from my original
message (“KeeperException$ConnectionLossException: KeeperErrorCode =
ConnectionLoss”) – I don’t know if that’s normal when there’s only a single
member of a cluster alive and running? Seems like the logs will fill up very
quickly if it is!
On server 2, I get a bind exception on the Zookeeper client port. It doesn’t
matter what I set it to (in this example, I changed it to 10500); I always get
the same result. If I run netstat when NiFi isn’t running, there’s nothing
listening on the port. It’s like NiFi is starting two Zookeeper instances?!
There’s no repeat of this in the start up sequence though. Both servers are
running completely vanilla 1.6.0 – I don’t even have any flow defined yet, this
is purely for teaching myself clustering config – so I don’t know why one is
behaving differently to the other.
2018-10-02 17:36:31,610 INFO [QuorumPeer[myid=2]/0.0.0.0:10500] o.a.zookeeper.server.ZooKeeperServer Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir ./state/zookeeper/version-2 snapdir ./state/zookeeper/version-2
2018-10-02 17:36:31,612 ERROR [QuorumPeer[myid=2]/0.0.0.0:10500] o.apache.zookeeper.server.quorum.Leader Couldn't bind to nifi2.domain/192.168.10.102:10500
java.net.BindException: Address already in use (Bind failed)
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.ServerSocket.bind(ServerSocket.java:375)
    at java.net.ServerSocket.bind(ServerSocket.java:329)
    at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:193)
    at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:605)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:798)
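
The trace above fails inside Leader.<init>, and the port it cannot bind (10500) is the same number that appears as the client port in the QuorumPeer thread name. One thing that may be worth ruling out, purely as an assumption since zookeeper.properties is not shown in this thread, is a server.N entry that reuses the clientPort value. A minimal Python sketch of that check follows; the function name and both sample configurations are made up for illustration.

```python
import re


def servers_sharing_client_port(zk_text):
    """Return the server.N keys whose quorum or election port equals clientPort."""
    client_port = None
    server_ports = {}
    for raw in zk_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "clientPort":
            client_port = value
        elif re.fullmatch(r"server\.\d+", key):
            # Expected shape: host:quorumPort:electionPort
            server_ports[key] = value.split(":")[1:3]
    return sorted(k for k, ports in server_ports.items() if client_port in ports)


# Hypothetical file contents; the real zookeeper.properties is not in this thread.
CLASHING = """
clientPort=10500
server.1=nifi1.domain:10500:10501
server.2=nifi2.domain:10500:10501
"""
SEPARATE = """
clientPort=10500
server.1=nifi1.domain:2888:3888
server.2=nifi2.domain:2888:3888
"""

print(servers_sharing_client_port(CLASHING))  # ['server.1', 'server.2']
print(servers_sharing_client_port(SEPARATE))  # []
```

If the real file does look like the first sample, that would be one explanation for a bind failure on a port that netstat shows as free before NiFi starts; if not, the cause lies elsewhere.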
From: Nathan Gough
Sent: Tuesday, 2 October 2018 2:22 AM
To: [email protected]
Subject: Re: Zookeeper - help!
Hi Phil,
One thing I notice with your config is that the cluster.node.protocol.port and
the zookeeper ports are the same - these should not be the same.
The node.protocol.port is used by the NiFi cluster to communicate between
nodes, whereas the port in zookeeper.connect.string should be the port that the
zookeeper service is listening on. The zookeeper port is configured by the
clientPort property in the zookeeper.properties file. This would make your
connect string:
'nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180', where 2180
is whatever clientPort is configured to be.
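
As a rough illustration of the distinction above, here is a small Python sketch that flags a connect string pointing at the node protocol port rather than at clientPort. It is not part of NiFi; the helper names parse_properties and check_ports are invented for the example, and the sample values are taken from this thread (2180 standing in for whatever clientPort is set to).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_ports(nifi_text, zk_text):
    """Return a list of problems; an empty list means the ports look consistent."""
    nifi = parse_properties(nifi_text)
    zk = parse_properties(zk_text)
    protocol_port = nifi.get("nifi.cluster.node.protocol.port")
    client_port = zk.get("clientPort")
    connect = nifi.get("nifi.zookeeper.connect.string", "")
    problems = []
    for entry in filter(None, (e.strip() for e in connect.split(","))):
        port = entry.rpartition(":")[2]
        if port == protocol_port:
            problems.append(f"{entry}: connect string uses the node protocol port")
        if port != client_port:
            problems.append(f"{entry}: port does not match clientPort {client_port}")
    return problems


# Sample inputs: the original settings from this thread, and the suggested ones.
NIFI_BEFORE = """
nifi.cluster.node.protocol.port=10000
nifi.zookeeper.connect.string=nifi2.domain:10000,nifi1.domain:10000
"""
NIFI_AFTER = """
nifi.cluster.node.protocol.port=10002
nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180
"""
ZK = "clientPort=2180\n"

for problem in check_ports(NIFI_BEFORE, ZK):
    print(problem)
print(check_ports(NIFI_AFTER, ZK))  # []
```

The check only compares port numbers as text; it does not confirm that anything is actually listening on them.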
You can read more about how NiFi uses Zookeeper and how to configure it here:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management.
Let us know what happens once these properties are configured correctly.
Nathan
On 9/30/18, 11:07 PM, "Phil H" <[email protected]> wrote:
Hi guys,
Pulling my hair out trying to solve my Zookeeper problems. I have two
1.6.0 servers that I am trying to cluster.
Here is the excerpt from the properties files – all other properties are
default so omitted for clarity. The servers are set up to run HTTPS, and the
interface works via the browser, so I believe the certificates are correctly
installed.
Server nifi1.domain:
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi1.domain
nifi.cluster.node.protocol.port=10000
nifi.zookeeper.connect.string=nifi2.domain:10000,nifi1.domain:10000
nifi.zookeeper.root.node=/nifi
Server nifi2.domain:
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi2.domain
nifi.cluster.node.protocol.port=10000
nifi.zookeeper.connect.string=nifi1.domain:10000,nifi2.domain:10000
nifi.zookeeper.root.node=/nifi
I am getting these errors (this is from server 2, but seeing the same on
server 1 apart from a different address, of course):
2018-10-01 20:54:16,332 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10000
2018-10-01 20:54:16,381 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2018-10-01 20:54:16,435 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: nifi2.domain:443
2018-10-01 20:54:16,769 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
2018-10-01 20:54:16,771 WARN [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from nifi2 due to org.apache.nifi.cluster.protocol.ProtocolException: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
org.apache.nifi.cluster.protocol.ProtocolException: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:225)
    at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:131)
    at org.apache.nifi.io.socket.SocketListener$2$1.run(SocketListener.java:136)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.security.cert.CertificateException: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:314)
    at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromSSLSocket(CertificateUtils.java:269)
    at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:223)
    ... 5 common frames omitted
Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:440)
    at org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:299)
    ... 7 common frames omitted
2018-10-01 20:54:32,249 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2018-10-01 20:54:32,250 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)