Okay,

I have got this working now, albeit with only a single ZK instance (at this 
stage).

The missing piece of the puzzle that wasn’t in the guides from Pierre was that 
cluster servers’ certificates need to be installed in each server’s keystore, 
and all the cluster server DNs need to be added as Initial User Identities in 
authorizers.xml.
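
In case it helps anyone else who finds this thread, the relevant part of my 
authorizers.xml ended up looking roughly like the sketch below. The DNs are 
placeholders for the actual certificate DNs of each node – this is an 
illustrative fragment, not my exact file:

```xml
<!-- Sketch only: substitute the real DN from each node's certificate. -->
<userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial User Identity 1">CN=admin, OU=NIFI</property>
    <property name="Initial User Identity 2">CN=nifi1.domain, OU=NIFI</property>
    <property name="Initial User Identity 3">CN=nifi2.domain, OU=NIFI</property>
</userGroupProvider>
<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">file-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">CN=admin, OU=NIFI</property>
    <property name="Node Identity 1">CN=nifi1.domain, OU=NIFI</property>
    <property name="Node Identity 2">CN=nifi2.domain, OU=NIFI</property>
</accessPolicyProvider>
```

As Andy mentioned below, the node DNs go in two places: as users in the user 
group provider and as Node Identities in the access policy provider.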

Thanks again for all the assistance.

Sent from Mail for Windows 10

From: Nathan Gough
Sent: Wednesday, 3 October 2018 7:27 AM
To: [email protected]
Subject: Re: Zookeeper - help!

I think you are correct on that. I assumed it was a range of some kind, but 
it looks like it's not: 
http://zookeeper.apache.org/doc/r3.4.3/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
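
In other words, the two numbers after the hostname in a server.N line are two 
separate single ports, not the ends of a range – roughly:

```properties
# server.<myid>=<hostname>:<quorum port>:<leader election port>
# (2888/3888 are just the conventional defaults; any free ports work)
server.1=nifi1.com:2888:3888   # 2888: followers connect to the leader here
server.2=nifi2.com:2888:3888   # 3888: used during leader election
```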


On 10/2/18, 5:17 PM, "Phil H" <[email protected]> wrote:

    The second port in the zookeeper server config has been a mystery to me.  I 
thought it was a second port used for elections, not the upper bound in a 
range.  Why is the range so large?
    
    
    From: Nathan Gough
    Sent: Wednesday, 3 October 2018 1:26 AM
    To: [email protected]
    Subject: Re: Zookeeper - help!
    
    Check your configs on nifi2. I don't believe that NiFi is starting two 
instances of Zookeeper; more likely the configured ports unintentionally 
overlap, i.e. the same port is used in different configs where the values 
should be different.
    
    It may be that your zookeeper.properties has:
    
    clientPort=2180
    ...
    server.1=nifi1.com:2180:3888
    server.2=nifi2.com:2180:3888
    
    where it should be:
    
    clientPort=2180
    ...
    server.1=nifi1.com:2888:3888
    server.2=nifi2.com:2888:3888
    
    Note that the server.1 and server.2 ranges don't overlap with the 
client port.
    
    
    Not sure if this helps, but the following is the relevant config that I 
have for my NiFi cluster nodes that run on the SAME machine where nifi1.com and 
nifi2.com are configured in /etc/hosts:
    
    nifi1/conf
    zookeeper.properties
    - clientPort=2180
    - server.1=nifi1.com:2888:3888
    - server.2=nifi2.com:2888:3888
    
    nifi.properties
    - nifi.remote.input.host=nifi1.com
    - nifi.remote.input.socket.port=10440
    - nifi.web.http.host=nifi1.com
    - nifi.web.http.port=9550
    - nifi.cluster.node.address=nifi1.com
    - nifi.cluster.node.protocol.port=11440
    
    nifi1/state/zookeeper
    /myid (file contents = "1")
    /state-management.xml (no changes required)
    /version-2/
    
    
    nifi2/conf
    zookeeper.properties
    - clientPort=2181
    - server.1=nifi1.com:2888:3888
    - server.2=nifi2.com:2888:3888
    
    nifi.properties
    - nifi.remote.input.host=nifi2.com
    - nifi.remote.input.socket.port=10441
    - nifi.web.http.host=nifi2.com
    - nifi.web.http.port=9551
    - nifi.cluster.node.address=nifi2.com
    - nifi.cluster.node.protocol.port=11441
    
    nifi2/state/zookeeper
    /myid (file contents = "2")
    /state-management.xml (no changes required)
    /version-2/
    
    
    Nathan
    
    
    
    On 10/2/18, 2:07 AM, "Phil H" <[email protected]> wrote:
    
        Hi Andy,
        
        Thanks for the additional info.  I think I saw a link to that while 
searching but was wary since it was such an old version.
        
        I have two VMs (nifi1, and nifi2) both running NiFi with identical 
configs, and trying to use the inbuilt ZK to cluster them.
        
        If I only mention a single machine within the config (e.g. if nifi1 
doesn’t refer to nifi2, or vice versa) I don’t get any start-up errors.
        
        Phil
        
        From: Andy LoPresto
        Sent: Tuesday, 2 October 2018 1:00 PM
        To: [email protected]
        Subject: Re: Zookeeper - help!
        
        Hi Phil, 
        
        Nathan’s advice is correct but I think he was assuming all other 
configurations are correct as well. Are you trying to run both NiFi nodes and 
ZK instances on the same machine? In that case you will have to ensure that the 
ports in use are different for each service so they don’t conflict. Setting 
them all to the same value only works if each service is running on an 
independent physical machine, virtual machine, or container. 
        
        I find Pierre’s guide [1] to be a helpful step-by-step instruction list 
as well as a good explanation of how the clustering concepts work in practice. 
When you get that working, and you’re ready to set up a secure cluster, he has 
a follow-on guide for that as well [2]. Even as someone who has set up many 
clustered instances of NiFi, I use his guides regularly to ensure I haven’t 
forgotten a step. 
        
        They were originally written for versions 1.0.0 and 1.1.0, but the only 
thing that has changed is the authorizer configuration for the secure instances 
(you’ll need to put the Initial Admin Identity and Node Identities in two 
locations in the authorizers.xml file instead of just once). 
        
        Hopefully this helps you get a working cluster up and running so you 
can experiment. Good luck. 
        
        [1] 
https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
        [2] 
https://pierrevillard.com/2016/11/29/apache-nifi-1-1-0-secured-cluster-setup/
        
        
        Andy LoPresto
        [email protected]
        [email protected]
        PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
        
        On Oct 1, 2018, at 2:45 PM, Phil H <[email protected]> wrote:
        
        Thanks Nathan,
        
        I changed the protocol.port to 10002 on both servers.
        
        On server 1, I now just see endless copies of the second error from my 
original message (“KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss”) – I don’t know if that’s normal when only a single member of 
the cluster is alive and running. Seems like the logs will fill up very 
quickly if it is!
        
        On server 2, I get a bind exception on the Zookeeper client port. It 
doesn’t matter what I set it to (in this example, I changed it to 10500); I 
always get the same result. If I run netstat when NiFi isn’t running, there’s 
nothing listening on the port. It’s like NiFi is starting two Zookeeper 
instances?! There’s no repeat of this in the start-up sequence though. Both 
servers are running completely vanilla 1.6.0 – I don’t even have any flow 
defined yet; this is purely for teaching myself clustering config – so I don’t 
know why one is behaving differently to the other.
        
        2018-10-02 17:36:31,610 INFO [QuorumPeer[myid=2]/0.0.0.0:10500] 
o.a.zookeeper.server.ZooKeeperServer Created server with tickTime 2000 
minSessionTimeout 4000 maxSessionTimeout 40000 datadir 
./state/zookeeper/version-2 snapdir ./state/zookeeper/version-2
        2018-10-02 17:36:31,612 ERROR [QuorumPeer[myid=2]/0.0.0.0:10500] 
o.apache.zookeeper.server.quorum.Leader Couldn't bind to 
nifi2.domain/192.168.10.102:10500
        java.net.BindException: Address already in use (Bind failed)
                at java.net.PlainSocketImpl.socketBind(Native Method)
                at 
java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
                at java.net.ServerSocket.bind(ServerSocket.java:375)
                at java.net.ServerSocket.bind(ServerSocket.java:329)
                at 
org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:193)
                at 
org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:605)
                at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:798)
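
        (Aside: one way I've been sanity-checking whether a port is genuinely 
free is to just try binding it – a throwaway sketch, nothing NiFi-specific, 
and 10500 below is only the port from my example above:)

```python
import socket

def port_is_free(port, host="0.0.0.0"):
    """Try to bind the port; True means nothing else is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            # Same condition ZooKeeper reports as
            # "java.net.BindException: Address already in use"
            return False

print("port 10500 free?", port_is_free(10500))
```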
        
        
        
        
        From: Nathan Gough
        Sent: Tuesday, 2 October 2018 2:22 AM
        To: [email protected]
        Subject: Re: Zookeeper - help!
        
        Hi Phil,
        
        One thing I notice with your config is that 
nifi.cluster.node.protocol.port and the zookeeper ports are the same – they 
should not be. The node protocol port is used by the NiFi cluster to 
communicate between nodes, while the port in nifi.zookeeper.connect.string 
should be the port that the zookeeper service is listening on; that port is 
configured by the clientPort property in the zookeeper.properties file. This 
would make your connect string 
'nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180', where 2180 
is whatever clientPort is configured.
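
        For example (2180 and 10001 below are just illustrative values – the 
point is only that the NiFi protocol port and the ZooKeeper clientPort 
differ):

```properties
# zookeeper.properties (both nodes)
clientPort=2180

# nifi.properties (both nodes)
nifi.cluster.node.protocol.port=10001
nifi.zookeeper.connect.string=nifi1.domain:2180,nifi2.domain:2180
```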
        
        You can read more about how NiFi uses Zookeeper and how to configure it 
here: 
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management.
        
        Let us know what happens once these properties are configured correctly.
        
        Nathan
        
        
        On 9/30/18, 11:07 PM, "Phil H" <[email protected]> wrote:
        
           Hi guys,
        
           Pulling my hair out trying to solve my Zookeeper problems.  I have 
two 1.6.0 servers that I am trying to cluster.
        
           Here is the excerpt from the properties files – all other 
properties are at their defaults, so they are omitted for clarity. The servers 
are set up to run HTTPS, and the interface works via the browser, so I believe 
the certificates are correctly installed.
        
           Server nifi1.domain:
           nifi.cluster.is.node=true
           nifi.cluster.node.address=nifi1.domain
           nifi.cluster.node.protocol.port=10000
        
           nifi.zookeeper.connect.string=nifi2.domain:10000,nifi1.domain:10000
           nifi.zookeeper.root.node=/nifi
        
           Server nifi2.domain:
           nifi.cluster.is.node=true
           nifi.cluster.node.address=nifi2.domain
           nifi.cluster.node.protocol.port=10000
        
           nifi.zookeeper.connect.string=nifi1.domain:10000,nifi2.domain:10000
           nifi.zookeeper.root.node=/nifi
        
           I am getting these errors (this is from server 2, but I'm seeing 
the same on server 1, apart from a different address, of course):
        
           2018-10-01 20:54:16,332 INFO [main] 
org.apache.nifi.io.socket.SocketListener Now listening for connections from 
nodes on port 10000
           2018-10-01 20:54:16,381 INFO [main] 
o.apache.nifi.controller.FlowController Successfully synchronized controller 
with proposed flow
           2018-10-01 20:54:16,435 INFO [main] 
o.a.nifi.controller.StandardFlowService Connecting Node: nifi2.domain:443
           2018-10-01 20:54:16,769 ERROR [Process Cluster Protocol Request-1] 
o.a.nifi.security.util.CertificateUtils The incoming request did not contain 
client certificates and thus the DN cannot be extracted. Check that the other 
endpoint is providing a complete client certificate chain
           2018-10-01 20:54:16,771 WARN [Process Cluster Protocol Request-1] 
o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from 
nifi2 due to org.apache.nifi.cluster.protocol.ProtocolException: 
java.security.cert.CertificateException: 
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
           org.apache.nifi.cluster.protocol.ProtocolException: 
java.security.cert.CertificateException: 
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
                   at 
org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:225)
                   at 
org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:131)
                   at 
org.apache.nifi.io.socket.SocketListener$2$1.run(SocketListener.java:136)
                   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                   at java.lang.Thread.run(Thread.java:748)
           Caused by: java.security.cert.CertificateException: 
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
                   at 
org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:314)
                   at 
org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromSSLSocket(CertificateUtils.java:269)
                   at 
org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.getRequestorDN(SocketProtocolListener.java:223)
                   ... 5 common frames omitted
           Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not 
authenticated
                   at 
sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:440)
                   at 
org.apache.nifi.security.util.CertificateUtils.extractPeerDNFromClientSSLSocket(CertificateUtils.java:299)
                   ... 7 common frames omitted
        
        
        
           2018-10-01 20:54:32,249 INFO [Curator-Framework-0] 
o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
           2018-10-01 20:54:32,250 ERROR [Curator-Framework-0] 
o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
           org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss
                   at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
                   at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
                   at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
                   at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
                   at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
                   at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
                   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
                   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
                   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                   at java.lang.Thread.run(Thread.java:748)
        
        
        
        
        
        
        
    
    
    
    


