Thank you Edward and Christophe!

2018-03-29 17:55 GMT+02:00 Edward Alexander Rojas Clavijo <
edward.roja...@gmail.com>:

> Hi all,
>
> I did some tests based on the PR Christophe mentioned above. By changing
> the NettyClient to use CanonicalHostName instead of HostNameAddress to
> identify the server, the SSL validation works!
>
> I created a PR with this change: https://github.com/apache/flink/pull/5789
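
To illustrate why that change matters, here is a minimal sketch using only JDK classes. It is not the code from the PR or from Flink's NettyClient. The class and method names (PeerHostSketch, peerHost, clientEngine) are made up for illustration, and it assumes that hostname verification is enabled through the "HTTPS" endpoint identification algorithm. The string handed to createSSLEngine is what the JDK later compares against the certificate's SAN entries.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Hypothetical helper, not part of Flink.
final class PeerHostSketch {

    /** Picks the identity string that will be checked against the certificate. */
    static String peerHost(InetSocketAddress server, boolean useCanonicalName) {
        InetAddress address = server.getAddress();
        return useCanonicalName
                // Best-effort reverse lookup; falls back to the textual IP if it fails.
                ? address.getCanonicalHostName()
                // Always the textual IP, e.g. "172.30.247.163".
                : address.getHostAddress();
    }

    /** Creates a client-mode engine that verifies the peer identity during the handshake. */
    static SSLEngine clientEngine(SSLContext context, String peerHost, int port) {
        SSLEngine engine = context.createSSLEngine(peerHost, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        // Assumption: verification is switched on via the "HTTPS" algorithm.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null); // JDK default key and trust managers

        // Address built from raw bytes, so no DNS lookup happens here.
        InetAddress ip = InetAddress.getByAddress(
                new byte[] {(byte) 172, 30, (byte) 247, (byte) 163});
        InetSocketAddress server = new InetSocketAddress(ip, 6121);

        SSLEngine engine = clientEngine(context, peerHost(server, false), server.getPort());
        System.out.println("Identity to verify: " + engine.getPeerHost());
    }
}
```

With useCanonicalName set to false the engine is asked to verify an IP literal, which can only succeed if the certificate carries that IP in its SAN. With true, the name depends on what reverse DNS returns in the cluster.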
>
> Regards,
> Edward
>
> 2018-03-28 17:22 GMT+02:00 Edward Alexander Rojas Clavijo <
> edward.roja...@gmail.com>:
>
>> Hi Till,
>>
>> I just created the JIRA ticket:
>> https://issues.apache.org/jira/browse/FLINK-9103
>>
>> I added the JobManager and TaskManager logs. I hope this helps to resolve
>> the issue.
>>
>> Regards,
>> Edward
>>
>> 2018-03-27 17:48 GMT+02:00 Till Rohrmann <trohrm...@apache.org>:
>>
>>> Hi Edward,
>>>
>>> Could you please file a JIRA issue for this problem? It might be as
>>> simple as the TaskManager's network stack using the IP instead of the
>>> hostname, as you suggested. But we have to look into this to be sure. Also,
>>> the logs of the JobManager as well as the TaskManagers could be helpful.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Mar 27, 2018 at 5:17 PM, Christophe Jolif <cjo...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> I suspect this relates to:
>>>> https://issues.apache.org/jira/browse/FLINK-5030
>>>>
>>>> There was a PR for it at some point, but nothing has been done so far.
>>>> It seems the current code explicitly uses the IP rather than the hostname
>>>> for the Netty SSL configuration.
>>>>
>>>> Without that, I'm really wondering how people are reasonably using SSL
>>>> on a Kubernetes-based Flink cluster, as every time a pod is (re)started it
>>>> can theoretically get a different IP. Or am I missing something?
>>>>
>>>> --
>>>> Christophe
>>>>
>>>> On Tue, Mar 27, 2018 at 3:24 PM, Edward Alexander Rojas Clavijo <
>>>> edward.roja...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Currently I have a Flink 1.4 cluster running on Kubernetes with an
>>>>> SSL configuration based on
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-ssl.html.
>>>>>
>>>>> However, as the IPs of the nodes are dynamic (by the nature of
>>>>> Kubernetes), we are using only the DNS names, which we can control using
>>>>> Kubernetes services. So we add to the Subject Alternative Name (SAN) the
>>>>> flink-jobmanager DNS name and also the DNS name for the task managers,
>>>>> *.flink-taskmanager-svc (each task manager has a DNS name of the form
>>>>> flink-taskmanager-0.flink-taskmanager-svc).
>>>>>
>>>>> Additionally we set the jobmanager.rpc.address property on all the
>>>>> nodes and each task manager sets the taskmanager.host property, all
>>>>> matching the ones on the certificate.
>>>>>
>>>>> This works well when running a job with parallelism set to 1. The SSL
>>>>> validation succeeds and the JobManager can communicate with the
>>>>> TaskManager and vice versa.
>>>>>
>>>>> But when we set the parallelism to more than 1, we get exceptions in
>>>>> the SSL validation like this:
>>>>>
>>>>> Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 172.30.247.163 found
>>>>> at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168)
>>>>> at sun.security.util.HostnameChecker.match(HostnameChecker.java:94)
>>>>> at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
>>>>> at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
>>>>> at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
>>>>> at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
>>>>> at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1601)
>>>>> ... 21 more
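
The exception above follows from how peer identities are matched against SAN entries: an IP literal is only compared with IP-type entries, never with DNS names or wildcards. The sketch below is a deliberately simplified model of that rule, not the JDK's HostnameChecker. The class name SanMatchSketch, its method names, and the IPv4/IPv6 detection shortcut are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified, hypothetical model of SAN matching; not the JDK implementation.
final class SanMatchSketch {

    /** Rough IP-literal check: dotted IPv4 or anything containing ':' (IPv6). */
    static boolean isIpLiteral(String peer) {
        return peer.indexOf(':') >= 0 || peer.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    /** Matches a host against one DNS SAN; a leading "*." covers exactly one label. */
    static boolean matchesDns(String host, String pattern) {
        String h = host.toLowerCase(Locale.ROOT);
        String p = pattern.toLowerCase(Locale.ROOT);
        if (!p.startsWith("*.")) {
            return h.equals(p);
        }
        int firstDot = h.indexOf('.');
        return firstDot > 0 && h.substring(firstDot).equals(p.substring(1));
    }

    /** IP literals are checked only against IP SANs, hostnames only against DNS SANs. */
    static boolean matches(String peer, List<String> dnsSans, List<String> ipSans) {
        if (isIpLiteral(peer)) {
            return ipSans.contains(peer);
        }
        for (String pattern : dnsSans) {
            if (matchesDns(peer, pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // SAN entries as described in this thread: DNS names only, no IPs.
        List<String> dnsSans = Arrays.asList("flink-jobmanager", "*.flink-taskmanager-svc");
        List<String> ipSans = Collections.emptyList();

        System.out.println(matches("flink-jobmanager", dnsSans, ipSans));
        System.out.println(matches("flink-taskmanager-0.flink-taskmanager-svc", dnsSans, ipSans));
        System.out.println(matches("172.30.247.163", dnsSans, ipSans));
    }
}
```

Under these assumptions the two hostnames match and the IP does not, which mirrors the "No subject alternative names matching IP address" failure: the certificate is fine, but the client presented an IP as the identity to verify.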
>>>>>
>>>>>
>>>>> From the logs I see the Jobmanager is correctly registering the
>>>>> taskmanagers:
>>>>>
>>>>> org.apache.flink.runtime.instance.InstanceManager   - Registered TaskManager at flink-taskmanager-1 (akka.ssl.tcp://flink@taiga-flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local:6122/user/taskmanager) as 1a3f59693cec8b3929ed8898edcc2700. Current number of registered hosts is 3. Current number of alive task slots is 6.
>>>>>
>>>>> And also each taskmanager is correctly registered to use the hostname
>>>>> for communication:
>>>>>
>>>>> org.apache.flink.runtime.taskmanager.TaskManager   - TaskManager will use hostname/address 'flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local' (172.30.247.163) for communication.
>>>>> ...
>>>>> akka.remote.Remoting   - Remoting started; listening on addresses :[akka.ssl.tcp://flink@flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local:6122]
>>>>> ...
>>>>> org.apache.flink.runtime.io.network.netty.NettyConfig   - NettyConfig [server address: flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local/172.30.247.163, server port: 6121, ssl enabled: true, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
>>>>> ...
>>>>> org.apache.flink.runtime.taskmanager.TaskManager   - TaskManager data connection information: bf4a9b50e57c99c17049adb66d65f685 @ flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local (dataPort=6121)
>>>>>
>>>>>
>>>>>
>>>>> But even with that, it seems the taskmanagers are using the IP to
>>>>> communicate with each other, and the SSL validation fails.
>>>>>
>>>>> Do you know if it's possible to make the taskmanagers use the
>>>>> hostname instead of the IP to communicate?
>>>>> Or do you have any advice on getting the SSL configuration to work in
>>>>> this environment?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Regards,
>>>>> Edward
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Christophe
>>>>
>>>
>>>
>>
>>
>> --
>> *Edward Alexander Rojas Clavijo*
>>
>>
>>
>> *Software Engineer, Hybrid Cloud, IBM France*
>>
>
>
