By the way Fabian, any chance this issue is looked into / the PR considered
for 1.5?


On Wed, Apr 4, 2018 at 2:41 PM, Fabian Hueske <> wrote:

> Thank you Edward and Christophe!
> 2018-03-29 17:55 GMT+02:00 Edward Alexander Rojas Clavijo <
>> Hi all,
>> I did some tests based on the PR Christophe mentioned above and by making
>> a change on the NettyClient to use CanonicalHostName instead of
>> HostNameAddress to identify the server, the SSL validation works!!
>> I created a PR with this change:
>> che/flink/pull/5789
>> Regards,
>> Edward
>> 2018-03-28 17:22 GMT+02:00 Edward Alexander Rojas Clavijo <
>>> Hi Till,
>>> I just created the JIRA ticket:
>>> /browse/FLINK-9103
>>> I added the JobManager and TaskManager logs. I hope this helps to resolve
>>> the issue.
>>> Regards,
>>> Edward
>>> 2018-03-27 17:48 GMT+02:00 Till Rohrmann <>:
>>>> Hi Edward,
>>>> could you please file a JIRA issue for this problem. It might be as
>>>> simple as that the TaskManager's network stack uses the IP instead of the
>>>> hostname as you suggested. But we have to look into this to be sure. Also
>>>> the logs of the JobManager as well as the TaskManagers could be helpful.
>>>> Cheers,
>>>> Till
>>>> On Tue, Mar 27, 2018 at 5:17 PM, Christophe Jolif <>
>>>> wrote:
>>>>> I suspect this relates to:
>>>>> jira/browse/FLINK-5030
>>>>> For which there was a PR at some point but nothing has been done so
>>>>> far. It seems the current code explicitly uses the IP vs Hostname for
>>>>> Netty SSL configuration.
>>>>> Without that I'm really wondering how people are reasonably using SSL
>>>>> on a Kubernetes Flink-based cluster, as every time a pod is (re)started it
>>>>> can theoretically take a different IP. Or am I missing something?
>>>>> --
>>>>> Christophe
>>>>> On Tue, Mar 27, 2018 at 3:24 PM, Edward Alexander Rojas Clavijo <
>>>>>> wrote:
>>>>>> Hi all,
>>>>>> Currently I have a Flink 1.4 cluster running on kubernetes and with
>>>>>> SSL configuration based on
>>>>>> cts/flink/flink-docs-master/ops/security-ssl.html.
>>>>>> However, as the IPs of the nodes are dynamic (by the nature of
>>>>>> kubernetes), we are using only the DNS names, which we can control using
>>>>>> kubernetes services. So we add to the Subject Alternative Name (SAN) the
>>>>>> flink-jobmanager DNS and also the DNS for the task managers
>>>>>> *.flink-taskmanager-svc (each task manager has a DNS in the form
>>>>>> flink-taskmanager-0.flink-taskmanager-svc).
>>>>>> Additionally we set the jobmanager.rpc.address property on all the
>>>>>> nodes and each task manager sets the property, all
>>>>>> matching the ones on the certificate.
>>>>>> This is working well when running a job with parallelism set to 1. The
>>>>>> SSL validations are good and the Jobmanager can communicate with Task
>>>>>> manager and vice versa.
>>>>>> But when we set the parallelism to more than 1 we have exceptions on
>>>>>> the SSL validation like this:
>>>>>> Caused by: No subject
>>>>>> alternative names matching IP address found
>>>>>> at
>>>>>> va:168)
>>>>>> at
>>>>>> at
>>>>>> at
>>>>>> at
>>>>>> at
>>>>>> at
>>>>>> ... 21 more
>>>>>> From the logs I see the Jobmanager is correctly registering the
>>>>>> taskmanagers:
>>>>>> org.apache.flink.runtime.instance.InstanceManager   - Registered
>>>>>> TaskManager at flink-taskmanager-1
>>>>>> (akka.ssl.tcp://flink@taiga-flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local:6122/user/taskmanager)
>>>>>> as 1a3f59693cec8b3929ed8898edcc2700. Current number of registered
>>>>>> hosts is 3. Current number of alive task slots is 6.
>>>>>> And also each taskmanager is correctly registered to use the hostname
>>>>>> for communication:
>>>>>> org.apache.flink.runtime.taskmanager.TaskManager   - TaskManager
>>>>>> will use hostname/address
>>>>>> 'flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local' ( for
>>>>>> communication.
>>>>>> ...
>>>>>> akka.remote.Remoting   - Remoting started; listening on addresses
>>>>>> :[akka.ssl.tcp://flink@flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local:6122]
>>>>>> ...
>>>>>>   -
>>>>>> NettyConfig [server address:
>>>>>> flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local/, server port:
>>>>>> 6121, ssl enabled: true, memory segment size (bytes): 32768, transport
>>>>>> type: NIO, number of server threads: 2 (manual), number of client threads:
>>>>>> 2 (manual), server connect backlog: 0 (use Netty's default), client connect
>>>>>> timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's
>>>>>> default)]
>>>>>> ...
>>>>>> org.apache.flink.runtime.taskmanager.TaskManager   - TaskManager
>>>>>> data connection information: bf4a9b50e57c99c17049adb66d65f685 @
>>>>>> flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local
>>>>>> (dataPort=6121)
>>>>>> But even with that, it seems like the taskmanagers are using the IP to
>>>>>> communicate between them, and the SSL validation fails.
>>>>>> Do you know if it's possible to make the taskmanagers use the
>>>>>> hostname to communicate instead of the IP?
>>>>>> or
>>>>>> Do you have any advice to get the SSL configuration to work in this
>>>>>> environment?
>>>>>> Thanks in advance.
>>>>>> Regards,
>>>>>> Edward
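
The failure discussed in this thread comes down to which string the client hands to the TLS layer as the peer identity. Below is a minimal Java sketch of that difference, assuming hostname verification goes through the JDK's "HTTPS" endpoint identification algorithm. The class name PeerIdentitySketch, the helper clientEngine, and the address 10.0.0.5 are made up for illustration; this is not Flink's NettyClient code.

```java
import java.net.InetAddress;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class PeerIdentitySketch {

    /**
     * Hypothetical helper: builds a client-mode engine whose handshake
     * checks the server certificate against peerHost.
     */
    static SSLEngine clientEngine(SSLContext ctx, String peerHost, int port) {
        SSLEngine engine = ctx.createSSLEngine(peerHost, port);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        // With "HTTPS" identification, an IP literal is compared against
        // iPAddress SAN entries and a host name against dNSName SAN entries.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        // Built from a literal name and address, so no DNS lookup happens here.
        InetAddress addr = InetAddress.getByAddress(
                "flink-taskmanager-0.flink-taskmanager-svc",
                new byte[] {10, 0, 0, 5}); // illustrative pod IP

        SSLContext ctx = SSLContext.getDefault();

        // Identity is the IP: a certificate with only DNS SANs cannot match.
        SSLEngine byIp = clientEngine(ctx, addr.getHostAddress(), 6121);

        // Identity is the host name: DNS SANs such as
        // *.flink-taskmanager-svc can match.
        SSLEngine byName = clientEngine(ctx, addr.getHostName(), 6121);

        System.out.println("peer identity (IP):   " + byIp.getPeerHost());
        System.out.println("peer identity (name): " + byName.getPeerHost());
    }
}
```

The sketch uses getHostName() on a pre-resolved address to stay offline. InetAddress.getCanonicalHostName(), which the PR mentioned above relies on, performs a reverse lookup instead, so its result depends on the cluster's DNS setup.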
