Hi community,

We are working on secure Flink on YARN. The current Flink-YARN-Kerberos integration requires every container of a job to log in to Kerberos via keytab periodically (every 24 hours, say), and it does not use the Hadoop delegation token (DT) mechanism except when localizing the container. While fixing the current Flink-YARN-Kerberos integration (FLINK-8275) and trying to add more features (FLINK-7860), I developed some concerns about the current implementation. It can pose a scalability problem for the KDC: for example, if the YARN cluster is restarted, tens of thousands of containers would all log in at once and effectively DDoS the KDC.
I would like to propose improving the current Flink-YARN-Kerberos integration along the following lines:

1) The AppMaster (JobManager) periodically authenticates with the KDC and obtains all DTs required by the job.
2) All other TM or TE containers periodically retrieve new DTs from the AppMaster (either through a secure HDFS folder or a secure Akka channel).

Also, we want to extend Flink to support a pluggable AuthN mechanism, because we have our own internal AuthN mechanism. We would like to add support in Flink for authenticating periodically to our internal AuthN service as well, e.g., through dynamic class loading, and to use a similar mechanism to distribute the credentials from the AppMaster to the containers.

I would like to get comments and feedback. I can also write a design doc or create a FLIP if needed.

Thanks a lot.
Shuyi

--
"So you have to trust that the dots will somehow connect in your future."
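
P.S. To make the proposal a bit more concrete, here is a rough Java sketch of the shape I have in mind. It is only an illustration: none of these names (CredentialProvider, CredentialStore, AppMasterRenewer, CountingProvider) exist in Flink or Hadoop today, the in-memory store merely stands in for the secure HDFS folder or Akka channel, and the fake provider stands in for a real KDC or internal AuthN login. The point it tries to show is that only the AppMaster talks to the AuthN service, while containers just read the latest published credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CredentialDistributionSketch {

    /** Hypothetical plug-in point: one implementation per AuthN mechanism. */
    interface CredentialProvider {
        String name();

        /** Contacts the AuthN service and returns fresh, serialized credentials. */
        byte[] obtainCredentials();
    }

    /** Hypothetical stand-in for the AppMaster-to-container channel. */
    static final class CredentialStore {
        private final Map<String, byte[]> latest = new ConcurrentHashMap<>();

        void publish(String providerName, byte[] credentials) {
            latest.put(providerName, credentials);
        }

        byte[] fetch(String providerName) {
            return latest.get(providerName);
        }
    }

    /** Runs only in the AppMaster; a real one would call renewOnce() on a timer. */
    static final class AppMasterRenewer {
        private final List<CredentialProvider> providers;
        private final CredentialStore store;

        AppMasterRenewer(List<CredentialProvider> providers, CredentialStore store) {
            this.providers = providers;
            this.store = store;
        }

        void renewOnce() {
            for (CredentialProvider provider : providers) {
                store.publish(provider.name(), provider.obtainCredentials());
            }
        }
    }

    /** Fake provider that counts how often the AuthN service would be contacted. */
    static final class CountingProvider implements CredentialProvider {
        final AtomicInteger logins = new AtomicInteger();

        @Override
        public String name() {
            return "fake-kdc";
        }

        @Override
        public byte[] obtainCredentials() {
            return ("token-" + logins.incrementAndGet()).getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        CountingProvider provider = new CountingProvider();
        CredentialStore store = new CredentialStore();
        AppMasterRenewer renewer = new AppMasterRenewer(Arrays.asList(provider), store);

        // One renewal round in the AppMaster.
        renewer.renewOnce();

        // Simulate many containers picking up the credentials without any login.
        int fetches = 0;
        for (int container = 0; container < 1000; container++) {
            if (store.fetch("fake-kdc") != null) {
                fetches++;
            }
        }

        // prints "AuthN logins: 1, container fetches: 1000"
        System.out.println("AuthN logins: " + provider.logins.get()
                + ", container fetches: " + fetches);
    }
}
```

In this sketch the load on the AuthN service depends on the number of renewal rounds in the AppMaster, not on the number of containers. Whether the real channel should be a secure HDFS folder or Akka, and how providers would be discovered via dynamic class loading, is exactly what I would like feedback on.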