Ping, any comments? Thanks a lot.

Shuyi
On Wed, Jan 3, 2018 at 3:43 PM, Shuyi Chen <suez1...@gmail.com> wrote:

> Thanks a lot for the clarification, Eron. That's very helpful. Currently,
> we are more concerned about 1) data access, but will get to 2) and 3)
> eventually.
>
> I was thinking of doing the following:
> 1) extend the current HadoopModule to use and refresh DTs as suggested in
> the YARN Application Security docs.
> 2) I found the current SecurityModule interface might be enough for
> supporting other security mechanisms. However, the loading of security
> modules is hard-coded, not configuration-based. I think we can extend
> SecurityUtils to load modules from configuration. That way we can
> implement our own security mechanism in our internal repo and have Flink
> jobs load it at runtime.
>
> Please let me know your comments. Thanks a lot.
>
> On Fri, Dec 22, 2017 at 3:05 PM, Eron Wright <eronwri...@gmail.com> wrote:
>
>> I agree that it is reasonable to use Hadoop DTs as you describe. That
>> approach is even recommended in YARN's documentation (see "Securing
>> Long-lived YARN Services" on the YARN Application Security page). But
>> one of the goals of Kerberos integration is to support Kerberized data
>> access for connectors other than HDFS, such as Kafka, Cassandra, and
>> Elasticsearch. So your second point makes sense too, suggesting a
>> general architecture for managing secrets (DTs, keytabs, certificates,
>> OAuth tokens, etc.) within the cluster.
>>
>> There are quite a few aspects to Flink security, including:
>> 1. data access (e.g. how a connector authenticates to a data source)
>> 2. service authorization and network security (e.g. how a Flink cluster
>> protects itself from unauthorized access)
>> 3. multi-user support (e.g. multi-user Flink clusters, RBAC)
>>
>> I mention these aspects to clarify your point about AuthN, which I took
>> to be related to (1). Do tell if I misunderstood.
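[Editor's note] The configuration-based module loading described above (extending SecurityUtils so modules are discovered from configuration rather than hard-coded) could be sketched roughly as below. This is a minimal, self-contained illustration, not Flink's actual API: the `SecurityModule` interface here is a stand-in for Flink's, and the comma-separated class-name config value is a hypothetical format.

```java
import java.util.ArrayList;
import java.util.List;

public class SecurityModuleLoader {

    /** Minimal stand-in for Flink's SecurityModule interface (illustrative only). */
    public interface SecurityModule {
        void install() throws Exception;
    }

    /** Example module; in practice this could be HadoopModule or a custom AuthN module. */
    public static class NoOpModule implements SecurityModule {
        public boolean installed = false;
        @Override
        public void install() {
            installed = true;
        }
    }

    /**
     * Load module implementations named in a (hypothetical) config value such as
     * "security.modules: com.example.MyAuthNModule,org.apache.flink...HadoopModule",
     * via reflection instead of a hard-coded list. A class shipped in the user's
     * jar can then be picked up at runtime through dynamic class loading.
     */
    public static List<SecurityModule> loadModules(String commaSeparatedClassNames)
            throws Exception {
        List<SecurityModule> modules = new ArrayList<>();
        for (String className : commaSeparatedClassNames.split(",")) {
            Class<?> clazz = Class.forName(className.trim());
            modules.add((SecurityModule) clazz.getDeclaredConstructor().newInstance());
        }
        return modules;
    }

    public static void main(String[] args) throws Exception {
        // Load the example module by name, as if read from configuration.
        List<SecurityModule> modules =
                loadModules(SecurityModuleLoader.class.getName() + "$NoOpModule");
        for (SecurityModule m : modules) {
            m.install();
        }
        System.out.println("Installed " + modules.size() + " module(s)");
    }
}
```

A real implementation would additionally need the module's context/configuration passed to a factory, and error handling for missing or incompatible classes.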
>>
>> Eron
>>
>> On Wed, Dec 20, 2017 at 11:21 AM, Shuyi Chen <suez1...@gmail.com> wrote:
>>
>> > Hi community,
>> >
>> > We are working on secure Flink on YARN. The current Flink-YARN-Kerberos
>> > integration requires each container of a job to log in to Kerberos via
>> > keytab every, say, 24 hours, and does not use any Hadoop delegation
>> > token mechanism except when localizing the container. As I fixed the
>> > current Flink-YARN-Kerberos integration (FLINK-8275) and tried to add
>> > more features (FLINK-7860), I developed some concerns about the current
>> > implementation. It can pose a scalability issue for the KDC, e.g., if
>> > the YARN cluster is restarted and tens of thousands of containers
>> > suddenly DDoS the KDC.
>> >
>> > I would like to propose improving the current Flink-YARN-Kerberos
>> > integration along the following lines:
>> > 1) The AppMaster (JobManager) periodically authenticates to the KDC and
>> > gets all required DTs for the job.
>> > 2) All other TM or TE containers periodically retrieve new DTs from the
>> > AppMaster (either through a secure HDFS folder or a secure Akka
>> > channel).
>> >
>> > Also, we want to extend Flink to support a pluggable AuthN mechanism,
>> > because we have our own internal AuthN mechanism. We would like to add
>> > support in Flink to authenticate periodically to our internal AuthN
>> > service as well through, e.g., dynamic class loading, and use a similar
>> > mechanism to distribute the credentials from the AppMaster to
>> > containers.
>> >
>> > I would like to get comments and feedback. I can also write a design
>> > doc or create a FLIP if needed. Thanks a lot.
>> >
>> > Shuyi
>> >
>> > --
>> > "So you have to trust that the dots will somehow connect in your
>> > future."
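[Editor's note] The refresh loop proposed in the quoted thread (the AppMaster periodically fetches fresh DTs; TMs/TEs then pull the latest credentials) could be sketched roughly as follows. All names here (`TokenRefresher`, `Credentials`, the fetcher callback) are illustrative stand-ins, not Flink or Hadoop APIs; in a real implementation the fetcher would authenticate to the KDC (or a custom AuthN service) and the published credentials would be written to a secure HDFS folder or pushed over a secure channel.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class TokenRefresher {

    /** Hypothetical stand-in for a bundle of delegation tokens. */
    public static class Credentials {
        public final String token;
        public Credentials(String token) {
            this.token = token;
        }
    }

    private final AtomicReference<Credentials> latest = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /**
     * Periodically invoke the fetcher for fresh tokens. In the AppMaster this
     * would log in via keytab and obtain new DTs well before expiry.
     */
    public void start(Callable<Credentials> fetcher, long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                latest.set(fetcher.call());
            } catch (Exception e) {
                // A real implementation would log and retry with backoff,
                // keeping the previous (still-valid) tokens in the meantime.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** TM/TE containers would read the latest credentials from here. */
    public Credentials current() {
        return latest.get();
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        TokenRefresher refresher = new TokenRefresher();
        // Simulated fetch; a real fetcher would contact the KDC.
        refresher.start(() -> new Credentials("DT-" + System.currentTimeMillis()), 50);
        Thread.sleep(200);
        System.out.println("have token: " + (refresher.current() != null));
        refresher.stop();
    }
}
```

The key scalability property of the proposal is visible in the shape of this sketch: only one process (the AppMaster) runs the fetch loop against the KDC, while all other containers read from the published location, so a full cluster restart does not multiply KDC traffic by the container count.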