Hi Stefano,

I think the proposed feature is not limited to YARN sessions. With the code
in place, standalone clusters would also allow us to authenticate file
system access as the user who submitted the job.

I would recommend doing some prototyping and coming up with a design
document first. The change has quite a few implications.
Some things that come to mind:
- Filesystem implementations are currently instantiated once. I think we
would need to securely instantiate filesystems per user (imagine multiple
users on Flink). That's why the YARN session user currently owns the file
system / HBase access.
- There is currently no over-the-wire encryption (it's on my TODO list) in
Flink, so how do you transfer the security tokens? (YARN is currently doing
that for us, so we don't need to worry about that.)
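
To illustrate the first point, here is a minimal sketch of what "one
filesystem instance per user" could look like. It is not Flink's actual
code: the class PerUserFileSystemCache, its methods, and the factory
callback are hypothetical names chosen for illustration, and how the
factory authenticates as the given user is deliberately left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch: keeps one file system handle per (user, file system URI)
 * pair instead of a single shared instance. FS stands in for whatever handle
 * type the real implementation would use.
 */
final class PerUserFileSystemCache<FS> {

    // user name -> (file system URI -> handle created for that user)
    private final Map<String, Map<String, FS>> byUser = new ConcurrentHashMap<>();

    // Assumed callback that creates a handle authenticated as the given user.
    private final BiFunction<String, String, FS> factory;

    PerUserFileSystemCache(BiFunction<String, String, FS> factory) {
        this.factory = factory;
    }

    /** Returns the handle for this user and URI, creating it on first access. */
    FS get(String user, String fsUri) {
        return byUser
                .computeIfAbsent(user, u -> new ConcurrentHashMap<>())
                .computeIfAbsent(fsUri, uri -> factory.apply(user, uri));
    }

    /** Drops all handles of a user, e.g. once their jobs have finished. */
    void release(String user) {
        byUser.remove(user);
    }
}
```

With such a cache, two jobs submitted by the same user would share a
handle, while jobs from different users would never see each other's
handle. On Hadoop, the factory might wrap the filesystem lookup in
UserGroupInformation.doAs(...), but that is an assumption beyond this
sketch.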

There are probably more implications you'll find while implementing this.


On Wed, Mar 23, 2016 at 7:06 PM, Maximilian Michels <m...@apache.org> wrote:

> Hi Stefano,
>
> Sounds great. Please go ahead! Note that Flink already provides the
> proposed feature for per-job Yarn clusters. However, it is a valuable
> addition to realize this feature for the Yarn session.
>
> The only blocker that I can think of is probably this PR which changes
> a lot of the Yarn classes: https://github.com/apache/flink/pull/1741
> There are also changes planned for the client side to decouple the
> Yarn support from the job submission process and make it easier to
> integrate other frameworks (like Mesos). I don't think that will block
> your contribution since a lot of the logic is probably going to be
> contained in separate classes which can be integrated even when code
> changes. Let's just stay in sync.
>
> If you like, you could start off by opening an issue and submitting a
> short design document.
>
> Cheers,
> Max
>
> On Wed, Mar 23, 2016 at 3:55 PM, Stefano Baghino
> <stefano.bagh...@radicalbit.io> wrote:
> > Hello everybody,
> >
> > some of us at Radicalbit spent the last few weeks experimenting to
> > improve the understanding of the compatibility of Flink with secure
> > cluster environments and with Kerberos in particular.
> >
> > We’ve found a possible area of improvement and would like to work on it
> > as part of our effort to contribute to Flink in the open: after a few
> > tests and a short exchange on the user mailing list
> > <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kerberos-on-YARN-delegation-or-proxying-tp5315p5318.html>
> > we’ve come to realize that currently a long-running session on YARN acts
> > on behalf of the user that originally ran the session, not the one who’s
> > submitting the job.
> >
> > We think it would be a nice improvement to be able to have a single Flink
> > session that keeps running with several users submitting their jobs with
> > their own credentials.
> >
> > We’d like to develop, test and document this improvement.
> >
> > Do you think this is feasible? Are there any blockers we should be aware
> > of before undertaking this task? Would this be something of interest for
> > the community? Are there any other ongoing efforts that aim toward this?
> >
> > We’d love to have the feedback of the community on this, thank you in
> > advance to anyone who’s willing to share their insight and opinion with
> > us.
> >
> > --
> > BR,
> > Stefano Baghino
> >
> > Software Engineer @ Radicalbit
>
