https://github.com/apache/flink/pull/2317
On Mon, Aug 1, 2016 at 11:54 AM, Niels Basjes <ni...@basjes.nl> wrote:

> Thanks for the pointers towards the work you are doing here.
> I'll put up a patch for the jars and such in the next few days.
> https://issues.apache.org/jira/browse/FLINK-4287
>
> Niels Basjes
>
> On Mon, Aug 1, 2016 at 11:46 AM, Stephan Ewen <se...@apache.org> wrote:
>
>> Thank you for the breakdown of the problem.
>>
>> Option (1) or (2) would be the way to go, currently.
>>
>> The problem that (3) does not support HBase is simply solvable by adding the HBase jars to the lib directory. In the future, this should be solved by the YARN re-architecturing:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
>>
>> For the renewal of Kerberos tokens for streaming jobs: there is work in progress and a pull request to attach keytabs to a Flink job:
>> https://github.com/apache/flink/pull/2275
>>
>> The problem that the YARN session is accessible by everyone is a bit more tricky. In the future, this should be solved by these two parts:
>> - With the YARN re-architecturing, sessions are bound to individual users. It should be possible to launch the session out of a single YarnExecutionEnvironment and then submit multiple jobs against it.
>> - The over-the-wire encryption and authentication should make sure that no other user can send jobs to that session.
>>
>> Greetings,
>> Stephan
>>
>> On Mon, Aug 1, 2016 at 9:47 AM, Niels Basjes <ni...@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> I have the situation that I have a Kerberos-secured YARN/HBase installation and I want to export data from a lot of (~200) HBase tables to files on HDFS. I wrote a Flink job that does this exactly the way I want it for a single table.
>>>
>>> Now, in general, I have a few possible approaches to do this for the 200 tables I am facing:
>>>
>>> 1) Create a single job that reads the data from all of those tables and writes them to the correct files. I expect that to be a monster that will hog the entire cluster because of the large number of HBase regions.
>>>
>>> 2) Run a job that does this for a single table and simply run that in a loop. Essentially I would have a shell script or 'main' that loops over all table names and runs a Flink job for each of them. The downside is that it starts a new Flink topology on YARN for each table, with a startup overhead of something like 30 seconds per table that I would like to avoid.
>>>
>>> 3) I start a single yarn-session and submit my job in there 200 times. That would solve most of the startup overhead, yet this doesn't work.
>>>
>>> If I start yarn-session then I see these two relevant lines in the output:
>>>
>>> 2016-07-29 14:58:30,575 INFO  org.apache.flink.yarn.Utils  - Attempting to obtain Kerberos security token for HBase
>>> 2016-07-29 14:58:30,576 INFO  org.apache.flink.yarn.Utils  - HBase is not available (not packaged with this application): ClassNotFoundException : "org.apache.hadoop.hbase.HBaseConfiguration".
>>>
>>> As a consequence, any Flink job I submit cannot access HBase at all.
>>>
>>> As an experiment I changed my yarn-session.sh script to include HBase on the classpath. (If you want, I can submit a Jira issue and a pull request.) Now the yarn-session does have HBase available and the job runs as expected.
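A minimal sketch of the kind of change discussed above (Stephan's "add the HBase jars to the lib directory" and Niels' classpath experiment). The paths, session flags, and the use of the 'hbase classpath' command are assumptions about the installation, not the actual contents of the patch:

  # Variant A: drop the HBase client jars into Flink's lib directory so the
  # YARN client and the shipped containers can load HBaseConfiguration.
  # $HBASE_HOME / $FLINK_HOME are illustrative.
  cp $HBASE_HOME/lib/hbase-*.jar $FLINK_HOME/lib/

  # Variant B: extend the Hadoop classpath picked up when the session starts.
  # Assumes the 'hbase' command is available on the submitting machine.
  export HADOOP_CLASSPATH="$(hbase classpath):$HADOOP_CLASSPATH"
  $FLINK_HOME/bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
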
>>> There are, however, two problems that remain:
>>>
>>> 1) This yarn-session is accessible by everyone on the cluster, and as a consequence they can run jobs in there that can access all the data I have access to.
>>>
>>> 2) The Kerberos token will expire after a while and (just like with all long-running jobs) I would really like this to be a 'long-lived' thing.
>>>
>>> As far as I know this is just the tip of the security iceberg, and I would like to know what the correct approach is to solve this.
>>>
>>> Thanks.
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes

--
Best regards / Met vriendelijke groeten,

Niels Basjes
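For completeness, a rough sketch of what options (2)/(3) from the thread could look like once HBase is on the session's classpath: one long-running yarn-session and a plain loop that submits the export job once per table. The entry class, jar name, and argument names are placeholders, and the Flink CLI is assumed to locate the running session through the YARN properties file written when the session starts:

  # Illustrative only: submit one export job per table to the running yarn-session.
  # com.example.ExportTable, export-job.jar and the arguments are hypothetical.
  while read -r table; do
    $FLINK_HOME/bin/flink run -c com.example.ExportTable export-job.jar \
        --table "$table" --output "hdfs:///exports/$table"
  done < tables.txt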