Hi,

I have a Kerberos-secured YARN/HBase installation and I want to export data from a large number (~200) of HBase tables to files on HDFS. I wrote a Flink job that does exactly what I want for a single table.
In general I see a few possible approaches for the 200 tables I am facing:

1) Create a single job that reads the data from all of those tables and writes it to the correct files. I expect that to be a monster that will hog the entire cluster because of the large number of HBase regions.

2) Write a job that does this for a single table and simply run it in a loop. Essentially I would have a shell script or 'main' that loops over all table names and runs a Flink job for each of them. The downside is that this starts a new Flink topology on YARN for each table. That has a startup overhead of something like 30 seconds per table (roughly 100 minutes in total for 200 tables), which I would like to avoid.

3) Start a single yarn-session and submit my job to it 200 times. That would remove most of the startup overhead, yet it doesn't work. When I start the yarn-session I see these two relevant lines in the output:

    2016-07-29 14:58:30,575 INFO org.apache.flink.yarn.Utils - Attempting to obtain Kerberos security token for HBase
    2016-07-29 14:58:30,576 INFO org.apache.flink.yarn.Utils - HBase is not available (not packaged with this application): ClassNotFoundException : "org.apache.hadoop.hbase.HBaseConfiguration".

As a consequence, no Flink job I submit can access HBase at all.

As an experiment I changed my yarn-session.sh script to include HBase on the classpath. (If you want, I can submit a Jira issue and a pull request.) Now the yarn-session does have HBase available and the jobs run as expected. There are, however, two problems that remain:

1) This yarn-session is accessible to everyone on the cluster, so anyone can run jobs in it that can access all the data I have access to.

2) The Kerberos token will expire after a while, and (just like with all long-running jobs) I would really like this to be a 'long-lived' thing.

As far as I know this is just the tip of the security iceberg, and I would like to know what the correct approach to solving this is.

Thanks.
--
Best regards / Met vriendelijke groeten,

Niels Basjes
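P.S. To make approaches 2 and 3 concrete, here is a minimal sketch of the looping 'main' in Java. It simply shells out to the `flink run` CLI once per table. The jar name `hbase-export-job.jar`, the main class `com.example.HBaseTableExport` and the assumption that the job takes the table name as its only argument are hypothetical placeholders, not my actual job. With `-m yarn-cluster` every table gets its own YARN application (approach 2, with the ~30 second startup each); without `-m` the job is submitted to an already running yarn-session (approach 3), assuming the CLI finds that session.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExportAllTables {

    // Hypothetical placeholders for the single-table export job.
    static final String JOB_JAR = "hbase-export-job.jar";
    static final String JOB_CLASS = "com.example.HBaseTableExport";

    // Builds the 'flink run' command line for one table.
    // perJobYarnCluster = true  -> '-m yarn-cluster': a new YARN application per table (approach 2).
    // perJobYarnCluster = false -> no '-m': submit to the running yarn-session (approach 3).
    static List<String> buildCommand(String table, boolean perJobYarnCluster) {
        List<String> cmd = new ArrayList<>(Arrays.asList("flink", "run"));
        if (perJobYarnCluster) {
            cmd.add("-m");
            cmd.add("yarn-cluster");
        }
        cmd.add("-c");
        cmd.add(JOB_CLASS);
        cmd.add(JOB_JAR);
        cmd.add(table); // assumption: the job takes the table name as its only argument
        return cmd;
    }

    // Runs the exports one after another and stops at the first failing table.
    static void runAll(List<String> tables, boolean perJobYarnCluster)
            throws IOException, InterruptedException {
        for (String table : tables) {
            Process p = new ProcessBuilder(buildCommand(table, perJobYarnCluster))
                    .inheritIO() // show the Flink CLI output in this console
                    .start();
            int exitCode = p.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException(
                        "Export of table " + table + " failed with exit code " + exitCode);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Table names are passed as program arguments.
        runAll(Arrays.asList(args), true);
    }
}
```

Usage would be something like `java ExportAllTables table1 table2 ...`. Running the jobs sequentially keeps the cluster load to one table at a time, which is exactly why the per-table startup overhead dominates in approach 2.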