https://github.com/apache/flink/pull/2317

On Mon, Aug 1, 2016 at 11:54 AM, Niels Basjes <ni...@basjes.nl> wrote:

> Thanks for the pointers towards the work you are doing here.
> I'll put up a patch for the jars and such in the next few days.
> https://issues.apache.org/jira/browse/FLINK-4287
>
> Niels Basjes
>
> On Mon, Aug 1, 2016 at 11:46 AM, Stephan Ewen <se...@apache.org> wrote:
>
>> Thank you for the breakdown of the problem.
>>
>> Option (1) or (2) would be the way to go, currently.
>>
>> The problem that (3) does not support HBase can simply be solved by adding
>> the HBase jars to the lib directory. In the future, this should be solved
>> by the YARN re-architecture:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
>>
>> For the renewal of Kerberos tokens for streaming jobs: there is work in
>> progress and a pull request to attach keytabs to a Flink job:
>> https://github.com/apache/flink/pull/2275
>>
>> The problem that the YARN session is accessible by everyone is a bit more
>> tricky. In the future, this should be solved by these two parts:
>>   - With the YARN re-architecture, sessions are bound to individual
>> users. It should be possible to launch the session out of a single
>> YarnExecutionEnvironment and then submit multiple jobs against it.
>>   - The over-the-wire encryption and authentication should make sure that
>> no other user can send jobs to that session.
>>
>> Greetings,
>> Stephan
>>
>>
>> On Mon, Aug 1, 2016 at 9:47 AM, Niels Basjes <ni...@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> I have a Kerberos-secured YARN/HBase installation and I want to export
>>> data from a large number (~200) of HBase tables to files on HDFS.
>>> I wrote a Flink job that does this exactly the way I want it for a
>>> single table.
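>>>
>>> For reference, a stripped-down sketch of what such a single-table export
>>> can look like, using the flink-hbase addon's TableInputFormat (this is
>>> not my exact job; the table name, column family, qualifier and output
>>> path below are placeholders):
>>>
>>> import org.apache.flink.addons.hbase.TableInputFormat;
>>> import org.apache.flink.api.java.DataSet;
>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>> import org.apache.flink.api.java.tuple.Tuple2;
>>> import org.apache.hadoop.hbase.client.Result;
>>> import org.apache.hadoop.hbase.client.Scan;
>>> import org.apache.hadoop.hbase.util.Bytes;
>>>
>>> public class SingleTableExport {
>>>
>>>   // Reads every row of one HBase table and emits (rowkey, value) pairs.
>>>   public static class ExportFormat extends TableInputFormat<Tuple2<String, String>> {
>>>     private final String tableName;
>>>
>>>     public ExportFormat(String tableName) {
>>>       this.tableName = tableName;
>>>     }
>>>
>>>     @Override
>>>     protected String getTableName() {
>>>       return tableName;
>>>     }
>>>
>>>     @Override
>>>     protected Scan getScanner() {
>>>       return new Scan().addFamily(Bytes.toBytes("cf"));  // placeholder column family
>>>     }
>>>
>>>     @Override
>>>     protected Tuple2<String, String> mapResultToTuple(Result r) {
>>>       return new Tuple2<>(
>>>           Bytes.toString(r.getRow()),
>>>           Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"))));
>>>     }
>>>   }
>>>
>>>   public static void main(String[] args) throws Exception {
>>>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>>     DataSet<Tuple2<String, String>> rows = env.createInput(new ExportFormat("mytable"));
>>>     rows.writeAsCsv("hdfs:///exports/mytable");  // placeholder output path
>>>     env.execute("Export mytable");
>>>   }
>>> }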
>>>
>>> Now in general I have a few possible approaches to do this for the 200
>>> tables I am facing:
>>>
>>> 1) Create a single job that reads the data from all of those tables and
>>> writes them to the correct files.
>>>     I expect that to be a monster that will hog the entire cluster
>>> because of the large number of HBase regions.
>>>
>>> 2) Run a job that does this for a single table and simply run that in a
>>> loop.
>>>     Essentially I would have a shell script or 'main' that loops over all
>>> table names and runs a Flink job for each of those (roughly like the
>>> sketch after this list).
>>>     The downside of this is that it will start a new Flink topology on
>>> YARN for each table.
>>>     This has a start-up overhead of something like 30 seconds per table,
>>> which I would like to avoid.
>>>
>>> 3) I start a single yarn-session and submit my job into it 200 times.
>>>     That would avoid most of the start-up overhead, yet this doesn't work.
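>>>
>>> For reference, the 'main' variant of the loop in option 2 would look
>>> roughly like this (a sketch reusing the ExportFormat from above; the
>>> table list and output paths are placeholders). The shell-script variant
>>> would instead invoke 'flink run' once per table, which is where the
>>> per-table start-up overhead comes from:
>>>
>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>>
>>> public class ExportAllTables {
>>>   public static void main(String[] args) throws Exception {
>>>     String[] tables = {"table_001", "table_002" /* ... ~200 names ... */};
>>>     for (String table : tables) {
>>>       ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>>       env.createInput(new SingleTableExport.ExportFormat(table))
>>>          .writeAsCsv("hdfs:///exports/" + table);
>>>       env.execute("Export " + table);  // submits one Flink job per table
>>>     }
>>>   }
>>> }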
>>>
>>> If I start a yarn-session then I see these two relevant lines in the
>>> output:
>>>
>>> 2016-07-29 14:58:30,575 INFO  org.apache.flink.yarn.Utils
>>>     - Attempting to obtain Kerberos security token for HBase
>>> 2016-07-29 14:58:30,576 INFO  org.apache.flink.yarn.Utils
>>>     - HBase is not available (not packaged with this application):
>>>       ClassNotFoundException : "org.apache.hadoop.hbase.HBaseConfiguration".
>>>
>>> As a consequence any flink job I submit cannot access HBase at all.
>>>
>>> As an experiment I changed my yarn-session.sh script to include HBase on
>>> the classpath. (If you want I can submit a Jira issue and a pull request.)
>>> Now the yarn-session does have HBase available and the job runs as
>>> expected.
>>>
>>> There are however two problems that remain:
>>> 1) This yarn-session is accessible by everyone on the cluster, and as a
>>> consequence they can run jobs in it that can access all data I have
>>> access to.
>>> 2) The Kerberos token will expire after a while and (just like with all
>>> long-running jobs) I would really like this to be a 'long lived'
>>> thing.
>>>
>>> As far as I know this is just the tip of the security iceberg and I
>>> would like to know what the correct approach is to solve this.
>>>
>>> Thanks.
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>
>>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
