Flink + S3

Michael-Keith Bernard Mon, 18 Apr 2016 18:55:05 -0700

Hello Flink Users!

I'm a Flink newbie at the early stages of deploying our first Flink cluster 
into production and I have a few questions about wiring up Flink with S3:


* We are going to use the HA configuration[1] from day one (we already have 
ZooKeeper infrastructure). Can S3 be used as a state backend for the JobManager? 
The documentation talks about using S3 as a state backend for the TaskManagers[2] 
(in particular for streaming), but I'm wondering whether it's also a suitable 
backend for the JobManager.

* How do I configure S3 for Flink when I don't have an existing Hadoop cluster? 
The documentation references the Hadoop configuration manifest[3], which seems 
to imply that I must already be running Hadoop (or at least have a properly 
configured Hadoop cluster). Is there an example somewhere of using S3 as a 
storage backend for a standalone cluster?

* Bonus: I'm writing a Puppet module for installing, configuring, and managing 
Flink in standalone mode with an existing ZooKeeper cluster. Are there any 
existing modules for this (I didn't find anything on the Puppet Forge)? Would 
others in the community be interested if we published our module to the Forge 
once it's complete?

Thanks so much for your time and consideration. We look forward to using Flink 
in production!

Cheers,
Michael-Keith

[1]: https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html#standalone-cluster-high-availability

[2]: https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html#s3-simple-storage-service

[3]: https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html#set-s3-filesystem