Hi!




We are working on a use case where multiple teams deploy their jobs onto a 
shared Flink cluster. With this strategy we are facing a challenge in the 
interaction with S3. Since we already configured S3 for the state backend 
(through flink-conf.yaml), every time we use API functions that communicate 
with the file system (e.g., DataStream readFile), the application-level 
configuration appears to be overridden by the cluster's while attempting to 
communicate with external S3 buckets.
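
Roughly, the setup looks like this; the bucket names below are made up for 
illustration. In flink-conf.yaml:

    state.backend: rocksdb
    state.checkpoints.dir: s3://shared-checkpoints-bucket/flink/checkpoints

And in a job (Java):

    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

    // "s3://external-team-bucket/input" stands in for a bucket owned by
    // another team/account. This read ends up using the cluster's S3
    // credentials rather than anything we set at the job level.
    env.readFile(new TextInputFormat(new Path("s3://external-team-bucket/input")),
                 "s3://external-team-bucket/input")
       .print();

    env.execute("read-from-external-bucket");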
What we've considered so far:




1. Provide a core-site.xml resource file targeting the external S3 buckets we 
want to interact with. We've tested this, and the credentials ultimately seem 
to be ignored in favor of the IAM roles pre-loaded on the instances (a sketch 
of what we tried is shown after this list);

2. Load the cluster instances with multiple IAM roles. The problem with this 
is that it would allow each job to interact with out-of-scope buckets;

3. Spin up multiple clusters with different configurations. We would like to 
avoid this, since we started from the premise of sharing a single cluster per 
context.
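
For reference, the core-site.xml we tested in option 1 looked roughly like 
this (the credentials are placeholders):

    <configuration>
      <!-- Static credentials for the external buckets; these seem to be
           ignored in favor of the instance profile's IAM role. -->
      <property>
        <name>fs.s3a.access.key</name>
        <value>PLACEHOLDER_ACCESS_KEY</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>PLACEHOLDER_SECRET_KEY</value>
      </property>
    </configuration>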




What would be a clean/recommended way to interact with multiple S3 buckets 
with different security policies from a shared Flink cluster?
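
To illustrate the kind of scoping we're after: something like Hadoop S3A's 
per-bucket configuration, if it can be applied per job rather than 
cluster-wide (again, the bucket name and credentials are made up):

    <!-- Hypothetical: credentials scoped to one external bucket, leaving
         the state backend's bucket on the instance IAM role. -->
    <property>
      <name>fs.s3a.bucket.external-team-bucket.access.key</name>
      <value>PLACEHOLDER_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3a.bucket.external-team-bucket.secret.key</name>
      <value>PLACEHOLDER_SECRET_KEY</value>
    </property>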


Thanks in advance.
