Hi,

I guess it depends on what you already have available in your cluster; try to use that. Running Flink on an existing YARN cluster is very easy, but setting up a YARN cluster in the first place, even if that's easy (I'm not sure whether it is), would add extra complexity.
When I spawn an AWS cluster for testing, I use EMR, which comes with YARN included, and I think that's very easy to do, as everything works out of the box. I've heard that Kubernetes/Docker are just as easy. I'm not a devops person either, but my colleagues, if they have any preference, usually prefer Kubernetes.

> Bear in mind that I need to run the job with
> ExecutionEnvironment.createRemoteEnvironment(); uploading a jar is not a
> valid option for me. It seems to me that not all the options support
> remote submission of jobs, but I'm not sure.

I think all of them should support the remote environment. Almost certainly Standalone, YARN, Kubernetes and Docker do.

Piotrek

> On 28 Feb 2020, at 10:25, Antonio Martínez Carratalá
> <amarti...@alto-analytics.com> wrote:
>
> Hello
>
> I'm working on a project with Flink 1.8. I'm running my code from Java on
> a remote Flink cluster as described here:
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/cluster_execution.html
> That part is working, but I want to configure a dynamic Flink cluster to
> execute the jobs.
>
> Imagine I have users that sometimes need to run a report. This report is
> generated with data processed in Flink, so whenever a user requests a
> report I have to submit a job to a remote Flink cluster. This job
> execution is heavy and may take an hour to finish.
>
> So, I don't want to have 3, 4, 5...
> Task Managers always running in the cluster; sometimes they are idle, and
> other times I don't have enough Task Managers for all the requests. I
> want to dynamically create Task Managers as jobs are received at the Job
> Manager, and get rid of them at the end.
>
> I see a lot of options to create a cluster in
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/
> under [Deployment & Operations] [Clusters & Deployment], like Standalone,
> YARN, Mesos, Docker, Kubernetes... but I don't know which would be the
> most suitable for my use case. I'm not an expert in devops and I barely
> know these technologies.
>
> Some advice on which technology to use, and maybe some examples, would be
> really appreciated.
>
> Bear in mind that I need to run the job with
> ExecutionEnvironment.createRemoteEnvironment(); uploading a jar is not a
> valid option for me. It seems to me that not all the options support
> remote submission of jobs, but I'm not sure.
>
> Thank you
>
> Antonio Martinez
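In case it helps, here is a minimal sketch of what remote submission looks like with the Flink 1.8 batch API. The host name, port, and jar path are placeholders you would replace with your own values (8081 is the default REST port the JobManager listens on), and it needs a running cluster plus the Flink dependencies on the classpath:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteSubmitSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port: point these at your JobManager's REST
        // endpoint. The jar must contain the job's user classes so Flink
        // can ship them to the cluster.
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host", 8081, "/path/to/your-job.jar");

        DataSet<Integer> data = env.fromElements(1, 2, 3, 4);

        // print() triggers execution on the remote cluster and collects
        // the result back to the client.
        data.map(i -> i * 2).print();
    }
}
```

The same pattern works regardless of how the cluster was started (standalone, YARN session, Kubernetes, ...), as long as the JobManager's REST endpoint is reachable from the machine submitting the job.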