I favor the one-cluster-per-job approach. If this becomes the dominant way of doing things, we could also think about introducing a separate component for monitoring the jobs in these per-job clusters, as is now possible when running multiple jobs in a single cluster.
On Thu, 12 May 2016 at 01:59 Wright, Eron <ewri...@live.com> wrote:

> One option is to use a separate cluster (JobManager + TaskManagers) for
> each job. This is fairly straightforward with the YARN support: "flink
> run" can launch a cluster for a job and tear it down afterwards.
>
> Of course this means you must deploy YARN. That doesn't necessarily
> imply HDFS, though a Hadoop-compatible filesystem (HCFS) is needed to
> support the YARN staging directory.
>
> This approach also facilitates richer scheduling and multi-user scenarios.
>
> One downside is the loss of a unified web UI to view all jobs.
>
> > On May 11, 2016, at 8:32 AM, Jark Wu <wuchong...@alibaba-inc.com> wrote:
> >
> > As I know, Flink uses a thread model, meaning one TaskManager process
> > may run many operator threads from different jobs. So tasks from
> > different jobs compete for memory and CPU within the same process. In
> > the worst case, a bad job can eat most of the CPU and memory, which may
> > lead to an OOM that kills the regular jobs too. There is another
> > problem: tasks from different jobs print their logs into the same file
> > (the TaskManager log file), which makes debugging harder.
> >
> > As I know, Storm spawns workers for every job, and the tasks in one
> > worker belong to the same job. So I'm confused about the purpose or
> > advantages of Flink's design. One more question: are there any tips to
> > solve the issues above, or suggestions for implementing a design
> > similar to Storm's?
> >
> > Thank you for any answers in advance!
> >
> > Regards,
> > Jark Wu
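For reference, the per-job YARN mode Eron mentions can be invoked roughly as below. This is a sketch for the Flink 1.x YARN client of that era; the jar path, container count, and memory sizes are illustrative placeholders, and a running YARN cluster with `HADOOP_CONF_DIR` set is assumed:

```shell
# Launch a dedicated YARN session for this one job ("yarn-cluster" mode).
# The JobManager and TaskManagers are started on YARN, the job is submitted,
# and the whole cluster is torn down when the job finishes.
./bin/flink run \
  -m yarn-cluster \        # per-job YARN cluster instead of a standalone JM
  -yn 2 \                  # number of YARN containers (TaskManagers)
  -yjm 1024 \              # JobManager container memory (MB)
  -ytm 2048 \              # TaskManager container memory (MB)
  ./path/to/my-job.jar     # illustrative job jar, not a real path
```

Because each job gets its own TaskManager processes, the resource-isolation and log-mixing concerns Jark raises largely disappear, at the cost of the unified web UI noted above.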