In my opinion the streaming process can be simulated perfectly well on a
single node. You can set up a message distribution system like Kafka on a
single node, and you can run Spark on a single node; the only thing you
need to change when moving to a cluster is the environment configuration.
So there is no need to set up a cluster when testing the streaming
process.
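For the Flink case discussed in this thread, a minimal single-node sketch
(assumptions on my part: the Kafka 0.9 connector of that era, a local
broker on localhost:9092, and a topic named "events"). The same job jar
runs unchanged on a cluster, because getExecutionEnvironment() returns a
local environment in the IDE and a cluster environment when submitted:

    import java.util.Properties;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class SingleNodeStreamingTest {
        public static void main(String[] args) throws Exception {
            // Local environment when run from the IDE, cluster environment
            // when the jar is submitted -- the job code stays the same.
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            // Assumption: a single-node Kafka broker used for testing.
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "single-node-test");

            env.addSource(new FlinkKafkaConsumer09<>(
                        "events", new SimpleStringSchema(), props))
               .print();

            env.execute("Single-node streaming test");
        }
    }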
Regards,
Kevin
On 30-06-16 09:54, Longda Feng wrote:
This means standalone mode is just for prototyping.
But I think we need a lightweight solution for stream processing, and
standalone mode is the best one. Sometimes we need to set up a Flink
cluster on a small set of machines, and setting up a YARN cluster isn't
convenient. For example:
(1) In a small company, the number of machines is small.
(2) When a data center is small, we may still need to do some computing in
that data center.
(3) Some machines are on a whitelist: they have been authorized to access
some special data or machines, but the number of such machines is small.
(4) Some machines hold critical data and can't be shared with others, but
the number of these machines is small.
(5) When a team starts to learn Flink, they will first set up a small
cluster; they probably won't want to set up a huge system and will prefer
a small one.
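To show how lightweight standalone mode is, a minimal setup sketch (my
assumptions: Flink 1.x defaults, passwordless SSH to the workers; the
host names are placeholders):

    # conf/flink-conf.yaml: jobmanager.rpc.address: master-host
    echo "worker-host-1" >> conf/slaves
    echo "worker-host-2" >> conf/slaves
    ./bin/start-cluster.sh   # starts the JobManager and TaskManagers via SSH

No YARN or other resource manager is required.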
Regards,
Longda
------------------------------------------------------------------
From: Aljoscha Krettek <aljos...@apache.org>
Send Time: Wednesday, June 29, 2016, 21:48
To: 封仲淹(纪君祥) <zhongyan.f...@alibaba-inc.com>; dev <dev@flink.apache.org>
Subject: Re: [Discuss] Why different job's tasks can run in the single process.
Hi,
yes, you are definitely right that allowing multiple user-code tasks to
run in the same TaskManager JVM is not good for stability. This mode is
still there from the very early days of Flink, when YARN was not yet
available. In a production environment I would now recommend always
running one Flink-on-YARN cluster per job to get good isolation between
different jobs.
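For concreteness, launching such a per-job YARN cluster with the Flink CLI
of that era looks roughly like this (the jar path and memory sizes are
placeholders):

    ./bin/flink run -m yarn-cluster -yn 2 -yjm 1024 -ytm 2048 ./path/to/job.jar

Each invocation brings up a dedicated YARN application for that one job
and shuts it down when the job finishes, so a crashing user function can
only affect its own containers.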
Cheers,
Aljoscha
On Wed, 29 Jun 2016 at 09:18 Longda Feng <zhongyan.f...@alibaba-inc.com>
wrote:
Hi,
Sorry for asking the question here. Any answer will be appreciated.
Why can different jobs' tasks run in a single process? (There can be tasks
from different jobs in one TaskManager.) It seems Flink-on-YARN can let
different jobs run in different processes, but for standalone mode the
problem still exists.
Why is Flink designed like this? The advantages I can think of are the
following:
(1) All tasks can share a bigger memory pool.
(2) Communication between tasks in the same process is fast.
But this design impacts stability. Flink provides a user-defined-function
interface; if one user-defined function crashes, it may bring down the
whole JVM, and if the TaskManager crashes, all other jobs' tasks in that
TaskManager are affected. Even if the JVM doesn't crash, it may lead to
other unexpected problems, and it also makes the code more sophisticated.
Comparable frameworks like Spark/Storm/Samza won't run different jobs'
tasks in the same process. For a normal user, stability has the highest
priority.
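To illustrate the failure mode, a hypothetical sketch (the class name and
leak are made up for illustration) of a user-defined function that
exhausts the heap:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.functions.MapFunction;

    // Hypothetical example of a misbehaving user-defined function.
    public class LeakyMapper implements MapFunction<String, String> {
        private static final List<byte[]> LEAK = new ArrayList<>();

        @Override
        public String map(String value) {
            LEAK.add(new byte[1024 * 1024]); // leaks 1 MB per record
            return value;
        }
    }

Once this function triggers an OutOfMemoryError, the whole TaskManager JVM
dies, and in standalone mode that JVM may also be running tasks that
belong to completely unrelated jobs.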
Thanks,
Longda