I started to answer these questions and then realized I was making an
assumption about your environment.  Do you have a reliable persistent file
system such as HDFS or S3 at your disposal or do you truly mean to run on a
single node?

If the you are truly thinking to run on a single node only there's no way
to make this guaranteed to be reliable.  You would be open to machine and
disk failures, etc.

I think the minimal reasonable production setup must use at least 3
physical nodes with the following services running:

1) HDFS or some other reliable filesystem (for persistent state storage)
2) Zookeeper for the Flink HA JobManager setup

The rest is configuration..

With regard to scaling up after your initial deployment:  right now in the
latest Flink release (1.0.3) you cannot stop and restart a job with a
different parallelism without losing your computed state.  What this means
is that if you know you will likely scale up and you don't want to lose
that state you can provision many, many slots on the TaskManagers you do
run, essentially over-provisioning them, and run your job now with the max
parallelism you expect to need to scale to.  This will all be much simpler
to do in future Flink versions (though not in 1.1) but for now this would
be a decent approach.

In Flink versions after 1.1 Flink will be able to scale parallelism up and
down while preserving all of the previously computed state.

-Jamie


On Fri, Jul 1, 2016 at 6:41 AM, Ryan Crumley <crum...@gmail.com> wrote:

> Hi,
>
> I am evaluating flink for use in stateful streaming application. Some
> information about the intended use:
>
>  - Will run in a mesos cluster and deployed via marathon in a docker
> container
>  - Initial throughput ~ 100 messages per second (from kafka)
>  - Will need to scale to 10x that soon after launch
>  - State will be much larger than memory available
>
> In order to quickly get this out the door I am considering postponing the
> YARN / HA setup of a cluster with the idea that the current application can
> easily fit within a single jvm and handle the throughput. Hopefully by the
> time I need more scale flink support for mesos will be available and I can
> use that to distribute the job to the cluster with minimal code rewrite.
>
> Questions:
> 1. Is this a viable approach? Any pitfalls to be aware of?
>
> 2. What is the correct term for this deployment mode? Single node
> standalone? Local?
>
> 3. Will the RocksDB state backend work in a single jvm mode?
>
> 4. When the single jvm process becomes unhealthy and is restarted by
> marathon will flink recover appropriately or is failure recovery a function
> of HA?
>
> 5. How would I migrate the RocksDB state once I move to HA mode? Is there
> a straight forward path?
>
> Thanks for your time,
>
> Ryan
>



-- 

Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
ja...@data-artisans.com

Reply via email to