Jamie,

Thank you for your insight. To answer your questions I am running on AWS
and have access to S3. Further I already have Zookeeper in the mix (its
used by Mesos as well as Kafka). I was hoping to avoid the complexities of
an automated HA setup by running a single jvm and then migrate to HA down
the road. It sounds like I can't have my cake and eat it too (yet). =)

Ryan

On Fri, Jul 1, 2016 at 7:22 PM Jamie Grier <ja...@data-artisans.com> wrote:

> I started to answer these questions and then realized I was making an
> assumption about your environment.  Do you have a reliable persistent file
> system such as HDFS or S3 at your disposal or do you truly mean to run on a
> single node?
>
> If the you are truly thinking to run on a single node only there's no way
> to make this guaranteed to be reliable.  You would be open to machine and
> disk failures, etc.
>
> I think the minimal reasonable production setup must use at least 3
> physical nodes with the following services running:
>
> 1) HDFS or some other reliable filesystem (for persistent state storage)
> 2) Zookeeper for the Flink HA JobManager setup
>
> The rest is configuration..
>
> With regard to scaling up after your initial deployment:  right now in the
> latest Flink release (1.0.3) you cannot stop and restart a job with a
> different parallelism without losing your computed state.  What this means
> is that if you know you will likely scale up and you don't want to lose
> that state you can provision many, many slots on the TaskManagers you do
> run, essentially over-provisioning them, and run your job now with the max
> parallelism you expect to need to scale to.  This will all be much simpler
> to do in future Flink versions (though not in 1.1) but for now this would
> be a decent approach.
>
> In Flink versions after 1.1 Flink will be able to scale parallelism up and
> down while preserving all of the previously computed state.
>
> -Jamie
>
>
> On Fri, Jul 1, 2016 at 6:41 AM, Ryan Crumley <crum...@gmail.com> wrote:
>
>> Hi,
>>
>> I am evaluating flink for use in stateful streaming application. Some
>> information about the intended use:
>>
>>  - Will run in a mesos cluster and deployed via marathon in a docker
>> container
>>  - Initial throughput ~ 100 messages per second (from kafka)
>>  - Will need to scale to 10x that soon after launch
>>  - State will be much larger than memory available
>>
>> In order to quickly get this out the door I am considering postponing the
>> YARN / HA setup of a cluster with the idea that the current application can
>> easily fit within a single jvm and handle the throughput. Hopefully by the
>> time I need more scale flink support for mesos will be available and I can
>> use that to distribute the job to the cluster with minimal code rewrite.
>>
>> Questions:
>> 1. Is this a viable approach? Any pitfalls to be aware of?
>>
>> 2. What is the correct term for this deployment mode? Single node
>> standalone? Local?
>>
>> 3. Will the RocksDB state backend work in a single jvm mode?
>>
>> 4. When the single jvm process becomes unhealthy and is restarted by
>> marathon will flink recover appropriately or is failure recovery a function
>> of HA?
>>
>> 5. How would I migrate the RocksDB state once I move to HA mode? Is there
>> a straight forward path?
>>
>> Thanks for your time,
>>
>> Ryan
>>
>
>
>
> --
>
> Jamie Grier
> data Artisans, Director of Applications Engineering
> @jamiegrier <https://twitter.com/jamiegrier>
> ja...@data-artisans.com
>
>

Reply via email to