Jamie, Thank you for your insight. To answer your questions I am running on AWS and have access to S3. Further I already have Zookeeper in the mix (its used by Mesos as well as Kafka). I was hoping to avoid the complexities of an automated HA setup by running a single jvm and then migrate to HA down the road. It sounds like I can't have my cake and eat it too (yet). =)
Ryan On Fri, Jul 1, 2016 at 7:22 PM Jamie Grier <ja...@data-artisans.com> wrote: > I started to answer these questions and then realized I was making an > assumption about your environment. Do you have a reliable persistent file > system such as HDFS or S3 at your disposal or do you truly mean to run on a > single node? > > If the you are truly thinking to run on a single node only there's no way > to make this guaranteed to be reliable. You would be open to machine and > disk failures, etc. > > I think the minimal reasonable production setup must use at least 3 > physical nodes with the following services running: > > 1) HDFS or some other reliable filesystem (for persistent state storage) > 2) Zookeeper for the Flink HA JobManager setup > > The rest is configuration.. > > With regard to scaling up after your initial deployment: right now in the > latest Flink release (1.0.3) you cannot stop and restart a job with a > different parallelism without losing your computed state. What this means > is that if you know you will likely scale up and you don't want to lose > that state you can provision many, many slots on the TaskManagers you do > run, essentially over-provisioning them, and run your job now with the max > parallelism you expect to need to scale to. This will all be much simpler > to do in future Flink versions (though not in 1.1) but for now this would > be a decent approach. > > In Flink versions after 1.1 Flink will be able to scale parallelism up and > down while preserving all of the previously computed state. > > -Jamie > > > On Fri, Jul 1, 2016 at 6:41 AM, Ryan Crumley <crum...@gmail.com> wrote: > >> Hi, >> >> I am evaluating flink for use in stateful streaming application. Some >> information about the intended use: >> >> - Will run in a mesos cluster and deployed via marathon in a docker >> container >> - Initial throughput ~ 100 messages per second (from kafka) >> - Will need to scale to 10x that soon after launch >> - State will be much larger than memory available >> >> In order to quickly get this out the door I am considering postponing the >> YARN / HA setup of a cluster with the idea that the current application can >> easily fit within a single jvm and handle the throughput. Hopefully by the >> time I need more scale flink support for mesos will be available and I can >> use that to distribute the job to the cluster with minimal code rewrite. >> >> Questions: >> 1. Is this a viable approach? Any pitfalls to be aware of? >> >> 2. What is the correct term for this deployment mode? Single node >> standalone? Local? >> >> 3. Will the RocksDB state backend work in a single jvm mode? >> >> 4. When the single jvm process becomes unhealthy and is restarted by >> marathon will flink recover appropriately or is failure recovery a function >> of HA? >> >> 5. How would I migrate the RocksDB state once I move to HA mode? Is there >> a straight forward path? >> >> Thanks for your time, >> >> Ryan >> > > > > -- > > Jamie Grier > data Artisans, Director of Applications Engineering > @jamiegrier <https://twitter.com/jamiegrier> > ja...@data-artisans.com > >