Hi CVP,

On how people use Flink, you can check this blogpost to see how Alibaba does it:
http://data-artisans.com/blink-flink-alibaba-search/ 

In addition, you can find more information on the matter in the talks from the
last Flink Forward conference:
http://berlin.flink-forward.org/program/sessions/

For example, Netflix also shares some information here:
http://berlin.flink-forward.org/kb_sessions/beaming-flink-to-the-cloud-netflix/

Now for how things work under the hood, I will provide links to the Flink 
documentation. 
I hope that this will also help you figure out what fits your needs best:

For deployment and operations, the main resource is the Flink documentation:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cluster_setup.html

and for what is about to come on that front, you can check out the FLIP-6 page:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077 

To dynamically scale your Flink job, you have to take a savepoint and restart
the job from that savepoint with a different parallelism.
You can find some details here:
https://www.slideshare.net/tillrohrmann/dynamic-scaling-how-apache-flink-adapts-to-changing-workloads
Unfortunately, this talk is a little bit outdated. We will update our
documentation on dynamic scaling soon.
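
As a rough sketch of the savepoint-and-restart sequence described above, the
snippet below only builds the Flink CLI invocations and does not run them. The
job id, savepoint directory, and jar path are hypothetical placeholders, and I
am assuming the `bin/flink` client with its `savepoint`, `cancel`, and
`run -s ... -p ...` commands; please check the CLI page of the docs for your
Flink version before relying on it.

```python
from typing import List

FLINK = "bin/flink"  # assumption: run from the Flink distribution directory


def savepoint_cmd(job_id: str, target_dir: str) -> List[str]:
    """Step 1: trigger a savepoint for the running job."""
    return [FLINK, "savepoint", job_id, target_dir]


def cancel_cmd(job_id: str) -> List[str]:
    """Step 2: stop the running job once the savepoint has completed."""
    return [FLINK, "cancel", job_id]


def rescaled_run_cmd(savepoint_path: str, parallelism: int, jar: str) -> List[str]:
    """Step 3: resubmit the job from the savepoint with a new parallelism."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return [FLINK, "run", "-s", savepoint_path, "-p", str(parallelism), jar]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    job_id = "<jobId>"
    for cmd in (
        savepoint_cmd(job_id, "hdfs:///flink/savepoints"),
        cancel_cmd(job_id),
        rescaled_run_cmd("<savepointPath>", 8, "my-job.jar"),
    ):
        print(" ".join(cmd))
```

The savepoint path passed in step 3 is the one reported by the savepoint
command in step 1, so in practice you would copy it from that output.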

For resource allocation and job scheduling, you can check the links I
included for deployment and operations, and also this:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.html

For metrics and monitoring, you can check here:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/metrics.html
and the related pages in the "Debugging and monitoring" section of the Flink
documentation.

I hope this can help as a first step,
Kostas

> On Feb 22, 2017, at 12:30 PM, Chakravarthy varaga <chakravarth...@gmail.com> 
> wrote:
> 
> Hi Team,
> 
>     We are analysing different deployment options for managing Flink Jobs on 
> AWS EC2 instances.
> 
>      Basically, the options (Resource Managers) in front of us are:
>      -> Standalone cluster
>      -> On YARN
>      -> Deploy using Mesos/Marathon
>      -> Deploy using Kubernetes/Docker
>      
>      The Resource Manager options are a bit confusing, as we are unable to
> decide which one to go with. What we are looking at as inputs to our
> analysis is:
>     ->  Dynamic Scaling of resources
>     ->  Resource Allocation
>     ->  Jobs Scheduling 
>     ->  No-Downtime upgrades
>     ->  Monitoring & Metrics.
> 
>     Right now our plan is to do a paper-based study evaluating these options.
>  
>     I'm sure a lot of you in production/support have encountered
> issues around these. Can someone point us to blogs/research papers/material
> focussing on the approach taken and the considerations for evaluation?
> 
>     Any help here is highly appreciated !
> 
> Best Regards
> CVP
>        
