Hi Sidhant,

see the inline comments for answers.

On Tue, Aug 11, 2020 at 3:10 PM sidhant gupta <sidhan...@gmail.com> wrote:

> Hi Till,
>
> Thanks for your response.
> I have a few queries though, as mentioned below:
> (1) Can Flink be used in map-reduce fashion with the data streaming API?
>

What do you mean by map-reduce fashion? You can use Flink's DataSet API
for processing batch workloads (consisting not only of map and reduce
operations but also other operations such as groupReduce, flatMap, etc.).
Flink's DataStream API can be used to process bounded and unbounded
streaming data.

> (2) Does it make sense to use AWS EMR if we are not using Flink in
> map-reduce fashion with the streaming API?
>

I don't think I fully understand what you mean by map-reduce fashion.
Do you mean multiple stages of map and reduce operations?

> (3) Can a Flink cluster be auto scaled using EMR Managed Scaling when
> used with YARN, as per this link
> https://aws.amazon.com/blogs/big-data/introducing-amazon-emr-managed-scaling-automatically-resize-clusters-to-lower-cost/
> ?
>

I am no expert on EMR managed scaling, but I believe that it would need
some custom tooling to scale a Flink job down (by taking a savepoint and
resuming from it with a lower parallelism) before downsizing the EMR
cluster.

> (4) If we set an explicit max parallelism, set the current parallelism
> (which might be less than the max parallelism) equal to the maximum
> number of slots, and set the slots per task manager while starting the
> YARN session, then if we increase the number of task managers via auto
> scaling, would the parallelism increase (up to the max parallelism) and
> the load be distributed across the newly spun-up task managers? Refer:
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/production_ready.html#set-an-explicit-max-parallelism
>

At the moment, Flink does not support this out of the box, but the
community is working on this feature.
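
As background for (4), here is a rough sketch of why the max parallelism
matters when a job is later restarted from a savepoint with a higher
parallelism. To my understanding, Flink hashes every key into one of
"max parallelism" key groups and gives each parallel subtask a contiguous
range of those groups, so rescaling only moves whole key groups between
subtasks. The Python below is an illustration only, not Flink code: the
function names are made up for this example, and zlib.crc32 is a stand-in
for the hash Flink actually uses.

```python
import zlib
from typing import List


def key_group_for(key: str, max_parallelism: int) -> int:
    """Map a key to a key group (stand-in hash: CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def subtask_for(key_group: int, parallelism: int, max_parallelism: int) -> int:
    """Map a key group to a subtask index; each subtask owns a contiguous range."""
    return key_group * parallelism // max_parallelism


def key_groups_per_subtask(parallelism: int, max_parallelism: int) -> List[int]:
    """Count how many key groups each subtask owns at the given parallelism."""
    counts = [0] * parallelism
    for key_group in range(max_parallelism):
        counts[subtask_for(key_group, parallelism, max_parallelism)] += 1
    return counts


if __name__ == "__main__":
    max_parallelism = 128  # hypothetical value, chosen once for the job
    for parallelism in (4, 8):  # e.g. before and after adding task managers
        print(parallelism, key_groups_per_subtask(parallelism, max_parallelism))
```

The key-to-key-group mapping depends only on the max parallelism, so it
stays stable across restarts; only the key-group-to-subtask mapping
changes with the parallelism chosen at restart.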

>
> Regards
> Sidhant Gupta
>
> On Tue, 11 Aug, 2020, 5:19 PM Till Rohrmann, <trohrm...@apache.org> wrote:
>
>> Hi Sidhant,
>>
>> I am not an expert on AWS services but I believe that EMR might be a
>> bit easier to start with since AWS EMR comes with Flink support out of
>> the box [1]. On ECS I believe that you would have to set up the
>> containers yourself. Another interesting deployment option could be to
>> use Flink's native Kubernetes integration [2] which would work on AWS
>> EKS.
>>
>> [1] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-create-cluster.html
>> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
>>
>> Cheers,
>> Till
>>
>> On Tue, Aug 11, 2020 at 9:16 AM sidhant gupta <sidhan...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm kind of new to Flink cluster deployment. I wanted to know which
>>> Flink cluster deployment and which job mode in AWS is better in terms
>>> of ease of deployment, maintenance, HA, cost, etc. As of now I am
>>> considering AWS EMR vs ECS (Docker containers). We have a use case of
>>> setting up a data streaming API which reads records from a Kafka
>>> topic, processes them, and then writes to another Kafka topic. Please
>>> let me know your thoughts on this.
>>>
>>> Thanks
>>> Sidhant Gupta
>>>
>>