Re: Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Yun Gao
bos Send Date: Thu May 20 01:16:39 2021 Recipients: Yun Gao CC: user Subject: Re: Questions Flink DataStream in BATCH execution mode scalability advice > On May 19, 2021, at 7:26 AM, Yun Gao wrote: > > Hi Marco, > > For the remaining issues, > > 1. For the aggregation, th

Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Marco Villalobos
se > the timeout. > > Best, > Yun > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#heartbeat-timeout > > > --Original Mail -- > Sender: Marco Villalobos > Send Date: Wed May 19 14:03:48 2021 >

Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Yun Gao
14:03:48 2021 Recipients: user Subject: Questions Flink DataStream in BATCH execution mode scalability advice Questions Flink DataStream in BATCH execution mode scalability advice. Here is the problem that I am trying to solve. Input is an S3 bucket directory with about 500 GB of data across many

Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-18 Thread Marco Villalobos
Questions Flink DataStream in BATCH execution mode scalability advice. Here is the problem that I am trying to solve. Input is an S3 bucket directory with about 500 GB of data across many files. The instance that I am running on only has 50GB of EBS storage. The nature of this data is time