Hi Marco,

I think Flink does not need 500GB for the source; the source should
be able to read from S3 in a streaming pattern (namely, open the file,
create an input stream, and fetch data as required).
But it might indeed need disk space for intermediate data
between operators and for the sort operation.
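Below is a minimal sketch of that streaming read pattern, assuming one
of the S3 filesystem plugins (flink-s3-fs-hadoop or flink-s3-fs-presto)
is installed; the bucket/path and the job itself are hypothetical:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class S3StreamingRead {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // readTextFile opens the file(s) and pulls lines through an
        // input stream as downstream operators request data; it does
        // not load the whole 500GB into memory. Placeholder path.
        DataStream<String> lines =
                env.readTextFile("s3://my-bucket/time-series/");

        lines.print();
        env.execute("stream-read-from-s3");
    }
}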
> On May 19, 2021, at 7:26 AM, Yun Gao wrote:
>
> Hi Marco,
>
> For the remaining issues,
>
> 1. For the aggregation, the 500GB of files is not required to fit into
> memory. Roughly speaking, for keyed().window().reduce(), the input
> records are first sorted according to the key (time_series.name) via an
> external sort, which only consumes a fixed amount of memory.
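For reference, here is a minimal sketch of the keyed().window().reduce()
pattern described above. The TimeSeriesPoint POJO, its fields, the
1-hour tumbling window, and the summing reduce are assumptions for
illustration; only the keying on time_series.name comes from the thread:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class AggregateSketch {

    // Hypothetical record type standing in for one parsed line of the files.
    public static class TimeSeriesPoint {
        public String name;       // time_series.name, the grouping key
        public long timestampMs;  // event time in milliseconds
        public double value;
    }

    public static DataStream<TimeSeriesPoint> aggregate(
            DataStream<TimeSeriesPoint> points) {
        return points
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<TimeSeriesPoint>forMonotonousTimestamps()
                                .withTimestampAssigner((p, ts) -> p.timestampMs))
                // Grouping by key is what the external sort above realizes:
                // records are sorted on the key with a fixed memory budget,
                // spilling to disk when needed.
                .keyBy(p -> p.name)
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                .reduce((a, b) -> {
                    // Hypothetical reduction: running sum per key/window.
                    TimeSeriesPoint out = new TimeSeriesPoint();
                    out.name = a.name;
                    out.timestampMs = Math.max(a.timestampMs, b.timestampMs);
                    out.value = a.value + b.value;
                    return out;
                });
    }
}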