Hi Chirag,

There have been some issue with very large execution graphs.
You might need to adjust the default configuration and configure larger
Akka buffers and/or timeouts.

Also, 2000 sources means that you run at least 2000 threads at once.

The FileInputFormat (and most of its sub-classes) in Flink 1.5.0 can be
configured to accept multiple directories.
This would be a preferred approach to creating one source per directory.

Best, Fabian

2018-05-28 6:35 GMT+02:00 Chirag Dewan <chirag.dewa...@yahoo.in>:

> Hi,
>
> I am working on a use case where my Flink job needs to collect data from
> thousands of sources.
>
> As an example, I want to collect data from more than 2000 File
> Directories, process(filter, transform) the data and distribute the
> processed data streams to 200 different directories.
>
> Are there any caveats I should know with such large number of sources,
> also taking into account per operator parallelism?
>
> Regards,
>
> Chirag
>
>

Reply via email to