Hi Chirag, There have been some issue with very large execution graphs. You might need to adjust the default configuration and configure larger Akka buffers and/or timeouts.
Also, 2000 sources means that you run at least 2000 threads at once. The FileInputFormat (and most of its sub-classes) in Flink 1.5.0 can be configured to accept multiple directories. This would be a preferred approach to creating one source per directory. Best, Fabian 2018-05-28 6:35 GMT+02:00 Chirag Dewan <chirag.dewa...@yahoo.in>: > Hi, > > I am working on a use case where my Flink job needs to collect data from > thousands of sources. > > As an example, I want to collect data from more than 2000 File > Directories, process(filter, transform) the data and distribute the > processed data streams to 200 different directories. > > Are there any caveats I should know with such large number of sources, > also taking into account per operator parallelism? > > Regards, > > Chirag > >