Re: Non parallel file sources

2020-06-24 Thread Arvid Heise
Another option, if the file is small enough, is to load it in the driver and directly initialize an in-memory source (env.fromElements).

On Tue, Jun 23, 2020 at 9:57 PM Vishwas Siravara wrote:
> Thanks that makes sense.
>
> On Tue, Jun 23, 2020 at 2:13 PM Laurent Exsteens <
> laurent.exste...@eura
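For anyone reading the archive later, here is a rough sketch of the in-memory option suggested in this message. It only covers the driver-side loading with the plain JDK. The helper class DriverSideFileLoader, its maxBytes guard, and the example path are hypothetical names made up for illustration, not anything from Flink or from this thread. The hand-off to Flink is shown as a comment and uses env.fromCollection, the collection-based sibling of the env.fromElements call mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper (not part of Flink): reads a small file in the driver,
// i.e. in the client program that builds the job, so that no task manager
// needs access to the file itself.
final class DriverSideFileLoader {

    private DriverSideFileLoader() {}

    // Reads all lines of the file as UTF-8. Files larger than maxBytes are
    // rejected, because the whole content ends up in the driver's memory.
    // The maxBytes threshold is an assumption of this sketch.
    static List<String> loadLines(Path path, long maxBytes) throws IOException {
        long size = Files.size(path);
        if (size > maxBytes) {
            throw new IllegalArgumentException(
                "File is " + size + " bytes, above the limit of " + maxBytes);
        }
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }
}

// Possible use inside the job's main method (needs the Flink dependency,
// so it is left as a comment; the path is a placeholder):
//   List<String> lines =
//       DriverSideFileLoader.loadLines(Paths.get("/path/to/file"), 10L * 1024 * 1024);
//   DataStream<String> stream = env.fromCollection(lines);
```

With this approach the file only has to exist on the machine that submits the job, which is why it suits small files only.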

Re: Non parallel file sources

2020-06-23 Thread Vishwas Siravara
Thanks, that makes sense.

On Tue, Jun 23, 2020 at 2:13 PM Laurent Exsteens <laurent.exste...@euranova.eu> wrote:
> Hi Nick,
>
> On a project I worked on, we simply made the file accessible on a shared
> NFS drive.
> Our source was custom, and we forced it to parallelism 1 inside the job,
> so the

Re: Non parallel file sources

2020-06-23 Thread Laurent Exsteens
Hi Nick,

On a project I worked on, we simply made the file accessible on a shared NFS drive. Our source was custom, and we forced it to parallelism 1 inside the job, so the file wouldn't be read multiple times. The rest of the job was distributed. This was also on a standalone cluster.

On a resour

Non parallel file sources

2020-06-23 Thread Nick Bendtner
Hi guys,

What is the best way to process a file from a Unix file system, given that there is no guarantee as to which task manager will be assigned to process the file? We run Flink in standalone mode. We currently follow the brute-force approach, in which we copy the file to every task manager; is there a bet