Re: Non parallel file sources

2020-06-24 Thread Arvid Heise
Another option, if the file is small enough, is to load it in the driver and directly initialize an in-memory source (env.fromElements).
On Tue, Jun 23, 2020 at 9:57 PM Vishwas Siravara wrote:
> Thanks that makes sense.
>
> On Tue, Jun 23, 2020 at 2:13 PM Laurent Exsteens <
> laurent.exste...@eura
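A minimal sketch of the "load it in the driver" step from Arvid's reply, using only the Java standard library. The class name SmallFileLoader, the helper loadLines, and the blank-line filtering are hypothetical choices for illustration, not something stated in the thread. The Flink hand-off appears only as a comment, since the thread does not show the actual job code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SmallFileLoader {

    // Hypothetical helper: reads the whole file into memory on the driver
    // and drops blank lines. Only sensible when the file is small.
    public static List<String> loadLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real input file (assumption: one record per line).
        Path tmp = Files.createTempFile("lookup", ".txt");
        Files.write(tmp, Arrays.asList("a", "", "b"), StandardCharsets.UTF_8);

        List<String> lines = loadLines(tmp);
        System.out.println(lines);

        // In a Flink job the in-memory list would be handed to the
        // environment instead of printed, for example via
        // env.fromCollection(lines) or env.fromElements(...).

        Files.delete(tmp);
    }
}
```

Because the file is read once in the driver, no task ever opens it, so it cannot be read multiple times by parallel source instances. The cost is that the whole content travels with the job, which is why this only suits small files.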

Re: Non parallel file sources

2020-06-23 Thread Vishwas Siravara
Thanks that makes sense.
On Tue, Jun 23, 2020 at 2:13 PM Laurent Exsteens <laurent.exste...@euranova.eu> wrote:
> Hi Nick,
>
> On a project I worked on, we simply made the file accessible on a shared
> NFS drive.
> Our source was custom, and we forced it to parallelism 1 inside the job,
> so the

Re: Non parallel file sources

2020-06-23 Thread Laurent Exsteens
Hi Nick,
On a project I worked on, we simply made the file accessible on a shared NFS drive. Our source was custom, and we forced it to parallelism 1 inside the job, so the file wouldn't be read multiple times. The rest of the job was distributed. This was also on a standalone cluster. On a resour