To access a local file, try a file:// URI.
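Something like the following (an untested sketch; the /data/input path, the
*.txt pattern, and runAlgorithm are placeholders, and sc is assumed to be
your SparkContext):

    // Assumes /data/input exists at the same path on the driver and on
    // every worker node (see the Programming Guide note quoted below).
    val lines = sc.textFile("file:///data/input/*.txt")

    // If each file should be handled as a whole (one record per file),
    // wholeTextFiles yields (path, content) pairs, so each worker can
    // run the algorithm on complete files independently:
    val files = sc.wholeTextFiles("file:///data/input/*.txt")
    val results = files.map { case (path, content) =>
      (path, runAlgorithm(content)) // runAlgorithm is a placeholder
    }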
On Wed, Sep 7, 2016 at 8:52 AM, Peter Figliozzi <pete.figlio...@gmail.com>
wrote:

> This is a great question. Basically you don't have to worry about the
> details; just give a wildcard in your call to textFile. See the
> Programming Guide
> <http://spark.apache.org/docs/latest/programming-guide.html> section
> entitled "External Datasets". The Spark framework will distribute your
> data across the workers. Note that:
>
>> *If using a path on the local filesystem, the file must also be
>> accessible at the same path on worker nodes. Either copy the file to
>> all workers or use a network-mounted shared file system.*
>
> In your case this would mean the directory of files.
>
> Curiously, I cannot get this to work when I mount a directory with sshfs
> on all of my worker nodes. It says "file not found" even though the file
> clearly exists in the specified path on all workers. Anyone care to try
> and comment on this?
>
> Thanks,
>
> Pete
>
> On Tue, Sep 6, 2016 at 9:51 AM, Lydia Ickler <ickle...@googlemail.com>
> wrote:
>
>> Hi,
>>
>> maybe this is a stupid question:
>>
>> I have a list of files. I want to take each file as an input for an
>> ML algorithm. All the files are independent of one another.
>> My question now is: how do I distribute the work so that each worker
>> takes a block of files and just runs the algorithm on them one by one?
>> I hope somebody can point me in the right direction! :)
>>
>> Best regards,
>> Lydia
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>

--
Best Regards,
Ayan Guha