What error do you get? FileNotFoundException?
Please paste the stack trace here.

Yong

________________________________
From: Peter Figliozzi <pete.figlio...@gmail.com>
Sent: Wednesday, September 7, 2016 10:18 AM
To: ayan guha
Cc: Lydia Ickler; user.spark
Subject: Re: distribute work (files)

That's failing for me. Can someone please try this -- is this even supposed to work?

* Create a directory somewhere and add two text files to it.
* Mount that directory on the Spark worker machines with sshfs.
* Read the text files into one data structure using a file URL with a wildcard.

Thanks,
Pete

On Tue, Sep 6, 2016 at 11:20 PM, ayan guha <guha.a...@gmail.com> wrote:

To access a local file, try a file:// URI.

On Wed, Sep 7, 2016 at 8:52 AM, Peter Figliozzi <pete.figlio...@gmail.com> wrote:

This is a great question. Basically you don't have to worry about the details -- just give a wildcard in your call to textFile. See the section entitled "External Datasets" in the Programming Guide (http://spark.apache.org/docs/latest/programming-guide.html). The Spark framework will distribute your data across the workers. Note that:

    If using a path on the local filesystem, the file must also be accessible at the same path on worker nodes. Either copy the file to all workers or use a network-mounted shared file system.

In your case this would mean the directory of files.

Curiously, I cannot get this to work when I mount a directory with sshfs on all of my worker nodes. It says "file not found" even though the file clearly exists at the specified path on all workers. Anyone care to try this and comment?

Thanks,
Pete

On Tue, Sep 6, 2016 at 9:51 AM, Lydia Ickler <ickle...@googlemail.com> wrote:

Hi,

maybe this is a stupid question:

I have a list of files. I want to use each file as input to an ML algorithm. All files are independent of one another.

My question now is: how do I distribute the work so that each worker takes a block of files and runs the algorithm on them one by one?

I hope somebody can point me in the right direction! :)

Best regards,
Lydia

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
Best Regards,
Ayan Guha
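
For reference, a minimal sketch of the wildcard-plus-file:// approach Peter and ayan describe above. The directory /data/input is a hypothetical stand-in; per the Programming Guide note quoted in the thread, it must exist at the same path on the driver and on every worker (copied there or network-mounted).

import org.apache.spark.{SparkConf, SparkContext}

object WildcardRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wildcard-read"))

    // file:// URI plus a glob; Spark expands the wildcard and splits the
    // matching files across partitions.
    val lines = sc.textFile("file:///data/input/*.txt")
    println(s"total lines: ${lines.count()}")

    // wholeTextFiles yields (path, content) pairs, one per file, which is
    // useful when each file must be processed as a unit.
    val files = sc.wholeTextFiles("file:///data/input/*.txt")
    files.mapValues(_.length).collect().foreach(println)

    sc.stop()
  }
}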
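
One way to narrow down the sshfs "file not found" failure Peter reports is to check from the driver whether the executors themselves can see the mounted path. This diagnostic is not from the thread, and /data/input is again hypothetical:

// Runs a tiny task on (roughly) every executor and reports whether the
// mount is visible there; a mismatch points at a mount or permission
// problem on the workers rather than at Spark itself.
val visibility = sc.parallelize(1 to 1000, 100)
  .map { _ =>
    val host = java.net.InetAddress.getLocalHost.getHostName
    val seen = new java.io.File("/data/input").exists
    (host, seen)
  }
  .distinct()
  .collect()

visibility.foreach { case (host, seen) => println(s"$host -> $seen") }

One possible culprit worth checking: sshfs mounts are by default visible only to the user who created them, so if the Spark executors run as a different user, the files will appear missing unless the directory was mounted with -o allow_other.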
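
And for Lydia's original question, one common pattern is to parallelize the list of file names and let each task work through its block of files sequentially. A sketch under the same assumptions (paths visible at the same location on every worker; runAlgorithm is a hypothetical stand-in for the actual ML routine):

import org.apache.spark.{SparkConf, SparkContext}
import scala.io.Source

object DistributeFiles {
  // Hypothetical placeholder for the real per-file ML algorithm.
  def runAlgorithm(path: String): Double = {
    val source = Source.fromFile(path)
    try source.getLines().size.toDouble  // placeholder computation
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("distribute-files"))

    val fileList = Seq("/data/input/a.txt", "/data/input/b.txt")  // hypothetical paths

    // Parallelize the file *names*; each partition is a block of files
    // that one task processes one by one, on whichever worker it runs.
    val results = sc.parallelize(fileList, numSlices = 4)
      .map(path => (path, runAlgorithm(path)))
      .collect()

    results.foreach { case (path, score) => println(s"$path -> $score") }
    sc.stop()
  }
}

Since the files are independent, there is no shuffle here; raising numSlices beyond the number of files just leaves some partitions empty.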