Yes, this is what I also found in Spark documentation, that foreach can have side effects. Nevertheless I have this weird error, that sometimes files are just empty.
"using" is simply a wrapper that takes our code, makes try-catch-finally and flush & close all resources. I honestly have no clue what can possibly be wrong. No errors in logs. On Thu, Dec 11, 2014 at 2:29 PM, Daniel Darabos < [email protected]> wrote: > > Yes, this is perfectly "legal". This is what RDD.foreach() is for! You may > be encountering an IO exception while writing, and maybe using() suppresses > it. (?) I'd try writing the files with java.nio.file.Files.write() -- I'd > expect there is less that can go wrong with that simple call. > > On Thu, Dec 11, 2014 at 12:50 PM, Paweł Szulc <[email protected]> > wrote: > >> Imagine simple Spark job, that will store each line of the RDD to a >> separate file >> >> >> val lines = sc.parallelize(1 to 100).map(n => s"this is line $n") >> lines.foreach(line => writeToFile(line)) >> >> def writeToFile(line: String) = { >> def filePath = "file://..." >> val file = new File(new URI(path).getPath) >> // using function simply closes the output stream >> using(new FileOutputStream(file)) { output => >> output.write(value) >> } >> } >> >> >> Now, example above works 99,9% of a time. Files are generated for each >> line, each file contains that particular line. >> >> However, when dealing with large number of data, we encounter situations >> where some of the files are empty! Files are generated, but there is no >> content inside of them (0 bytes). >> >> Now the question is: can Spark job have side effects. Is it even legal to >> write such code? >> If no, then what other choice do we have when we want to save data from >> our RDD? >> If yes, then do you guys see what could be the reason of this job acting >> in this strange manner 0.1% of the time? >> >> >> disclaimer: we are fully aware of .saveAsTextFile method in the API, >> however the example above is a simplification of our code - normally we >> produce PDF files. >> >> >> Best regards, >> Paweł Szulc >> >> >> >> >> >> >> >
