Thanks. I actually looked up foreachPartition() in this context yesterday,
but couldn't find where it's documented in the Javadocs or elsewhere...
probably for some silly reason. Can you please point me in the right direction?
Many thanks!
By the way, I realize the solution should rather be to concatenate
You can call any API you like in a Spark job, as long as the libraries
are available, and the Hadoop HDFS APIs are available on the
cluster. You could write a foreachPartition() that appends partitions
of data to files, yes.
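
For what it's worth, here is a minimal sketch of that idea in Scala. The
output path, the per-partition naming scheme, and the assumption that your
HDFS version/config supports append() are all mine, not anything from this
thread; treat it as an illustration rather than a recommended pattern.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.{SparkConf, SparkContext, TaskContext}

    object AppendPartitionsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("append-partitions"))
        val rdd = sc.parallelize(1 to 1000, numSlices = 4)

        rdd.foreachPartition { iter =>
          // This closure runs on the executors, so the Hadoop Configuration
          // and FileSystem handle are created there, not serialized from the driver.
          val fs = FileSystem.get(new Configuration())
          // Hypothetical output location: one file per partition id.
          val path = new Path("/tmp/appended-output/part-" + TaskContext.get().partitionId())

          // Append if the file already exists (assumes your HDFS supports append),
          // otherwise create it fresh.
          val out = if (fs.exists(path)) fs.append(path) else fs.create(path)
          try {
            iter.foreach(record => out.writeBytes(record.toString + "\n"))
          } finally {
            out.close()
          }
        }

        sc.stop()
      }
    }

The point is just that foreachPartition() runs its closure once per
partition on the executors, so each task can open, write to, and close its
own HDFS file independently.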
Spark itself does not use appending. I think the biggest reason is
that