You can call any API you like in a Spark job, as long as the libraries are available on the executors, and the Hadoop HDFS APIs will be available on the cluster. You could write a foreachPartition() that appends partitions of data to files, yes.
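A minimal sketch of what that could look like (assuming Spark 1.2+, an HDFS setup with append support enabled, and a cluster-visible path; the RDD, path, and object name here are just placeholders, not anything from Spark itself):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object AppendExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("append-example"))
    val rdd = sc.parallelize(Seq("a", "b", "c"), 2)

    rdd.foreachPartition { iter =>
      // Runs on the executor, so the Hadoop FileSystem API is used directly here.
      val fs = FileSystem.get(new Configuration())
      // One file per partition: HDFS allows only a single writer per file,
      // so concurrent appends from many tasks to one file won't work.
      val path = new Path(s"/tmp/append-example/part-${TaskContext.get.partitionId}")
      val out = if (fs.exists(path)) fs.append(path) else fs.create(path)
      try {
        iter.foreach(line => out.write((line + "\n").getBytes("UTF-8")))
      } finally {
        out.close()
      }
    }

    sc.stop()
  }
}

Note this gives you none of Spark's usual output guarantees (task retries will append the same data again), so you'd have to handle idempotence yourself.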
Spark itself does not use appending. I think the biggest reason is that RDDs are immutable, so their input and output are naturally immutable as well, not mutable.

On Wed, Jan 28, 2015 at 10:39 PM, Matan Safriel <dev.ma...@gmail.com> wrote:
> Hi,
>
> Is it possible to append to an existing (hdfs) file, through some Spark
> action?
> Should there be any reason not to use a hadoop append api within a Spark
> job?
>
> Thanks,
> Matan