You can call any API you like from a Spark job, as long as the libraries
are available on the executors, and the Hadoop HDFS APIs will already be
available on the cluster. So yes, you could write a foreachPartition()
that appends each partition of data to a file.
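
For example, something along these lines (an untested sketch, not a
recommendation: the output path, the RDD[String], and the
appendPartitions helper are just placeholders, and HDFS append has to
be enabled on your cluster):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD

// Append each partition of an RDD[String] to its own HDFS file,
// creating the file on the first run and appending on later runs.
def appendPartitions(rdd: RDD[String]): Unit = {
  rdd.foreachPartition { iter =>
    val fs = FileSystem.get(new Configuration())
    // One file per partition: concurrent appends from many tasks to a
    // single file are not safe.
    val path = new Path("/tmp/output/part-" + TaskContext.get.partitionId)
    val out = if (fs.exists(path)) fs.append(path) else fs.create(path)
    try {
      iter.foreach(line => out.write((line + "\n").getBytes("UTF-8")))
    } finally {
      out.close()
    }
  }
}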

Spark itself does not append to existing files. I think the biggest
reason is that RDDs are immutable, so their input and output are
naturally treated as immutable as well.

On Wed, Jan 28, 2015 at 10:39 PM, Matan Safriel <dev.ma...@gmail.com> wrote:
> Hi,
>
> Is it possible to append to an existing (hdfs) file, through some Spark
> action?
> Should there be any reason not to use a hadoop append api within a Spark
> job?
>
> Thanks,
> Matan
>

