Re: Appending to an hdfs file

2015-01-29 Thread Matan Safriel
e.

> > On Wed, Jan 28, 2015 at 10:39 PM, Matan Safriel wrote:
> > Hi,
> >
> > Is it possible to append to an existing (hdfs) file, through some Spark action?
> > Should there be any reason not to use a hadoop append api within a Spark job?
> >
> > Thanks,
> > Matan

Appending to an hdfs file

2015-01-28 Thread Matan Safriel
Hi, Is it possible to append to an existing (HDFS) file through some Spark action? Is there any reason not to use the Hadoop append API within a Spark job? Thanks, Matan

Re: Running a task over a single input

2015-01-28 Thread Matan Safriel
> You can make an RDD of one object and invoke a distributed Spark operation on it, but assuming you mean you have it on the driver, that's wasteful. It just copies the object to another machine to invoke the function.
>
> On Wed, Jan 28, 2015 at 10:14 AM, Matan Safriel

Running a task over a single input

2015-01-28 Thread Matan Safriel
Hi, How would I run a given function in Spark over a single input object? Would I first add the input to the file system, then somehow invoke the Spark function on just that input? Or should I rather twist the Spark Streaming API for it? Assume I'd like to run a piece of computation that normall

Full per node replication level (architecture question)

2015-01-24 Thread Matan Safriel
Hi, I wonder whether any of the file systems supported by Spark can support a replication level whereby each node has a full copy of the data. I realize this was not the main intended scenario of Spark/Hadoop, but it may be a good fit for a compute cluster that needs to be very fast over its in