pySpark saveAsSequenceFile append overwrite

2014-12-02 Thread Csaba Ragany
Dear Spark community, Does the pySpark saveAsSequenceFile() method have the ability to append the new sequencefile to another one, or to overwrite an existing sequencefile? If the file already exists then I get an error message... Thank You! Csaba
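
A possible workaround, as a minimal sketch only. It assumes the output lives on a local filesystem and that the error comes from the output path already existing, which is how Hadoop-style output usually behaves. The helper names `remove_local_output`, `save_overwriting` and `save_appending` are hypothetical and are not part of PySpark. "Overwrite" is emulated by deleting the old directory before saving. "Append" is emulated by reading the existing sequencefile back, taking the union with the new records, and writing the result to a separate path.

```python
import os
import shutil


def remove_local_output(path):
    """Delete a local output directory if it exists.

    Returns True if a directory was removed, False otherwise.
    Assumption: `path` is on the local filesystem, not HDFS.
    """
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False


def save_overwriting(rdd, path):
    """Emulate 'overwrite': clear the old output, then save the RDD."""
    remove_local_output(path)
    rdd.saveAsSequenceFile(path)


def save_appending(sc, new_rdd, existing_path, combined_path):
    """Emulate 'append': union the stored records with the new ones.

    The result goes to `combined_path`, a path that must not exist yet,
    because the existing output is still being read while writing.
    """
    existing = sc.sequenceFile(existing_path)
    existing.union(new_rdd).saveAsSequenceFile(combined_path)
```

With an existing SparkContext `sc`, this might be called as `save_overwriting(sc.parallelize([('key1', 1.0)]), 'out_dir')`. For HDFS the directory would have to be removed by other means (for example the `hadoop fs` command line) before saving.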

Re: pySpark - convert log/txt files into sequenceFile

2014-10-29 Thread Csaba Ragany
> Cheers,
>
> Holden :)
>
> On Tuesday, October 28, 2014, Csaba Ragany wrote:
>
>> Dear Spark Community,
>>
>> Is it possible to convert text files (.log or .txt files) into
>> sequencefiles in Python?
>>
>> Using PySpark I can create a paralle

pySpark - convert log/txt files into sequenceFile

2014-10-28 Thread Csaba Ragany
Dear Spark Community, Is it possible to convert text files (.log or .txt files) into sequencefiles in Python? Using PySpark I can create a parallelized file with rdd=sc.parallelize([('key1', 1.0)]) and I can save it as a sequencefile with rdd.saveAsSequenceFile(). But how can I put the whole cont
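
One way this could be approached, as a sketch rather than a definitive answer. It assumes an existing SparkContext is passed in as `sc`. The function names `line_to_pair`, `text_to_sequencefile` and `files_to_sequencefile` are hypothetical. Using the line index as the key is an arbitrary choice made here, since a sequencefile needs key/value pairs and plain text lines have no natural key.

```python
def line_to_pair(indexed_line):
    """Turn a (line, index) tuple from zipWithIndex() into (index, line)."""
    line, index = indexed_line
    return (index, line)


def text_to_sequencefile(sc, src, dst):
    """Save every line of the text file(s) at `src` as one record at `dst`.

    Key: position of the line in the RDD, value: the line itself.
    """
    lines = sc.textFile(src)                      # one element per line
    pairs = lines.zipWithIndex().map(line_to_pair)
    pairs.saveAsSequenceFile(dst)


def files_to_sequencefile(sc, src_dir, dst):
    """Save each file under `src_dir` as one record at `dst`.

    wholeTextFiles() yields (file path, file content) pairs, so the
    whole content of a file becomes a single value.
    """
    sc.wholeTextFiles(src_dir).saveAsSequenceFile(dst)
```

For example, `text_to_sequencefile(sc, 'logs/app.log', 'logs_seq')` would write one record per log line, while `files_to_sequencefile(sc, 'logs', 'logs_seq_whole')` would write one record per file. Both paths are placeholders, and whole-file records are probably only practical for small files.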