Re: pyspark and hdfs file name

Oleg Ruchovets Fri, 14 Nov 2014 00:15:07 -0800

Hi Davies,
Thank you for the quick answer.

I have code like this:


....

sc = SparkContext(appName="TAD")
lines = sc.textFile(sys.argv[1], 1)
result = lines.map(doSplit).groupByKey().map(
    lambda kv: traffic_process_model(kv[0], kv[1]))
result.saveAsTextFile(sys.argv[2])


Can you please give a short example of what I should do?

Also, I found only saveAsTextFile. Does PySpark have a saveAsBinary option, or
what is the way to write the output files in a format other than text?

Thanks
Oleg.
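
On the binary output question: PySpark RDDs (1.1 and later) also provide
saveAsPickleFile and saveAsSequenceFile next to saveAsTextFile. Below is a
minimal sketch of choosing between them. The helper name save_binary and its
fmt argument are hypothetical, made up for this illustration; only the two
RDD methods are PySpark's own. Note that saveAsSequenceFile expects an RDD of
key/value pairs, which may or may not match what traffic_process_model
returns.

```python
def save_binary(rdd, path, fmt="pickle"):
    """Write an RDD in a binary format instead of plain text.

    save_binary and fmt are hypothetical names used for illustration only.
    """
    if fmt == "pickle":
        # Pickled Python objects; read back with sc.pickleFile(path).
        rdd.saveAsPickleFile(path)
    elif fmt == "sequence":
        # Hadoop SequenceFile; the RDD must contain (key, value) pairs.
        rdd.saveAsSequenceFile(path)
    else:
        raise ValueError("unknown format: %r" % (fmt,))
```

In the job above this would replace the last line, for example
save_binary(result, sys.argv[2]).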

On Fri, Nov 14, 2014 at 3:26 PM, Davies Liu <dav...@databricks.com> wrote:

> One option may be to call the HDFS tools or a client to rename the files
> after saveAsXXXFile().
>
> On Thu, Nov 13, 2014 at 9:39 PM, Oleg Ruchovets <oruchov...@gmail.com>
> wrote:
> > Hi,
> >   I am running a PySpark job.
> > I need to serialize the final result to HDFS as binary files and be able to
> > give names to the output files.
> >
> > I found this post:
> >
> > http://stackoverflow.com/questions/25293962/specifying-the-output-file-name-in-apache-spark
> >
> > but it explains how to do it using Scala.
> >
> > Question:
> >  How can I do it using PySpark?
> >
> > Thanks
> > Oleg.
> >
>
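
The renaming suggested in the thread (call the HDFS tools after
saveAsXXXFile()) could look roughly like the sketch below. It assumes the
hdfs command-line client is on the PATH of the machine running the driver and
that output paths contain no spaces. The function names renamed_path and
rename_part_files and the prefix argument are hypothetical, chosen only for
this example.

```python
import subprocess


def renamed_path(output_dir, part_name, prefix):
    """Map 'part-00000' to '<output_dir>/<prefix>-00000' (hypothetical helper)."""
    suffix = part_name.split("-", 1)[1]
    return "%s/%s-%s" % (output_dir.rstrip("/"), prefix, suffix)


def rename_part_files(output_dir, prefix):
    """Rename every part-* file under output_dir using `hdfs dfs -mv`."""
    listing = subprocess.check_output(["hdfs", "dfs", "-ls", output_dir])
    for line in listing.decode("utf-8").splitlines():
        fields = line.split()
        if not fields:
            continue
        # The last column of `hdfs dfs -ls` output is the full path.
        path = fields[-1]
        name = path.rsplit("/", 1)[-1]
        if name.startswith("part-"):
            target = renamed_path(output_dir, name, prefix)
            subprocess.check_call(["hdfs", "dfs", "-mv", path, target])
```

It would be called once the save has finished, for example
rename_part_files(sys.argv[2], "traffic") after result.saveAsTextFile(sys.argv[2]).
Files that do not start with "part-" (such as _SUCCESS) are left alone.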
