Re: pyspark and hdfs file name

2014-11-14 Thread Davies Liu
On Fri, Nov 14, 2014 at 12:14 AM, Oleg Ruchovets wrote:
> Hi Davies.
> Thank you for the quick answer.
>
> I have code like this:
>
> sc = SparkContext(appName="TAD")
> lines = sc.textFile(sys.argv[1], 1)
> result = lines.map(doSplit).groupByKey().map(lambda (k,vc):
> traffic_process_mo

Re: pyspark and hdfs file name

2014-11-14 Thread Oleg Ruchovets
Hi Davies.
Thank you for the quick answer.

I have code like this:

    sc = SparkContext(appName="TAD")
    lines = sc.textFile(sys.argv[1], 1)
    result = lines.map(doSplit).groupByKey().map(lambda (k,vc): traffic_process_model(k,vc))
    result.saveAsTextFile(sys.argv[2])

Can you please give short e

Re: pyspark and hdfs file name

2014-11-13 Thread Davies Liu
One option may be to call the HDFS tools or a client to rename the files after saveAsXXXFile().

On Thu, Nov 13, 2014 at 9:39 PM, Oleg Ruchovets wrote:
> Hi,
> I am running a pyspark job.
> I need to serialize the final result to hdfs in binary files and to be able to
> give a name to the output files.
>
> I found th
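
The rename-after-save suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names (build_rename_commands, rename_outputs), the "traffic" prefix, and the example paths are all hypothetical, and it assumes the `hdfs` command-line tool is on the PATH and that the caller already knows the names of the part files (for example from `hdfs dfs -ls`).

```python
import posixpath
import subprocess


def build_rename_commands(output_dir, part_files, prefix):
    """Return one `hdfs dfs -mv` argument list per Spark part file.

    output_dir, part_files and prefix are hypothetical inputs chosen for
    this sketch; they do not come from the original thread.
    """
    commands = []
    for name in sorted(part_files):
        # Spark names its output files part-00000, part-00001, ...
        if not name.startswith("part-"):
            continue  # skip markers such as _SUCCESS
        index = name[len("part-"):]
        src = posixpath.join(output_dir, name)
        dst = posixpath.join(output_dir, "%s-%s" % (prefix, index))
        commands.append(["hdfs", "dfs", "-mv", src, dst])
    return commands


def rename_outputs(output_dir, part_files, prefix, dry_run=True):
    """Return the rename commands; run them too when dry_run is False."""
    commands = build_rename_commands(output_dir, part_files, prefix)
    if not dry_run:
        for cmd in commands:
            # Assumes the `hdfs` CLI is installed and on the PATH.
            subprocess.check_call(cmd)
    return commands
```

In a job like the one quoted in this thread, such a helper would run on the driver after result.saveAsTextFile(sys.argv[2]) has finished, for example rename_outputs(sys.argv[2], part_names, "traffic", dry_run=False), where part_names is the list of part files in the output directory.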

pyspark and hdfs file name

2014-11-13 Thread Oleg Ruchovets
Hi, I am running a pyspark job. I need to serialize the final result to *HDFS in binary files* and to be able to give a *name for output files*. I found this post: http://stackoverflow.com/questions/25293962/specifying-the-output-file-name-in-apache-spark but it explains how to do it using Scala.