On Fri, Nov 14, 2014 at 12:14 AM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
> Hi Davies.
> Thank you for the quick answer.
>
> I have code like this:
>
> ....
>
> sc = SparkContext(appName="TAD")
> lines = sc.textFile(sys.argv[1], 1)
> result = lines.map(doSplit).groupByKey().map(lambda (k, vc): traffic_process_model(k, vc))
> result.saveAsTextFile(sys.argv[2])
>
> Can you please give a short example of what I should do?
>
> Also, I found only saveAsTextFile. Does PySpark have a saveAsBinary option, or what is the way to change the format of the text output files?
You can use saveAsPickleFile() [1]. To rename the output afterwards, you could use the following line (it's slow):

>>> os.system("hadoop fs -mv URI [URI …] <dest>")

I also just found that there is a pure Python client for HDFS [2] (not verified). A short end-to-end sketch combining these is at the bottom of this mail.

[1] http://spark.apache.org/docs/latest/api/python/pyspark.rdd.RDD-class.html#saveAsPickleFile
[2] https://labs.spotify.com/2013/05/07/snakebite/

> Thanks
> Oleg.
>
> On Fri, Nov 14, 2014 at 3:26 PM, Davies Liu <dav...@databricks.com> wrote:
>>
>> One option may be to call HDFS tools or a client to rename them after
>> saveAsXXXFile().
>>
>> On Thu, Nov 13, 2014 at 9:39 PM, Oleg Ruchovets <oruchov...@gmail.com>
>> wrote:
>> > Hi,
>> > I am running a pyspark job.
>> > I need to serialize the final result to HDFS in binary files and have
>> > the ability to give a name to the output files.
>> >
>> > I found this post:
>> >
>> > http://stackoverflow.com/questions/25293962/specifying-the-output-file-name-in-apache-spark
>> >
>> > but it explains how to do it using Scala.
>> >
>> > Question:
>> > How to do it using pyspark?
>> >
>> > Thanks
>> > Oleg.
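Putting the two suggestions together, here is a minimal sketch. It assumes a Spark 1.x-era PySpark setup and that the "hadoop" CLI is on the driver's PATH; the output path, the traffic_model_* file-name pattern, and the sample data are placeholders, not taken from the original job.

import subprocess
import sys

from pyspark import SparkContext

sc = SparkContext(appName="TAD")

# Stand-in for the real pipeline (doSplit / traffic_process_model are the
# poster's own functions); any RDD of records works the same way here.
result = sc.parallelize([("key_a", [1, 2, 3]), ("key_b", [4, 5])])

out_dir = sys.argv[1]

# saveAsPickleFile() stores the records as pickled objects inside binary
# SequenceFiles instead of plain text.
result.saveAsPickleFile(out_dir)
sc.stop()

# Spark always names its outputs part-xxxxx under out_dir; to control the
# final file names, rename the parts afterwards with the HDFS client
# (slow, as noted above).
listing = subprocess.check_output(["hadoop", "fs", "-ls", out_dir]).decode()
parts = sorted(line.split()[-1] for line in listing.splitlines() if "/part-" in line)
for i, part in enumerate(parts):
    subprocess.call(["hadoop", "fs", "-mv", part,
                     "%s/traffic_model_%05d.pickle" % (out_dir, i)])

The saved data can be read back later with sc.pickleFile(out_dir), since the renamed files still live under the same directory.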