Hi Davis,

Thank you for your answer. This is my code; I think it is very similar to the word count example in Spark:
    import sys  # sc is assumed to be an existing SparkContext

    lines = sc.textFile(sys.argv[2])
    sie = lines.map(lambda l: (l.strip().split(',')[4], 1)).reduceByKey(lambda a, b: a + b)
    sort_sie = sie.sortByKey(False)  # False = sort keys in descending order

Thanks again.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-difference-between-pyspark-rdd-PipelinedRDD-and-pyspark-rdd-RDD-tp14421p14448.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
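For what it's worth, the same logic can be sketched in plain Python without Spark, which makes it easy to check what the map/reduceByKey pair computes. The sample rows below are hypothetical stand-ins for the file passed as sys.argv[2]:

```python
from collections import Counter

# Hypothetical sample rows; the real input comes from sys.argv[2].
rows = [
    "a,b,c,d,NY",
    "e,f,g,h,CA",
    "i,j,k,l,NY",
]

# Equivalent of map(lambda l: (l.strip().split(',')[4], 1))
# followed by reduceByKey(lambda a, b: a + b): count occurrences
# of the fifth comma-separated field.
counts = Counter(line.strip().split(",")[4] for line in rows)

# Equivalent of sortByKey(False): sort by key, descending.
sorted_counts = sorted(counts.items(), reverse=True)
print(sorted_counts)  # → [('NY', 2), ('CA', 1)]
```

In Spark the same result comes back as a list of (key, count) pairs once you call collect() on sort_sie.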