Hi Davis,

Thank you for your answer. This is my code; I think it is very similar to the word count example in Spark:
    import sys  # sc is assumed to be an existing SparkContext

    lines = sc.textFile(sys.argv[2])
    sie = lines.map(lambda l: (l.strip().split(',')[4], 1)).reduceByKey(lambda a, b: a + b)
    sort_sie = sie.sortByKey(False)  # False = sort keys in descending order

Thanks again.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-difference-between-pyspark-rdd-PipelinedRDD-and-pyspark-rdd-RDD-tp14421p14448.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
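For what it's worth, the same logic can be sketched in plain Python without Spark, which makes it easy to check what the map/reduceByKey pair computes. The sample rows below are hypothetical stand-ins for the file passed as sys.argv[2]:

```python
from collections import Counter

# Hypothetical sample rows; the real input comes from sys.argv[2].
rows = [
    "a,b,c,d,NY",
    "e,f,g,h,CA",
    "i,j,k,l,NY",
]

# Equivalent of map(lambda l: (l.strip().split(',')[4], 1))
# followed by reduceByKey(lambda a, b: a + b): count occurrences
# of the fifth comma-separated field.
counts = Counter(line.strip().split(",")[4] for line in rows)

# Equivalent of sortByKey(False): sort by key, descending.
sorted_counts = sorted(counts.items(), reverse=True)
print(sorted_counts)  # → [('NY', 2), ('CA', 1)]
```

In Spark the same result comes back as a list of (key, count) pairs once you call collect() on sort_sie.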