You can try something like this:

    val kvRdd = sc.textFile("rawdata/")
      .map { m =>
        // Split each line on the first tab (assuming tab-separated input)
        val pfUser = m.split("\t", 2)
        pfUser(0) -> pfUser(1)
      }
      .partitionBy(new org.apache.spark.HashPartitioner(8))
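To get a feel for where each key ends up, here is a rough plain-Scala sketch (no Spark needed) of what I understand HashPartitioner to do: take the key's hashCode modulo the number of partitions, adjusted to be non-negative, with null keys going to partition 0. The object name, the helper names and the sample lines are made up for illustration; they are not part of Spark or of your data.

```scala
object HashPartitionSketch {
  // Non-negative remainder, so negative hash codes still land in [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Partition index for a key: null goes to 0, otherwise hashCode mod numPartitions.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    // Hypothetical "pageName<TAB>userId" lines standing in for rawdata/.
    val lines = Seq("home\tu1", "about\tu2", "home\tu3")

    // Same split as in the RDD version: first tab separates key from value.
    val pairs = lines.map { line =>
      val pfUser = line.split("\t", 2)
      pfUser(0) -> pfUser(1)
    }

    pairs.foreach { case (page, user) =>
      println(s"$page -> $user goes to partition ${partitionFor(page, 8)}")
    }
  }
}
```

The point is that every record with the same pageName maps to the same partition index, which is why later per-key operations on the partitioned RDD can avoid another shuffle.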
The resulting kvRdd has pageName as the key and UserID as the value, hash-partitioned into 8 partitions.

Thanks