It worked! I had been struggling with this for a week. Thanks a lot!
On Mon, Jul 14, 2014 at 12:31 PM, Xiangrui Meng [via Apache Spark User List] <ml-node+s1001560n9591...@n3.nabble.com> wrote:

> You should return an iterator in mapPartitionsWithIndex. This is from
> the programming guide
> (http://spark.apache.org/docs/latest/programming-guide.html):
>
> mapPartitionsWithIndex(func): Similar to mapPartitions, but also
> provides func with an integer value representing the index of the
> partition, so func must be of type (Int, Iterator<T>) => Iterator<U>
> when running on an RDD of type T.
>
> For your case, try something similar to the following:
>
> val keyval = dRDD.mapPartitionsWithIndex { (ind, iter) =>
>   iter.map(x => process(ind, x.trim().split(' ').map(_.toDouble), q, m, r))
> }
>
> -Xiangrui
>
> On Sun, Jul 13, 2014 at 11:26 PM, Madhura <[hidden email]> wrote:
>
> > I have a text file consisting of a large number of random floating
> > values separated by spaces. I am loading this file into an RDD in Scala.
> >
> > I have heard of mapPartitionsWithIndex but I haven't been able to
> > implement it. For each partition I want to call a method (process in
> > this case) to which I want to pass the partition and its respective
> > index as parameters.
> >
> > My method returns a pair of values.
> > This is what I have done.
> >
> > val dRDD = sc.textFile("hdfs://master:54310/Data/input*")
> > var ind: Int = 0
> > val keyval = dRDD.mapPartitionsWithIndex((ind, x) => process(ind, x, ...))
> > val res = keyval.collect()
> >
> > We are not able to access res(0)._1 and res(0)._2
> >
> > The error log is as follows.
> >
> > [error] SimpleApp.scala:420: value trim is not a member of Iterator[String]
> > [error] Error occurred in an application involving default arguments.
> > [error] val keyval=dRDD.mapPartitionsWithIndex( (ind,x) =>
> > process(ind,x.trim().split(' ').map(_.toDouble),q,m,r))
> > [error]                   ^
> > [error] SimpleApp.scala:425: value mkString is not a member of
> > Array[Nothing]
> > [error] println(res.mkString(""))
> > [error]             ^
> > [error] /SimpleApp.scala:427: value _1 is not a member of Nothing
> > [error] var final = res(0)._1
> > [error]                   ^
> > [error] /home/madhura/DTWspark/src/main/scala/SimpleApp.scala:428: value _2
> > is not a member of Nothing
> > [error] var final1 = res(0)._2 - m + 1
> > [error]                     ^
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/mapPartitionsWithIndex-tp9590.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/mapPartitionsWithIndex-tp9590p9598.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
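For anyone reading this thread in the archive: the crux of the fix above is that the function passed to mapPartitionsWithIndex must have type (Int, Iterator[T]) => Iterator[U], so you map over the partition's iterator rather than calling string methods on the iterator itself (which is what produced the "value trim is not a member of Iterator[String]" error). The shape can be sketched in plain Scala without a Spark cluster; here each (index, iterator) pair stands in for one RDD partition, and process, q, m, and r are hypothetical stand-ins for the original poster's method and parameters:

```scala
object PartitionSketch {
  // Hypothetical stand-in for the poster's process(ind, values, q, m, r):
  // returns a pair, like the original method is said to do.
  def process(ind: Int, values: Array[Double],
              q: Int, m: Int, r: Int): (Int, Double) =
    (ind, values.sum)

  def main(args: Array[String]): Unit = {
    val (q, m, r) = (1, 2, 3) // placeholder parameters

    // Two fake "partitions" of whitespace-separated floating-point lines,
    // standing in for what mapPartitionsWithIndex hands to the closure.
    val partitions: Seq[(Int, Iterator[String])] = Seq(
      0 -> Iterator("1.0 2.0", "3.0 4.0"),
      1 -> Iterator("5.0 6.0")
    )

    // The required shape: (Int, Iterator[String]) => Iterator[(Int, Double)].
    // Map over the iterator's elements; do not call .trim on the iterator.
    val keyval = partitions.flatMap { case (ind, iter) =>
      iter.map(x => process(ind, x.trim().split(' ').map(_.toDouble), q, m, r))
    }

    println(keyval.mkString(", "))
    // → (0,3.0), (0,7.0), (1,11.0)
  }
}
```

Because the closure returns an iterator of pairs, collect() on the real RDD yields an Array[(Int, Double)], and res(0)._1 / res(0)._2 type-check as expected.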