Yes, an RDD used as a map of String keys to List[String] values.

Amit
On Jun 4, 2014, at 2:46, Oleg Proudnikov <oleg.proudni...@gmail.com> wrote:

> Just a thought... Are you trying to use the RDD as a Map?
>
> On 3 June 2014 23:14, Doris Xin <doris.s....@gmail.com> wrote:
>> Hey Amit,
>>
>> You might want to check out PairRDDFunctions. For your use case in
>> particular, you can load the file as an RDD[(String, String)] and then
>> use the groupByKey() function in PairRDDFunctions to get an
>> RDD[(String, Iterable[String])].
>>
>> Doris
>>
>> On Tue, Jun 3, 2014 at 2:56 PM, Amit Kumar <kumarami...@gmail.com> wrote:
>>> Hi Folks,
>>>
>>> I am new to Spark, and this is probably a basic question.
>>>
>>> I have a file on HDFS:
>>>
>>> 1, one
>>> 1, uno
>>> 2, two
>>> 2, dos
>>>
>>> I want to create a multimap RDD[Map[String, List[String]]]:
>>>
>>> {"1" -> ["one", "uno"], "2" -> ["two", "dos"]}
>>>
>>> First I read the file:
>>>
>>> val identityData: RDD[String] = sc.textFile($path_to_the_file, 2).cache()
>>>
>>> val identityDataList: RDD[List[String]] =
>>>   identityData.map { line =>
>>>     line.split(",").toList
>>>   }
>>>
>>> Then I group the lists by their first element:
>>>
>>> val grouped: RDD[(String, Iterable[List[String]])] =
>>>   identityDataList.groupBy { element =>
>>>     element(0)
>>>   }
>>>
>>> Then I do the equivalent of mapValues on Scala collections to drop the
>>> first element:
>>>
>>> val groupedWithValues: RDD[(String, List[String])] =
>>>   grouped.flatMap[(String, List[String])] { case (key, list) =>
>>>     List((key, list.map(element => element(1)).toList))
>>>   }
>>>
>>> For this to actually materialize, I collect:
>>>
>>> val groupedAndCollected = groupedWithValues.collect()
>>>
>>> I get an Array[(String, List[String])].
>>>
>>> I am trying to figure out if there is a way for me to get a
>>> Map[String, List[String]] (a multimap), or to create an
>>> RDD[Map[String, List[String]]].
>>>
>>> I am sure there is something simpler; I would appreciate advice.
>>>
>>> Many thanks,
>>> Amit
>
> --
> Kind regards,
>
> Oleg
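For reference, a minimal sketch of the PairRDDFunctions approach Doris describes, assuming a live SparkContext `sc` (Spark 1.x) and the two-column input file from the original message; the HDFS path is a placeholder:

```scala
import org.apache.spark.SparkContext._  // brings PairRDDFunctions into scope (Spark 1.x)
import org.apache.spark.rdd.RDD

// Parse each line into a (key, value) pair, splitting on the first comma only.
val pairs: RDD[(String, String)] =
  sc.textFile("hdfs:///path/to/the/file").map { line =>
    val Array(k, v) = line.split(",", 2)
    (k.trim, v.trim)
  }

// groupByKey yields RDD[(String, Iterable[String])]; collectAsMap then
// materializes the result on the driver as the desired multimap shape.
val multimap: Map[String, List[String]] =
  pairs.groupByKey().mapValues(_.toList).collectAsMap().toMap
```

Note that collectAsMap() pulls the entire grouped dataset onto the driver, so this only makes sense when the result is small; for large data you would keep working with the RDD[(String, Iterable[String])] instead.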