Hi Folks,

I am new to Spark, so this is probably a basic question.
I have a file on HDFS:

    1, one
    1, uno
    2, two
    2, dos

I want to turn it into a multimap RDD, RDD[Map[String, List[String]]]:

    {"1" -> ["one", "uno"], "2" -> ["two", "dos"]}

First I read the file and split each line into a list:

    val identityData: RDD[String] = sc.textFile($path_to_the_file, 2).cache()

    val identityDataList: RDD[List[String]] =
      identityData.map { line =>
        val splits = line.split(",")
        splits.toList
      }

Then I group the lists by their first element:

    val grouped: RDD[(String, Iterable[List[String]])] =
      identityDataList.groupBy { element =>
        element(0)
      }

Then I do the equivalent of mapValues from the Scala collections API to drop the first element:

    val groupedWithValues: RDD[(String, List[String])] =
      grouped.flatMap[(String, List[String])] {
        case (key, list) =>
          List((key, list.map { element =>
            element(1)
          }.toList))
      }

To actually materialize this, I call collect:

    val groupedAndCollected = groupedWithValues.collect()

This gives me an Array[(String, List[String])]. I am trying to figure out whether there is a way to get a Map[String, List[String]] (a multimap) out of this, or to create an RDD[Map[String, List[String]]] directly.

I am sure there is something simpler; I would appreciate advice.

Many thanks,
Amit
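P.S. While writing this up I sketched a shorter variant: parse each line straight into a (key, value) pair, use groupByKey, and collect the result into a Map on the driver. The names here are just for illustration, and I am not sure this is the idiomatic way (note that splitting on "," leaves a leading space on the values, hence the trim), so please correct me if there is a better approach:

    val pairs: RDD[(String, String)] =
      identityData.map { line =>
        // "1, one" -> ("1", "one"); trim strips the space after the comma
        val Array(key, value) = line.split(",").map(_.trim)
        (key, value)
      }

    val multimap: Map[String, List[String]] =
      pairs.groupByKey()          // RDD[(String, Iterable[String])]
           .mapValues(_.toList)   // RDD[(String, List[String])]
           .collect()
           .toMap

    // or: pairs.groupByKey().mapValues(_.toList).collectAsMap(),
    // which returns a scala.collection.Map instead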
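P.P.S. If an RDD[Map[String, List[String]]] is literally what is needed, I suppose each grouped record could be wrapped in a one-entry Map, though I am not sure that buys anything over the pair RDD:

    val asMaps: RDD[Map[String, List[String]]] =
      groupedWithValues.map { case (key, values) => Map(key -> values) }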