Hi Folks,
I am new to Spark, and this is probably a basic question.
I have a file on HDFS:
1, one
1, uno
2, two
2, dos
I want to create a multimap as an RDD, i.e. RDD[Map[String,List[String]]]:
{"1" -> ["one", "uno"], "2" -> ["two", "dos"]}
First I read the file:
val identityData: RDD[String] = sc.textFile($path_to_the_file, 2).cache()
val identityDataList: RDD[List[String]] =
  identityData.map { line =>
    // trim each field, since the data has a space after the comma
    line.split(",").map(_.trim).toList
  }
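One thing I noticed while debugging: splitting the sample lines on "," alone leaves a leading space on each value, so the fields need a trim. A quick plain-Scala check (no Spark needed):

```scala
// Splitting "1, one" on "," keeps the space after the comma,
// so each field should be trimmed before use.
val raw = "1, one".split(",").toList               // List("1", " one")
val fields = "1, one".split(",").map(_.trim).toList // List("1", "one")
```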
Then I group them by the first element:
val grouped: RDD[(String, Iterable[List[String]])] =
  identityDataList.groupBy { element =>
    element(0)
  }
Then I do the equivalent of mapValues from the Scala collections to get rid
of the first element:
val groupedWithValues: RDD[(String, List[String])] =
  grouped.flatMap { case (key, lists) =>
    List((key, lists.map(element => element(1)).toList))
  }
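To sanity-check the logic, here is the same groupBy plus value-extraction on plain Scala collections (hard-coded sample data standing in for the file, no Spark):

```scala
// Same pipeline on plain Scala collections: split and trim, group by
// the key column, then keep only the second column of each grouped row.
val lines = List("1, one", "1, uno", "2, two", "2, dos")

val rows: List[List[String]] =
  lines.map(_.split(",").map(_.trim).toList)

val multimap: Map[String, List[String]] =
  rows.groupBy(_(0)).map { case (key, grouped) =>
    (key, grouped.map(row => row(1)))
  }
```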
For this to actually materialize I call collect:
val groupedAndCollected = groupedWithValues.collect()
This gives me an Array[(String, List[String])].
I am trying to figure out whether there is a way for me to get a
Map[String,List[String]] (a multimap) out of this, or to create an
RDD[Map[String,List[String]]].
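For the plain Map case, calling .toMap on the collected pairs seems to do it. A sketch, with hard-coded data standing in for the collect() result:

```scala
// Hard-coded stand-in for what collect() returns: an array of
// (key, values) pairs.
val groupedAndCollected: Array[(String, List[String])] =
  Array(("1", List("one", "uno")), ("2", List("two", "dos")))

// Any Scala collection of pairs can be turned into an immutable Map
// with .toMap, which is the multimap shape I'm after.
val asMap: Map[String, List[String]] = groupedAndCollected.toMap
```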
I am sure there is a simpler way; I would appreciate any advice.
Many thanks,
Amit