Thanks Richard, will give it a try tomorrow...

Thanks,
Sri
Sent from my iPhone

> On 10 Oct 2015, at 19:15, Richard Eggert <richard.egg...@gmail.com> wrote:
>
> You should be able to achieve what you're looking for by using foldByKey to
> find the latest record for each key. If you're relying on the order of
> elements within the file to determine which ones are the "latest" (rather
> than sorting by some field within the file itself), call zipWithIndex first
> to give each element a numeric index that you can use for comparisons.
>
> For example (type annotations are unnecessary but included for clarity):
>
> val parsedRecords: RDD[(Key, Value)] = ???
> val indexedRecords: RDD[(Key, (Value, Long))] =
>   parsedRecords.zipWithIndex map { case ((k, v), n) => k -> (v, n) }
> val latestRecords: RDD[(Key, Value)] = indexedRecords.foldByKey(null) { (a, b) =>
>   (a, b) match {
>     case (null, _) => b
>     case ((_, an), (_, bn)) if an < bn => b
>     case _ => a
>   }
> } mapValues { case (v, _) => v }
>
> You can then write "latestRecords" out to a file however you like. Note that
> I would recommend using string interpolation or the CSV output format (for
> DataFrames) over the string replacement you are currently using to format
> your output.
>
>> On Sat, Oct 10, 2015 at 1:11 PM, Kali <kali.tumm...@gmail.com> wrote:
>>
>> Hi Richard,
>>
>> The requirement is to get the latest records using a key; I think a hash
>> map is a good choice for this task.
>> As of now the data comes from a third party and we are not sure which
>> record is the latest, so a hash map was chosen.
>> If there is anything better than a hash map, please let me know.
>>
>> Thanks,
>> Sri
>>
>> Sent from my iPad
>>
>>> On 10 Oct 2015, at 17:10, Richard Eggert <richard.egg...@gmail.com> wrote:
>>>
>>> Do you need the HashMap for anything else besides writing out to a file?
>>> If not, there is really no need to create one at all. You could just keep
>>> everything as RDDs.
>>>
>>>> On Oct 10, 2015 11:31 AM, "kali.tumm...@gmail.com"
>>>> <kali.tumm...@gmail.com> wrote:
>>>>
>>>> Got it... created the hashmap and saved it to a file; please follow the
>>>> steps below.
>>>>
>>>> val QuoteRDD = quotefile.map(x => x.split("\\|")).
>>>>   filter(line => line(0).contains("1017")).
>>>>   map(x => ((x(5) + x(4)), (x(5), x(4), x(1),
>>>>     if (x(15) == "B")
>>>>       (
>>>>         { if (x(25) == "") x(9) else x(25) },
>>>>         { if (x(37) == "") x(11) else x(37) }
>>>>       )
>>>>     else if (x(15) == "C")
>>>>       (
>>>>         { if (x(24) == "") x(9) else x(24) },
>>>>         { if (x(30) == "") x(11) else x(30) }
>>>>       )
>>>>     else if (x(15) == "A")
>>>>       (x(9), x(11))
>>>>   )))
>>>>
>>>> val QuoteHashMap = QuoteRDD.collect().toMap
>>>> val test = QuoteHashMap.values.toSeq
>>>> val test2 = sc.parallelize(test.map(x =>
>>>>   x.toString.replace("(", "").replace(")", "")))
>>>> test2.saveAsTextFile("C:\\Users\\kalit_000\\Desktop\\mkdata\\test.txt")
>>>> test2.collect().foreach(println)
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Create-hashmap-using-two-RDD-s-tp24996p25014.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>
> --
> Rich
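[Archive note] The zipWithIndex + foldByKey approach suggested in this thread can be sketched with plain Scala collections, so the keep-latest-record-per-key logic can be tried without a Spark cluster; the RDD version is structurally the same. The object name `LatestRecords`, the helper `latestPerKey`, and the sample ticker/price records are hypothetical stand-ins, not part of the original thread.

```scala
// Minimal sketch: keep the latest record per key, where "latest" means
// the record that appears later in the input (mirroring RDD.zipWithIndex
// followed by foldByKey on the index).
object LatestRecords {
  def latestPerKey[K, V](records: Seq[(K, V)]): Map[K, V] =
    records.zipWithIndex                        // tag each record with its position
      .map { case ((k, v), n) => (k, (v, n)) }  // re-key: K -> (V, index)
      .groupBy(_._1)                            // group by key (foldByKey analogue)
      .map { case (k, kvs) =>
        // within each key, keep the value with the highest index
        k -> kvs.map(_._2).maxBy(_._2)._1
      }

  def main(args: Array[String]): Unit = {
    val records = Seq(
      "IBM"  -> 100.0,
      "MSFT" -> 40.0,
      "IBM"  -> 101.5,  // later IBM record should win
      "MSFT" -> 41.2    // later MSFT record should win
    )
    println(latestPerKey(records))
  }
}
```

On a real RDD the groupBy-then-maxBy step would be better expressed as `reduceByKey` or `foldByKey` (as Richard suggests) to avoid shuffling whole groups, but the per-key "highest index wins" comparison is identical.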