Hi Dan,

If the map is small enough, you can just broadcast it, can't you? It
doesn't have to be an RDD. Here's an example of broadcasting an array and
using it on the executors:
https://github.com/apache/spark/blob/c03299a18b4e076cabb4b7833a1e7632c5c0dabe/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala
.
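
For your case, a minimal sketch might look like the following. The names
BroadcastMapSketch, replaceKeys and vecs are made up for illustration, the
local[2] master is only there so it runs standalone, and I am assuming your
Vecs is an RDD of sequences of (key, value) pairs. Keys missing from the map
are dropped here instead of throwing, which may or may not be what you want.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: names and data shapes are assumptions, not from the original thread.
object BroadcastMapSketch {

  // Broadcast the lookup map once, then replace each key with its mapped value.
  // Pairs whose key is not in the map are silently dropped (Option -> flatMap).
  def replaceKeys(
      sc: SparkContext,
      lookup: Map[String, Int],
      vecs: Seq[Seq[(String, Int)]]): Array[Seq[(Int, Int)]] = {
    val bc = sc.broadcast(lookup) // shipped to each executor once, read-only
    sc.parallelize(vecs)
      .map(term => term.flatMap { case (a, b) => bc.value.get(a).map(v => (v, b)) })
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("BroadcastMapSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val map1 = Map("aa" -> 1, "bb" -> 2, "cc" -> 3)
      val vecs = Seq(Seq(("aa", 10), ("zz", 20)), Seq(("cc", 30)))
      replaceKeys(sc, map1, vecs).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The important part is that the closure refers to bc.value rather than to
map1 directly, so each executor reads its own full local copy of the map. If
the map is too big to broadcast comfortably, the join that ayan mentioned is
the alternative.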

-Andrew

2015-07-21 19:56 GMT-07:00 ayan guha <guha.a...@gmail.com>:

> Either you have to do rdd.collect and then broadcast the result, or you
> can do a join.
> On 22 Jul 2015 07:54, "Dan Dong" <dongda...@gmail.com> wrote:
>
>> Hi, All,
>>
>>
>> I am trying to access a Map from RDDs that are on different compute
>> nodes, but without success. The Map is like:
>>
>> val map1 = Map("aa"->1,"bb"->2,"cc"->3,...)
>>
>> All RDDs will have to check against it to see whether a key is in the
>> Map, so it seems I have to make the Map itself global. The problem is
>> that if the Map is stored as an RDD and spread across the different
>> nodes, each node will only see a piece of the Map, and the information
>> will not be complete enough to check a key against the Map (and then
>> replace the key with the corresponding value). E.g.:
>>
>> val matchs= Vecs.map(term=>term.map{case (a,b)=>(map1(a),b)})
>>
>> But if the Map is not an RDD, how can I share it, e.g. with something
>> like sc.broadcast(map1)?
>>
>> Any idea about this? Thanks!
>>
>>
>> Cheers,
>> Dan
>>
>>
