Hi, Andrew,

If I broadcast the Map:

  val map2 = sc.broadcast(map1)

I get a compilation error:

  org.apache.spark.broadcast.Broadcast[scala.collection.immutable.Map[Int,String]] does not take parameters
  [error] val matchs= Vecs.map(term=>term.map{case (a,b)=>(map2(a),b)})
It seems the result is still a wrapper rather than a plain Map, so how do I access it by value = map2(key)? Thanks!

Cheers,
Dan

2015-07-22 2:20 GMT-05:00 Andrew Or <and...@databricks.com>:

> Hi Dan,
>
> If the map is small enough, you can just broadcast it, can't you? It
> doesn't have to be an RDD. Here's an example of broadcasting an array and
> using it on the executors:
> https://github.com/apache/spark/blob/c03299a18b4e076cabb4b7833a1e7632c5c0dabe/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala
>
> -Andrew
>
> 2015-07-21 19:56 GMT-07:00 ayan guha <guha.a...@gmail.com>:
>
>> Either you have to do rdd.collect and then broadcast, or you can do a join.
>>
>> On 22 Jul 2015 07:54, "Dan Dong" <dongda...@gmail.com> wrote:
>>
>>> Hi, All,
>>>
>>> I am trying to access a Map from RDDs that are on different compute
>>> nodes, but without success. The Map is like:
>>>
>>>   val map1 = Map("aa"->1,"bb"->2,"cc"->3,...)
>>>
>>> All RDDs have to check against it to see whether each key is in the
>>> Map, so it seems I have to make the Map itself global. The problem is
>>> that if the Map is stored as an RDD and spread across the different
>>> nodes, each node will only see a piece of the Map, and the info will
>>> not be complete enough to check against the Map (and then replace the
>>> key with the corresponding value). E.g.:
>>>
>>>   val matchs= Vecs.map(term=>term.map{case (a,b)=>(map1(a),b)})
>>>
>>> But if the Map is not an RDD, how to share it, like sc.broadcast(map1)?
>>>
>>> Any idea about this? Thanks!
>>>
>>> Cheers,
>>> Dan
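[Editor's note: the compilation error above arises because `sc.broadcast` returns a `Broadcast[T]` wrapper, which has no `apply` method; the wrapped value is accessed through `.value`. A minimal sketch of the pattern discussed in this thread, keeping the identifiers from the messages above (`Vecs` and its contents are assumed sample data, not taken from the original posts):]

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastMapExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("BroadcastMap").setMaster("local[*]"))

    // A small lookup Map kept on the driver -- it never needs to be an RDD.
    val map1 = Map("aa" -> 1, "bb" -> 2, "cc" -> 3)

    // Broadcast ships one read-only copy of the Map to each executor.
    val map2 = sc.broadcast(map1)

    // Hypothetical input: each element is a sequence of (key, weight) pairs.
    val Vecs = sc.parallelize(Seq(
      Seq(("aa", 0.5), ("bb", 0.1)),
      Seq(("cc", 0.9))))

    // Access the broadcast contents with map2.value(a).
    // Writing map2(a) is what triggers "Broadcast[...] does not take parameters".
    val matchs = Vecs.map(term => term.map { case (a, b) => (map2.value(a), b) })

    matchs.collect().foreach(println)
    sc.stop()
  }
}
```

[If the data to be shared starts out as an RDD, the collect-then-broadcast route ayan mentions would look like `sc.broadcast(someRdd.collectAsMap())`, provided the result fits in driver memory.]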