If you are using Spark SQL and joining two DataFrames, the optimizer will
automatically broadcast the smaller table (you can configure the size
threshold if the default is too small).
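
For example (a minimal sketch, assuming Spark 1.5+ with a SQLContext; the
paths and column names here are made up), the threshold is the
spark.sql.autoBroadcastJoinThreshold property, in bytes:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("broadcast-join"))
    val sqlContext = new SQLContext(sc)

    // Raise the auto-broadcast threshold to 100 MB (the default is 10 MB);
    // tables smaller than this get shipped to every executor instead of shuffled.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
      (100 * 1024 * 1024).toString)

    val facts = sqlContext.read.parquet("/data/facts")       // large table
    val dims  = sqlContext.read.parquet("/data/dimensions")  // small table

    // The optimizer picks a broadcast join on its own when `dims` is under
    // the threshold; the join itself is written as usual.
    val joined = facts.join(dims, facts("dim_id") === dims("id"))

On Spark 1.5+ you can also force a broadcast for a single join by wrapping
the small side in org.apache.spark.sql.functions.broadcast(dims).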
Otherwise, in code, you can collect any RDD to the driver and broadcast it
using the context.broadcast method.
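
Something like this (a rough sketch; the RDD names and data are
hypothetical) collects the small dimension side to the driver as a Map and
then does a map-side join against the broadcast copy:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("broadcast-lookup"))

    // Hypothetical data: a small dimension RDD and a large fact RDD.
    val dims  = sc.parallelize(Seq((1, "US"), (2, "DE")))
    val facts = sc.parallelize(Seq((101, 1), (102, 2), (103, 1)))

    // Collect the small side to the driver as a plain Map, then broadcast
    // it so every executor holds one read-only copy.
    val dimMap = sc.broadcast(dims.collectAsMap())

    // Map-side join: look up each fact's dimension key in the broadcast
    // Map, avoiding a shuffle entirely.
    val enriched = facts.map { case (factId, dimId) =>
      (factId, dimMap.value.getOrElse(dimId, "unknown"))
    }

    enriched.collect().foreach(println)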
http://ampcamp.berkel
Kali,
This is possible, depending on the access pattern of your ETL logic. If
you only read (no point mutations) and you can pay the additional price of
having to scan your dimension data each time you have to look something up,
then Spark could work out. Note that a KV RDD isn't really a Map; a lookup
means scanning the data rather than a constant-time hash access.
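
To illustrate the scan cost (a small sketch with hypothetical data):
RDD.lookup on a KV RDD launches a job over the data instead of doing a
hash access, unless the RDD has a known partitioner, in which case only
the one matching partition is scanned:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("kv-lookup"))

    // Hypothetical dimension data held as a KV RDD.
    val dims = sc.parallelize(Seq((1, "US"), (2, "DE"), (3, "FR")))

    // Runs a distributed job that filters every partition for the key;
    // this is the "scan each time" price mentioned above.
    val values: Seq[String] = dims.lookup(2)   // Seq("DE")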