Re: Broadcast table

2015-10-26 Thread Jags Ramnarayanan
If you are using Spark SQL and joining two DataFrames, the optimizer will automatically broadcast the smaller table (you can configure the size threshold if the default is too small). Otherwise, in code, you can collect any RDD to the driver and broadcast it using the context.broadcast method. http://ampcamp.berkel
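
To make the idea concrete, here is a minimal plain-Python sketch of what broadcasting the smaller table buys you. It is not Spark code: the function name, the sample data, and the list-of-lists stand-in for partitions are all made up for illustration. The small side is turned into one read-only lookup table that every partition of the large side probes locally, so the large side never has to be shuffled.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def broadcast_hash_join(
    large_partitions: Iterable[List[Tuple[int, str]]],
    small_table: List[Tuple[int, str]],
) -> Iterator[Tuple[int, str, str]]:
    """Inner-join each partition of the large side against a small table.

    Hypothetical helper for illustration only; assumes keys in
    small_table are unique (later duplicates would overwrite earlier ones).
    """
    # "Broadcast" step: build a single read-only lookup table from the small side.
    lookup: Dict[int, str] = dict(small_table)
    # Each partition probes the lookup table locally; no data moves between partitions.
    for partition in large_partitions:
        for key, value in partition:
            if key in lookup:
                yield key, value, lookup[key]


# Made-up sample data: two partitions of a large table and one small table.
orders = [[(1, "book"), (2, "pen")], [(1, "lamp"), (3, "desk")]]
customers = [(1, "Ada"), (2, "Grace")]

joined = list(broadcast_hash_join(orders, customers))
```

The row with key 3 has no match in the small table, so an inner join drops it. The approach only pays off while the small side fits comfortably in memory on every worker, which is why a size threshold exists.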

Re: can I use Spark as alternative for gem fire cache ?

2015-10-21 Thread Jags Ramnarayanan
Kali, this is possible, depending on the access pattern of your ETL logic. If you only read (no point mutations) and you can pay the additional price of scanning your dimension data each time you have to look something up, then Spark could work out. Note that a KV RDD isn't really a Map inter
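
The cost difference described above can be sketched in plain Python. This is only an analogy, not Spark or GemFire code, and the names and data are hypothetical: a collection of key-value pairs has to be scanned to find a key, whereas a real map answers the same question with a single hash probe.

```python
from typing import Dict, List, Optional, Tuple


def scan_lookup(pairs: List[Tuple[int, str]], key: int) -> Optional[str]:
    """Find a key by walking every pair: cost grows with the data size."""
    for k, v in pairs:
        if k == key:
            return v
    return None


# Made-up dimension data held as plain key-value pairs.
dimension = [(101, "US"), (102, "DE"), (103, "IN")]

# Scan-based lookup: touches pairs one by one until the key is found.
by_scan = scan_lookup(dimension, 102)

# Map-based lookup: build the index once, then each probe is a hash lookup.
index: Dict[int, str] = dict(dimension)
by_map = index.get(102)
```

Both calls return the same value; what differs is the work done per lookup, which matters when ETL logic performs many point reads against the dimension data.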