We use Scoobi on MapReduce (MR) to perform joins, and in particular we use
Scoobi's blockJoin() API:
/** Perform an equijoin with another distributed list where this list is considerably smaller
  * than the right (but too large to fit in memory), and where the keys of right may be
  * particularly skewed. */
def blockJoin[B : WireFormat](right: DList[(K, B)]): DList[(K, (A, B))] =
  Relational.blockJoin(left, right)
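To make the semantics concrete, here is a minimal sketch of the replicate-and-salt idea that a block join is generally understood to rely on, written against plain Scala collections rather than DList or Spark. The object name, the replicationFactor value, and the helper names are illustrative assumptions, not Scoobi's actual implementation.

```scala
import scala.util.Random

object BlockJoinSketch {
  // Hypothetical replication factor: how many sub-keys each key is split into.
  val replicationFactor = 4

  // Replicate every record of the smaller (left) side once per salt value.
  def replicateLeft[K, A](left: Seq[(K, A)]): Seq[((K, Int), A)] =
    for {
      (k, a) <- left
      salt   <- 0 until replicationFactor
    } yield ((k, salt), a)

  // Tag every record of the larger, skewed (right) side with one random salt,
  // so that a hot key is spread over `replicationFactor` sub-keys.
  def saltRight[K, B](right: Seq[(K, B)], rnd: Random): Seq[((K, Int), B)] =
    right.map { case (k, b) => ((k, rnd.nextInt(replicationFactor)), b) }

  // Equijoin on the salted key, then drop the salt from the result.
  def blockJoin[K, A, B](left: Seq[(K, A)],
                         right: Seq[(K, B)],
                         rnd: Random = new Random(0)): Seq[(K, (A, B))] = {
    val leftBySaltedKey = replicateLeft(left).groupBy(_._1)
    for {
      (saltedKey, b) <- saltRight(right, rnd)
      (_, a)         <- leftBySaltedKey.getOrElse(saltedKey, Nil)
    } yield (saltedKey._1, (a, b))
  }
}
```

Each right-hand record carries exactly one salt and each left-hand record is copied once per salt, so the result has the same pairs as an ordinary equijoin, while the work for a heavily skewed key is divided across several sub-keys.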
I am working on a proof of concept (POC). Which Spark join API(s) would you
recommend to achieve something similar?
Please suggest.
--
Deepak