If it is a small collection of them on the driver, you can just use
sc.parallelize to create an RDD.
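For example, a minimal sketch of that (it assumes an existing SparkContext `sc`, and the rows here are illustrative data only):

  import org.apache.spark.sql.Row

  // Assumes `sc` is an existing SparkContext and the Rows fit in
  // driver memory; the values are placeholders.
  val rows: Seq[Row] = Seq(Row(1, "alice"), Row(2, "bob"))
  val rowRDD = sc.parallelize(rows)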
On Tue, Jan 13, 2015 at 7:56 AM, Malith Dhanushka wrote:
> Hi Reynold,
>
> Thanks for the response. I am just wondering: let's say we have a set of Row
> objects. Isn't there a straightforward way to create an RDD from them?
Depends on what the other side is doing. You can create your own RDD
implementation by subclassing RDD, or it might work if you use
sc.parallelize(1 to n, n).mapPartitionsWithIndex( /* code to read the data
and return an iterator */ ) where n is the number of partitions.
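A rough sketch of that pattern, assuming an existing SparkContext `sc`; `fetchChunk` is a hypothetical stand-in for whatever code reads one chunk of the data from the other side:

  // One partition per chunk of the external data; example value only.
  val n = 4

  // Hypothetical per-chunk reader; replace with the real remote read.
  def fetchChunk(i: Int): Seq[String] =
    Seq(s"record-$i-a", s"record-$i-b")

  val rdd = sc.parallelize(1 to n, n).mapPartitionsWithIndex { (index, _) =>
    // Ignore the dummy seed elements; read chunk `index` from the
    // source and return its records as this partition's iterator.
    fetchChunk(index).iterator
  }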
On Tue, Jan 13, 2015, Malith Dhanushka wrote:
Hi,
We have a custom datasources API which connects to various data sources
and exposes them through a common interface. We are now trying to implement
the Spark data sources API released in 1.2.0 to connect our sources to Spark
for analytics. Looking at the sources API, we figured out that we should
extend a scan class, along the lines of the sketch below.
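Rough skeleton of what we have in mind (CustomRelation is our placeholder name, the parallelized data stands in for a real read through our API, and we may have the exact package names slightly off for 1.2.0):

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql._
  import org.apache.spark.sql.sources.TableScan

  // Placeholder relation backed by our custom datasources API.
  class CustomRelation(val sqlContext: SQLContext) extends TableScan {

    // Schema exposed to Spark SQL; a single string column for illustration.
    override def schema: StructType =
      StructType(Seq(StructField("value", StringType, nullable = true)))

    // Full-table scan: fetch records from the underlying source and
    // wrap each one as a Row. parallelize stands in for the real read.
    override def buildScan(): RDD[Row] =
      sqlContext.sparkContext.parallelize(Seq("a", "b")).map(Row(_))
  }

As far as we can tell, we would also need a RelationProvider to instantiate the relation so it can be registered from SQL.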