I imagine it is because parallelize() inherently only makes sense for
smallish data, since it will have to be broadcast from the driver.
Large enough data should live in distributed storage to begin with.
The Scala equivalent wants a Seq, so I assume there is some need or va…
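
For illustration, a minimal sketch of the contrast being described, using the Spark Java API. The class name, app name, and HDFS path are hypothetical; parallelize() and textFile() are the real JavaSparkContext methods.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeVsStorage {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("demo").setMaster("local[*]"));

        // Small, driver-local data: parallelize() ships the collection
        // out from the driver, so it only suits data that fits there.
        JavaRDD<Integer> small = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Large data should live in distributed storage and be read from
        // there, so it never passes through the driver (path hypothetical).
        JavaRDD<String> big = sc.textFile("hdfs:///data/large-input");

        System.out.println(small.count());
        sc.stop();
    }
}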
Is there a way to write it out as a temporary file? Also, what about a
Stream - something like an RSS feed?
On Mon, Sep 22, 2014 at 10:21 AM, Victor Tso-Guillen wrote:
You can write to disk and have Spark read it as a stream. This is how
Hadoop files are iterated in Spark.
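
A minimal sketch of that approach, assuming the elements are Strings: spill the iterable to a file one element at a time, then let textFile() read it back record by record. The helper and class names are mine, not from the thread; in a real cluster the file would need to live somewhere the executors can see, e.g. HDFS or a shared filesystem, rather than a driver-local temp directory.

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class IterableToFile {
    // Write each element as one line; only the current element is ever
    // held in driver memory.
    public static JavaRDD<String> rddViaTempFile(JavaSparkContext sc,
                                                 Iterable<String> data)
            throws IOException {
        Path tmp = Files.createTempFile("spark-input-", ".txt");
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(tmp, StandardCharsets.UTF_8))) {
            for (String item : data) {
                out.println(item);
            }
        }
        // textFile() then iterates the file record by record, the same
        // way any Hadoop input is streamed.
        return sc.textFile(tmp.toString());
    }
}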
On Mon, Sep 22, 2014 at 9:22 AM, Steve Lewis wrote:
The only way I can find is to turn it into a list - in effect holding
everything in memory (see code below). Surely Spark has a better way.
Also, what about unterminated iterables like a Fibonacci series (useful
only if limited in some other way)?
/**
 * make an RDD from an iterable
 * @
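
The snippet above is cut off in the archive. A minimal sketch of the approach being described - copy the iterable into a List, then parallelize it - might look like the following; the method name and the limit parameter, which guards against unterminated iterables like a Fibonacci series, are hypothetical, while parallelize(List) is the real JavaSparkContext call.

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class IterableToRdd {
    // Copies the iterable into a List first, so the whole dataset ends
    // up in driver memory - exactly the drawback discussed above.
    public static <T> JavaRDD<T> rddFromIterable(JavaSparkContext sc,
                                                 Iterable<T> data,
                                                 long limit) {
        List<T> buffer = new ArrayList<T>();
        for (T item : data) {
            if (buffer.size() >= limit) {
                break; // cap unterminated iterables, e.g. Fibonacci
            }
            buffer.add(item);
        }
        return sc.parallelize(buffer);
    }
}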