Re: Is there any way (in Java) to make a JavaRDD from an iterable

2014-09-22 Thread Sean Owen
I imagine it is because parallelize() inherently only makes sense for smallish data, since the data has to be shipped out from the driver. Large enough data should probably live in distributed storage to begin with. The Scala equivalent wants a Seq, so I assume there is some need or va

Re: Is there any way (in Java) to make a JavaRDD from an iterable

2014-09-22 Thread Steve Lewis
Is there a way to write as a temporary file? Also, what about a stream, something like an RSS feed?

On Mon, Sep 22, 2014 at 10:21 AM, Victor Tso-Guillen wrote:
> You can write to disk and have Spark read it as a stream. This is how
> Hadoop files are iterated in Spark.
>
> On Mon, Sep 22, 2014 at

Re: Is there any way (in Java) to make a JavaRDD from an iterable

2014-09-22 Thread Victor Tso-Guillen
You can write to disk and have Spark read it as a stream. This is how Hadoop files are iterated in Spark.

On Mon, Sep 22, 2014 at 9:22 AM, Steve Lewis wrote:
> The only way I find is to turn it into a list - in effect holding
> everything in memory (see code below). Surely Spark has a better
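For readers of the archive, the write-to-disk suggestion could be sketched roughly as below. This is not code from the thread: the class name IterableSpill, the method writeLines, and the maxItems cap are all hypothetical, and it assumes the elements are Strings that can be stored one per line. It streams the iterable to a temporary file so that only one element is held in memory at a time, and the cap gives an unterminated iterable (the Fibonacci case) a stopping point.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

// Hypothetical helper, not part of Spark and not from the original thread.
final class IterableSpill {
    private IterableSpill() {}

    /**
     * Streams up to maxItems elements of the iterable into a temporary
     * text file, one element per line, and returns the file's path.
     * Assumes no element contains a line break.
     */
    static Path writeLines(Iterable<String> items, long maxItems) throws IOException {
        Path file = Files.createTempFile("iterable-spill-", ".txt");
        Iterator<String> it = items.iterator();
        long written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            // Check the cap first so an unbounded iterator is never advanced past it.
            while (written < maxItems && it.hasNext()) {
                out.write(it.next());
                out.newLine();
                written++;
            }
        }
        return file;
    }
}
```

The returned path could then be handed to something like JavaSparkContext.textFile(path.toString()). That is an assumption about how the pieces fit together, and it only works when the file is visible to the executors (local mode or a shared filesystem). For a real cluster the file would presumably need to be written to distributed storage instead.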

Is there any way (in Java) to make a JavaRDD from an iterable

2014-09-22 Thread Steve Lewis
The only way I find is to turn it into a list, in effect holding everything in memory (see code below). Surely Spark has a better way. Also, what about unterminated iterables like a Fibonacci series (useful only if limited in some other way)?

/**
 * make an RDD from an iterable
 * @