You can write the data to disk and have Spark read it back as a stream. This is how Hadoop files are iterated over in Spark.
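A rough sketch of that approach (assuming String elements, and that the hypothetical path /tmp/iterable.txt is visible to the whole cluster, e.g. local mode or a shared filesystem):

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class IterableToRdd {
        // Spill the iterable to a text file, one element per line, then let
        // Spark read it back with textFile() instead of holding everything
        // in driver memory at once.
        public static JavaRDD<String> fromIterable(Iterable<String> inp,
                                                   JavaSparkContext ctx) throws IOException {
            String path = "/tmp/iterable.txt"; // hypothetical path, must be readable by all workers
            try (BufferedWriter writer =
                         Files.newBufferedWriter(Paths.get(path), StandardCharsets.UTF_8)) {
                for (String element : inp) {
                    writer.write(element);
                    writer.newLine();
                }
            }
            // textFile() builds the RDD from file partitions, so elements are
            // only pulled in as tasks run, not collected into a List first.
            return ctx.textFile(path);
        }
    }

textFile() returns an RDD of Strings, so for non-String elements you would still need a parse step after the read.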
On Mon, Sep 22, 2014 at 9:22 AM, Steve Lewis <lordjoe2...@gmail.com> wrote:
> The only way I find is to turn it into a list - in effect holding
> everything in memory (see code below). Surely Spark has a better way.
>
> Also what about unterminated iterables like a Fibonacci series - (useful
> only if limited in some other way)
>
> /**
>  * make an RDD from an iterable
>  * @param inp input iterable
>  * @param ctx context
>  * @param <T> type
>  * @return rdd from iterable as a list
>  */
> public static @Nonnull <T> JavaRDD<T> fromIterable(@Nonnull final Iterable<T> inp,
>                                                    @Nonnull final JavaSparkContext ctx) {
>     List<T> holder = new ArrayList<T>();
>     for (T k : inp) {
>         holder.add(k);
>     }
>     return ctx.parallelize(holder);
> }