I was able to get a couple of relevant answers to this on Stack Overflow:
http://stackoverflow.com/questions/34339300/nesting-parallelizations-in-spark-whats-the-right-approach/34340986#34340986
http://stackoverflow.com/questions/34386086/casting-long-to-double-inside-109-for-loop-really-bad-idea?noredirect=1#comment56515911_34386086

Apparently Scala allows the use of a Range inside parallelize(), and Java 8 should have something similar, but I have not tested it yet.

As for the question about "nested for loops" in Spark: you can't really do that directly, but there is usually a way to think Sparkily and accomplish the same thing. For example, in my case I wanted to create an RDD of length 10^6 and then iterate over it 1000 times. I am still testing this out, but the advice I got (the answer to the first SO question above) is to build the RDD to be 1000 * 10^6 elements long and then iterate as needed. I could use mapPartitions() instead of map(), and have each subset of the main set be a partition.

This also brought up the point that initializing an RDD from a List limits you to MAXINT (about 2.147 * 10^9) elements; if you need more than that, I have it on good authority that the actual limit on RDD size is just your resources.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-meet-nested-loop-on-pairRdd-tp21121p25757.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
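A quick sketch of the Range point above (untested with Spark, as noted; the SparkContext `sc` in the comment is an assumption): a Scala Range is a compact description of start, end, and step, so handing one to parallelize() avoids materializing a million-element List on the driver first.

```scala
// Sketch only: the sc.parallelize call is commented out because it assumes
// a live SparkContext named `sc`. A Range stores just start/end/step, not
// the elements themselves, so it is O(1) memory on the driver.
object RangeParallelizeSketch {
  val n = 1000000           // 10^6 elements
  val r: Range = 1 to n     // compact description, no allocation of elements

  def main(args: Array[String]): Unit = {
    // With Spark on the classpath this would be roughly:
    //   val rdd = sc.parallelize(r)  // distributes the range across partitions
    println(r.size) // 1000000, computed from the bounds without allocating
  }
}
```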
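The flattening advice above (one 1000 * 10^6-element RDD instead of a nested loop) comes down to index arithmetic: pack outer iteration i and inner element e into one flat index k = i * inner + e, and recover them with division and modulus. A minimal sketch, with the Spark call left as a comment since it assumes a SparkContext `sc`:

```scala
// Sketch of the flat-index trick for replacing a nested loop with one
// large RDD. The decode step is pure arithmetic; the sc.range call in the
// comment is an assumption (it needs a live SparkContext).
object NestedLoopFlattenSketch {
  val inner = 1000000L // 10^6 elements per iteration
  val outer = 1000L    // 1000 iterations

  // Recover (iteration, element) from a flat index in [0, outer * inner).
  def decode(k: Long): (Long, Long) = (k / inner, k % inner)

  def main(args: Array[String]): Unit = {
    // With a SparkContext this would be roughly:
    //   sc.range(0L, outer * inner)
    //     .map { k => val (i, e) = decode(k); /* work on element e of pass i */ }
    println(decode(999000005L)) // (999,5): iteration 999, element 5
  }
}
```

Using mapPartitions() instead of map() here would let each partition process a contiguous block of flat indices in one call, which is the partition-per-subset idea mentioned above.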