Hi,

I think you have the right idea. I would not even worry about flatMap.

    val rdd = sc.parallelize(1 to 1000000, numSlices = 1000).map(x => generateRandomObject(x))

Then when you try to evaluate something on this RDD, it will happen partition-by-partition. With 1,000,000 elements split into 1000 slices, each partition holds 1000 elements, so about 1000 random objects will be generated at a time per executor thread.

On Mon, Dec 8, 2014 at 8:05 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:

> I have a function which generates a Java object and I want to explore
> failures which only happen when processing large numbers of these objects.
> The real code is reading a many-gigabyte file, but in the test code I can
> generate similar objects programmatically. I could create a small list,
> parallelize it and then use flatMap to inflate it several times by a factor
> of 1000 (remember I can hold a list of 1000 items in memory but not a
> million).
> Are there better ideas? Remember I want to create more objects than can
> be held in memory at once.
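
To make the suggestion in the reply concrete, here is a minimal self-contained sketch of the same approach. The `RandomObject` class, the body of `generateRandomObject`, the 1 KB payload size, and the `local[2]` master are all assumptions made for illustration; the real generator and cluster settings are not shown in this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LargeSyntheticRdd {
  // Hypothetical stand-in for the object type under test.
  final case class RandomObject(id: Int, payload: Array[Byte])

  // Hypothetical generator: builds a 1 KB object deterministically from its seed.
  def generateRandomObject(seed: Int): RandomObject = {
    val rng = new scala.util.Random(seed)
    val payload = new Array[Byte](1024)
    rng.nextBytes(payload)
    RandomObject(seed, payload)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two worker threads, for a quick test run.
    val conf = new SparkConf().setAppName("large-synthetic-rdd").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Only the integer range is parallelized; the objects themselves are
      // created lazily by map() as each partition's iterator is consumed.
      val rdd = sc.parallelize(1 to 1000000, numSlices = 1000).map(generateRandomObject)

      // count() walks every partition without caching it, so objects from
      // finished partitions can be garbage collected.
      val total = rdd.count()
      println(s"partitions = ${rdd.partitions.length}, objects = $total")
    } finally {
      sc.stop()
    }
  }
}
```

Because nothing here calls `cache()` or `collect()`, the full set of a million objects should never need to sit in memory at once. Any action that pulls all elements back to the driver would defeat that.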