Hi Don,
It’s not so much map() vs flatMap(). You can return a collection and have
Spark flatten the result.
My point was more about changing from Seq[BigDataStructure] to
Seq[SmallDataStructure].
If the use case is really storing image data - I would try to use
Seq[Vector] and store the values as a s
Hey Richard,
Good to hear from you as well. I thought I would ask if there was
something Scala specific I was missing in handling these large classes.
I can tweak my job to do a map() and then only one large object will be
created at a time and returned, which should allow me to lower my executo
Hi Don,
Good to hear from you. I think the problem is that regardless of whether
you use yield or a generator, Spark will internally produce the entire
result as a single large JVM object, which will blow up your heap space.
Would it be possible to shrink the overall size of the image object stor
This sounds like something mapPartitions should be able to do; I'm not
sure if there's an easier way.
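
A minimal sketch of that idea, in case it helps. It is plain Scala with no Spark dependency, so it only shows the shape of the function you would hand to mapPartitions. Record, the copies field, the 1 KB payload, and LazyExpand are made-up placeholders, not anything from Don's job. The point is that the partition is expanded through an Iterator, so each BigDataStructure is built only when the consumer pulls it, rather than being collected into a Seq first.

```scala
// Hypothetical stand-ins for the input rows and the large output class.
final case class Record(id: Int, copies: Int)
final case class BigDataStructure(id: Int, payload: Array[Byte])

object LazyExpand {
  // Expands one partition lazily. Iterator.flatMap and Iterator.fill
  // build each output element only when the consumer asks for it,
  // so no Seq holding every element is created here.
  def expandPartition(rows: Iterator[Record]): Iterator[BigDataStructure] =
    rows.flatMap { r =>
      // The 1 KB payload is an arbitrary placeholder size.
      Iterator.fill(r.copies)(BigDataStructure(r.id, new Array[Byte](1024)))
    }
}

// Assumed usage with Spark on the classpath and spark.implicits._ imported,
// where ds is a Dataset[Record]:
//   val out = ds.mapPartitions(LazyExpand.expandPartition _)
```

Whether this actually lowers peak memory in the job depends on what consumes the iterator downstream, so treat it as something to try rather than a guaranteed fix.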
On Thu, Dec 14, 2017 at 10:20 AM, Don Drake wrote:
> I'm looking for some advice when I have a flatMap on a Dataset that is
> creating and returning a sequence of a new case class
> (Seq[BigDataStruc