Ah, I read right past the "250GB" part. Yes, with a heap that large you can indeed run into problems with GC pauses. You can probably still manage executors that big with a fair bit of care over the GC and memory settings, and you have a good reason to try. In particular I imagine you will want quite a large old generation on the heap, and should focus on settings that optimize for low pause times rather than for throughput.
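For what it's worth, here is a minimal sketch of how such flags could be passed to the executor JVMs, assuming you settle on G1 as the low-pause collector; the specific flags and values are illustrative, not a tested recommendation:

  import org.apache.spark.SparkConf

  // Illustrative only -- tune the values against GC logs from your own workload.
  val conf = new SparkConf()
    .set("spark.executor.extraJavaOptions",
      "-XX:+UseG1GC " +                            // low-pause collector for very large heaps
      "-XX:MaxGCPauseMillis=500 " +                // aim at a pause-time target, not throughput
      "-XX:InitiatingHeapOccupancyPercent=35 " +   // start concurrent marking early
      "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")  // keep GC logs so you can verify pause behaviour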
If these nodes are entirely dedicated to one app, then yes, ideally you let 1 executor take all usable memory on each, if you can do so without GC becoming an issue. Fitting the driver does then become an issue, because executor memory has to be the same for all executors, so you can't carve out space on just one node. You could run the driver on another, smaller machine. Since your code will be using lots of heap that Spark doesn't know about, you will also want to tell Spark to limit its cache / shuffle memory more than usual, so that it doesn't run you out of memory.

On Wed, Jun 29, 2016 at 5:40 PM, Aaron Perrin <aper...@timerazor.com> wrote:
> From what I've read, people had seen performance issues when the JVM used
> more than 60 GiB of memory. I haven't tested it myself, but I guess not
> true?
>
> Also, how does one optimize memory when the driver allocates some on one
> node?
>
> For example, let's say my cluster has N nodes each with 500 GiB of memory.
> And, let's say roughly, the amount of memory available per executor is
> ~80%, or ~400 GiB. So, you're suggesting I should allocate ~400 GiB of mem
> to the executor? How does that affect the node that's hosting the driver,
> when the driver uses ~10-15 GiB? Or, do I have to decrease executor memory
> to ~385 across all executors?
>
> (Note: I'm running on Yarn, which may affect this.)
>
> Thanks,
>
> Aaron
>
>
> On Wed, Jun 29, 2016 at 12:09 PM Sean Owen <so...@cloudera.com> wrote:
>>
>> If you have one executor per machine, which is the right default thing
>> to do, and this is a singleton in the JVM, then this does just have
>> one copy per machine. Of course an executor is tied to an app, so if
>> you mean to hold this data across executors that won't help.
>>
>>
>> On Wed, Jun 29, 2016 at 3:00 PM, Aaron Perrin <aper...@timerazor.com>
>> wrote:
>> > The user guide describes a broadcast as a way to move a large dataset
>> > to each node:
>> >
>> > "Broadcast variables allow the programmer to keep a read-only variable
>> > cached on each machine rather than shipping a copy of it with tasks.
>> > They can be used, for example, to give every node a copy of a large
>> > input dataset in an efficient manner."
>> >
>> > And the broadcast example shows it being used with a variable.
>> >
>> > But, is it somehow possible to instead broadcast a function that can
>> > be executed once, per node?
>> >
>> > My use case is the following:
>> >
>> > I have a large data structure that I currently create on each
>> > executor. The way that I create it is a hack. That is, when the RDD
>> > function is executed on the executor, I block, load a bunch of data
>> > (~250 GiB) from an external data source, create the data structure as
>> > a static object in the JVM, and then resume execution. This works, but
>> > it ends up costing me a lot of extra memory (i.e. a few TiB when I
>> > have a lot of executors).
>> >
>> > What I'd like to do is use the broadcast mechanism to load the data
>> > structure once, per node. But, I can't serialize the data structure
>> > from the driver.
>> >
>> > Any ideas?
>> >
>> > Thanks!
>> >
>> > Aaron
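To make the memory advice above concrete, here is a rough sketch of the kind of configuration I mean, assuming YARN, nodes dedicated to this one app, and the driver placed on a smaller machine; every number here is made up for illustration and would need tuning:

  import org.apache.spark.SparkConf

  // Illustrative values only.
  val conf = new SparkConf()
    .set("spark.executor.memory", "400g")                 // most of a 500 GiB node, minus OS / YARN overhead
    .set("spark.yarn.executor.memoryOverhead", "16384")   // off-heap overhead YARN reserves per executor, in MiB
    .set("spark.driver.memory", "16g")                    // driver lives on a smaller machine
    // Your ~250 GiB structure is heap Spark doesn't account for, so shrink the
    // fraction of the heap Spark uses for execution and storage (cache / shuffle):
    .set("spark.memory.fraction", "0.2")

And since the original question quoted above is really about getting one copy of the structure per JVM, this is the usual shape of the "singleton in the JVM" approach I mentioned earlier; LargeStructure, loadFromExternalSource and process are placeholders for your own code:

  // One lazily-initialized copy per executor JVM. The first task to touch it
  // blocks while the ~250 GiB load runs; every later task in that JVM reuses it.
  object SharedData {
    lazy val structure: LargeStructure = loadFromExternalSource()
  }

  rdd.mapPartitions { iter =>
    val data = SharedData.structure             // same instance for all tasks in this JVM
    iter.map(record => process(record, data))
  }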