Hi,

Are you caching the RDD into storage memory here?
Example: s.persist(org.apache.spark.storage.StorageLevel.MEMORY_ONLY)

Do you have a snapshot of your Storage tab?

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 31 August 2016 at 14:53, Jakub Dubovsky <spark.dubovsky.ja...@gmail.com> wrote:

> Hey all,
>
> I have a conceptual question which I am having a hard time finding an answer for.
>
> Is the JVM where the Spark driver runs also used to run computations over RDD partitions and to persist them? The answer is obviously yes for local mode. But when it runs on YARN/Mesos/standalone with many executors, is the answer no?
>
> *My motivation is the following*
> In the "Executors" tab of the Spark UI, the "Storage Memory" column of the driver row shows, for example, "0.0 B / 14.2 GB". This suggests that 14 GB of RAM are not available to computations done in the driver but are instead reserved for RDD caching.
>
> But I have plenty of memory on the executors to cache RDDs there. I would like to use the driver memory to be able to collect medium-sized data. Since I assume that collected data is stored outside the memory reserved for the cache, this would mean those 14 GB are not available for holding collected data.
>
> It looks like Spark 2.0.0 does this cache vs non-cache memory management somehow automatically, but I do not understand that yet.
>
> Thanks for any insight on this.
>
> Jakub D.
>
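For reference, a minimal, self-contained sketch of what caching into storage memory looks like end to end. The RDD name "s" and the example data are made up for illustration; getRDDStorageInfo is a developer API on SparkContext that mirrors what the Storage tab reports after an action has materialised the cached partitions.

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PersistExample")
      .getOrCreate()

    // Hypothetical RDD used only for illustration.
    val s = spark.sparkContext.parallelize(1 to 1000000)

    // Cache the RDD in storage memory (deserialized, memory only).
    // The cached partitions show up in the Storage tab and in the
    // "Storage Memory" column of the Executors tab once an action runs.
    s.persist(StorageLevel.MEMORY_ONLY)
    s.count()  // action that forces the partitions to be computed and cached

    // Programmatic view of roughly what the Storage tab shows.
    spark.sparkContext.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: ${info.numCachedPartitions} cached partitions, " +
        s"${info.memSize} bytes in memory")
    }

    spark.stop()
  }
}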
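On the memory question in the quoted mail: since Spark 1.6 storage and execution share a single unified pool, so the figure in the "Storage Memory" column is a maximum rather than a hard reservation, and data brought back with collect() lives in the remaining "user" portion of the driver heap. A hedged sketch of the relevant configuration knobs follows; the values are illustrative only, not recommendations.

import org.apache.spark.sql.SparkSession

// Illustrative values only; tune for your own cluster.
val spark = SparkSession.builder()
  .appName("DriverMemorySketch")
  // Total driver heap. Takes effect only if set before the driver JVM starts,
  // e.g. via spark-submit --driver-memory 16g or spark-defaults.conf.
  .config("spark.driver.memory", "16g")
  // Fraction of (heap - 300 MB) shared by execution and storage
  // (the unified region that the "Storage Memory" column reports against).
  .config("spark.memory.fraction", "0.6")
  // Share of that region protected for cached blocks; execution can borrow
  // the unused remainder, so unused storage memory is not simply wasted.
  .config("spark.memory.storageFraction", "0.5")
  .getOrCreate()

// Results of collect() are held in the remaining "user" part of the driver
// heap, outside the unified region sized by spark.memory.fraction.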