I'm trying to log Tasks to understand the physical plan and to visualize which partition of which RDD is currently being computed, and from which creation site, along with other information. I want to instrument the TaskRunner to do this just before it actually invokes runTask() on the Task, and again just before the Task is released for garbage collection, once its metrics have been collected. Alongside this information, I also want to log the resources that have been allocated to the Executor for running its Tasks.
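To make this concrete, here is a rough, self-contained sketch of the kind of hook I have in mind. The object and method names (TaskLoggingSketch, runWithLogging) and the parameters are made up for illustration; the real change would live inside the Executor's TaskRunner and use Spark's own logging:

object TaskLoggingSketch {
  // runTask stands in for Task.run(); taskId, stageId and partitionId mirror
  // information that is available on a Task before it is executed.
  def runWithLogging[T](taskId: Long, stageId: Int, partitionId: Int)(runTask: => T): T = {
    // Just before actually invoking runTask() on the Task.
    println(s"Starting task $taskId (stage $stageId, partition $partitionId)")
    try {
      runTask
    } finally {
      // In the real TaskRunner this would happen after metrics are collected,
      // just before the Task object becomes eligible for garbage collection.
      println(s"Finished task $taskId (stage $stageId, partition $partitionId)")
    }
  }
}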
Zvara Zoltán

mail, hangout, skype: zoltan.zv...@gmail.com
mobile, viber: +36203129543
bank: 10918001-00000021-50480008
address: Hungary, 2475 Kápolnásnyék, Kossuth 6/a
elte: HSKSJZ (ZVZOAAI.ELTE)

2015-03-24 16:42 GMT+01:00 Sandy Ryza <sandy.r...@cloudera.com>:

> That's correct. What's the reason this information is needed?
>
> -Sandy
>
> On Tue, Mar 24, 2015 at 11:41 AM, Zoltán Zvara <zoltan.zv...@gmail.com> wrote:
>
>> Thank you for your response!
>>
>> I guess the (Spark)AM, who gives the container lease to the NM (along with
>> the executor JAR and the command to run), must know how much CPU and RAM
>> that container is capped and isolated at. If I'm right, there must be a
>> resource vector along with the encrypted container lease that describes
>> this. Or maybe there is a way for the ExecutorBackend to fetch this
>> information directly from the environment? Then the ExecutorBackend would
>> be able to hand this information over to the actual Executor, which
>> creates the TaskRunner.
>>
>> Zvara Zoltán
>>
>> mail, hangout, skype: zoltan.zv...@gmail.com
>> mobile, viber: +36203129543
>> bank: 10918001-00000021-50480008
>> address: Hungary, 2475 Kápolnásnyék, Kossuth 6/a
>> elte: HSKSJZ (ZVZOAAI.ELTE)
>>
>> 2015-03-24 16:30 GMT+01:00 Sandy Ryza <sandy.r...@cloudera.com>:
>>
>>> Hi Zoltan,
>>>
>>> If running on YARN, the YARN NodeManager starts executors. I don't think
>>> there's a 100% precise way for the Spark executor to know how many
>>> resources are allotted to it. It can come close by looking at the Spark
>>> configuration options used to request it (spark.executor.memory and
>>> spark.yarn.executor.memoryOverhead), but it can't necessarily account for
>>> the amount that YARN has rounded up if those configuration properties
>>> (yarn.scheduler.minimum-allocation-mb and
>>> yarn.scheduler.increment-allocation-mb) are not present on the node.
>>>
>>> -Sandy
>>>
>>> On Mon, Mar 23, 2015 at 5:08 PM, Zoltán Zvara <zoltan.zv...@gmail.com> wrote:
>>>
>>>> Let's say I'm an Executor instance in a Spark system. Who started me,
>>>> and where, when I run on a worker node supervised by (a) Mesos, (b) YARN?
>>>> I suppose I'm the only Executor on a worker node for a given framework
>>>> scheduler (driver). If I'm an Executor instance, who is the closest
>>>> object to me that can tell me how many resources I have on (a) Mesos,
>>>> (b) YARN?
>>>>
>>>> Thank you for your kind input!
>>>>
>>>> Zvara Zoltán
>>>>
>>>> mail, hangout, skype: zoltan.zv...@gmail.com
>>>> mobile, viber: +36203129543
>>>> bank: 10918001-00000021-50480008
>>>> address: Hungary, 2475 Kápolnásnyék, Kossuth 6/a
>>>> elte: HSKSJZ (ZVZOAAI.ELTE)
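For reference, here is a minimal sketch of the approximation Sandy describes above: take spark.executor.memory plus the overhead and round it up the way YARN's scheduler would. The overhead default (max of 384 MB and 10% of executor memory) and the minimum/increment values below are my assumptions for illustration, not values read from any cluster:

object ContainerSizeEstimate {
  // All defaults below are assumptions, standing in for values that would
  // come from the Spark configuration and the node's yarn-site.xml.
  def estimateContainerMb(executorMemoryMb: Int,
                          memoryOverheadMb: Option[Int] = None,
                          minAllocationMb: Int = 1024,
                          incrementAllocationMb: Int = 512): Int = {
    // Assumed default overhead: max(384 MB, 10% of executor memory).
    val overhead = memoryOverheadMb.getOrElse(math.max(384, executorMemoryMb / 10))
    val requested = executorMemoryMb + overhead
    // YARN rounds the request up to at least the minimum allocation, then to a
    // multiple of the increment; without the node's configuration these are guesses.
    val floored = math.max(requested, minAllocationMb)
    val increments = math.ceil(floored.toDouble / incrementAllocationMb).toInt
    increments * incrementAllocationMb
  }

  def main(args: Array[String]): Unit = {
    // e.g. spark.executor.memory=4g with no explicit memoryOverhead set
    println(estimateContainerMb(4096)) // prints 4608 with the defaults above
  }
}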