GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/2840

    [WIP][SPARK-3822] Executor scaling mechanism for Yarn

    This is part of a broader effort to enable dynamic scaling of executors 
([SPARK-3174](https://issues.apache.org/jira/browse/SPARK-3174)). It is 
intended to work alongside SPARK-3795 (#2746), SPARK-3796, and SPARK-3797.
    
    The logic builds on @PraveenSeluka's work in #2798. It differs from the 
changes there in that the mechanism is implemented within the existing 
scheduler backend framework rather than in new `Actor` classes. This also 
introduces an abstract parent class, `YarnSchedulerBackend`, to encapsulate the 
logic common to communicating with the Yarn `ApplicationMaster`.
    
    I have tested this on a stable Yarn cluster. This is still WIP because when 
an executor is removed, `SparkContext` and its components react as if the 
executor had failed, resulting in many scary error messages and eventual 
timeouts. While fixing this is not strictly necessary for a first-cut 
implementation of the mechanism, it would be good to add logic to distinguish 
this case if it doesn't require too much more work.
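
    At a high level, the mechanism boils down to a small message protocol 
between the scheduler backend and the Yarn `ApplicationMaster`. A rough sketch 
in Scala (the class and field names below are illustrative, not the actual 
names in this patch):

    ```scala
    // Hypothetical message protocol between YarnSchedulerBackend and the AM.
    // Names are illustrative and do not come from this pull request.
    sealed trait ExecutorScalingMessage

    // Ask the AM to allocate additional executor containers
    case class RequestExecutors(numAdditionalExecutors: Int)
      extends ExecutorScalingMessage

    // Ask the AM to release the containers backing the given executors
    case class KillExecutors(executorIds: Seq[String])
      extends ExecutorScalingMessage
    ```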

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark yarn-scaling-mechanism

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2840.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2840
    
----
commit bbee669d414bca292380028070ebd684ddca6c88
Author: Andrew Or <[email protected]>
Date:   2014-10-16T20:33:09Z

    Add mechanism in yarn client mode
    
    This commit allows the YarnClientSchedulerBackend to communicate
    with the AM to request or kill executors dynamically. This extends
    the existing MonitorActor in ApplicationMaster to also accept
    messages to scale the number of executors up/down.
    
    TODO: extend this to the yarn cluster backend as well, and put
    the common functionality in its own class.
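
    Handling such scaling messages on the AM side might look roughly like the 
following (Akka was Spark's RPC layer at the time; the actor body, the 
`YarnAllocator` methods, and the message names are all assumptions for 
illustration, not the code in this commit):

    ```scala
    import akka.actor.Actor

    // Illustrative sketch of an AM-side actor handling scaling messages.
    // The real MonitorActor lives in ApplicationMaster and differs in detail;
    // requestTotalExecutors/killExecutor are hypothetical allocator methods.
    class MonitorActor(allocator: YarnAllocator) extends Actor {
      def receive = {
        case RequestExecutors(n) =>
          // Raise the target so the allocator requests more containers from Yarn
          allocator.requestTotalExecutors(allocator.targetNumExecutors + n)
        case KillExecutors(ids) =>
          // Release the containers backing these executors
          ids.foreach(allocator.killExecutor)
      }
    }
    ```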

commit 53e81454a65c2543afdfbd602f2fec3b850872c4
Author: Andrew Or <[email protected]>
Date:   2014-10-17T01:04:08Z

    Start yarn actor early to listen for AM registration message
    
    Previously, the registration message was dropped, and the feature was
    never successfully enabled. This was an easy fix, but it took a long
    time to understand what was wrong. This is Akka in its finest glory.

commit c4dfaac0c73689b79663e924e875e8ff93d8e5c6
Author: Andrew Or <[email protected]>
Date:   2014-10-17T01:45:30Z

    Avoid thrashing when removing executors
    
    Previously, we would immediately add an executor back upon removing
    it, simply because we did not keep the relevant counter updated.
    
    An important TODO at this point is to inform the SparkContext
    sooner about the executor successfully being killed. Otherwise
    we have to wait for the BlockManager timeout, which may take a
    long time.
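
    The thrashing fix amounts to tracking a target executor count separately 
from the set of live executors, so that a deliberate kill is not mistaken for a 
loss to be replaced. A minimal sketch of such bookkeeping (all names 
hypothetical):

    ```scala
    import scala.collection.mutable

    // Hypothetical bookkeeping to distinguish requested kills from failures.
    class ExecutorTargetTracker(private var targetNumExecutors: Int) {
      private val pendingKills = mutable.Set[String]()

      def requestKill(executorId: String): Unit = {
        pendingKills += executorId
        // Lower the target so the allocator does not re-request this executor
        targetNumExecutors -= 1
      }

      // Called when Yarn reports an executor gone; true means a real failure
      def onExecutorLost(executorId: String): Boolean =
        !pendingKills.remove(executorId)
    }
    ```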

commit 47466cd9d0f4a4f2f060e82796b25867092f5de0
Author: Andrew Or <[email protected]>
Date:   2014-10-18T01:32:24Z

    Refactor common Yarn scheduler backend logic
    
    As of this commit the mechanism is accessible from cluster mode
    in addition to just client mode.

commit 7b76d0a1b0724f0c6572b8ffa9a13135d6d63b5f
Author: Andrew Or <[email protected]>
Date:   2014-10-18T02:04:30Z

    Expose mechanism in SparkContext as developer API
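
    Once exposed on `SparkContext`, usage by a higher-level scaling policy 
would presumably look something like this (the method names are a guess at the 
developer API, not confirmed by the commit message):

    ```scala
    // Hypothetical developer-API calls on SparkContext; exact names and
    // signatures are not shown in this commit message.
    sc.requestExecutors(2)
    sc.killExecutors(Seq("executor-3", "executor-7"))
    ```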

commit d987b3e9e33165a482189343c324a0babc6ff3f9
Author: Andrew Or <[email protected]>
Date:   2014-10-18T02:33:54Z

    Move addWebUIFilters to Yarn scheduler backend
    
    It's only used by Yarn.

----

