Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Jiang Yan Xu Fri, 20 Oct 2017 15:33:59 -0700


> On Oct. 19, 2017, 6:38 p.m., Benjamin Mahler wrote:
> > Thanks Yan! I will dig in soon.
> > 
> > Just some quick questions:
> > 
> > (1) I thought during the meeting you said it was taking a minute, but 
> > looking at all the benchmark timings they're all under a second? Is it only 
> > the benchmark setup that's expensive here?
> > (2) Is this with the lock free event & run queues? If not, how much do they 
> > help?
> > (3) As an aside, it has come up before, but it would be useful to be able 
> > to force the messages to go through the remote stack rather than the local 
> > stack. No need to think about this yet, but just something to keep in mind 
> > as not being accurate in this benchmark.


1) Yeah, looks like it. I used to include the setup time in the measurement, 
which is why it was so large.
2) Yes, I built with `--enable-optimize --enable-lock-free-run-queue 
--enable-lock-free-event-queue 
--enable-last-in-first-out-fixed-size-semaphore`. I could compare with the 
performance without them.
3) Right, I think we should keep that in mind, and we should have tests that 
cover the remote stack. For the case here I thought the local stack would be a 
simple and good-enough start, since it already covers the proto 
(de)serialization and the rest of the libprocess code paths that we have 
recently optimized.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
-----------------------------------------------------------


On Oct. 19, 2017, 4:28 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 4:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: it involves no frameworks and no 
> agent retries. It is possible to add a number of other benchmarks, so I am 
> creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/1/
> 
> 
> Testing
> -------
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a
>  (close to current HEAD).
> 
> ```
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 
> completed tasks in 45.075488ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (48126 ms)
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 
> completed tasks in 14.172361ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (45979 ms)
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 
> completed tasks in 413.508328ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (49487 ms)
> [----------] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596 ms total)
> 
> ...
> 
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 
> completed tasks in 32.787363ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (48266 ms)
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 
> completed tasks in 19.735003ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (46169 ms)
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 
> completed tasks in 321.267267ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (51550 ms)
> [----------] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987 ms total)
> ```
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d
>  (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> ```
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 
> completed tasks in 85.800335ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (59247 ms)
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 
> completed tasks in 35.342066ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (93662 ms)
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 
> completed tasks in 798.738642ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (116078 ms)
> [----------] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (268987 ms total)
> 
> ...
> 
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 
> completed tasks in 66.270249ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (59925 ms)
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 
> completed tasks in 50.146349ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (88631 ms)
> [ RUN      ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 
> completed tasks in 807.621964ms
> [       OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (109941 ms)
> [----------] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (258497 ms total)
> ```
> 
> The recent patches cut the time down by nearly 50%. These were built with 
> `--enable-optimize`.
> 
> I can also get some flame graphs.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>
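
As a rough cross-check of the "nearly 50%" figure in the Testing section above, 
here is a minimal sketch that recomputes the reduction from the numbers in the 
quoted gtest output. The variable and function names are illustrative only and 
are not part of the patch; averaging the two runs per commit is an assumption 
about how to summarize them, not necessarily how the original figure was 
derived.

```python
# Suite wall-clock totals (ms), copied from the quoted gtest output.
head_totals_ms = [143596, 145987]    # commit 41193181 (close to HEAD)
before_totals_ms = [268987, 258497]  # commit d9c90bf1 (before MESOS-7713)

# Reported reregistration time (ms) for the 20000-agent case (/2).
head_rereg_ms = [413.508328, 321.267267]
before_rereg_ms = [798.738642, 807.621964]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def reduction_pct(before, after):
    """Percentage by which the mean of `after` is lower than the mean of `before`."""
    return 100.0 * (1.0 - mean(after) / mean(before))


total_reduction = reduction_pct(before_totals_ms, head_totals_ms)
rereg_reduction = reduction_pct(before_rereg_ms, head_rereg_ms)

print(f"suite total: {total_reduction:.1f}% lower")
print(f"20000-agent reregistration: {rereg_reduction:.1f}% lower")
```

With these inputs the suite total comes out about 45% lower and the 
20000-agent reregistration time about 54% lower. Note that the suite total is 
dominated by benchmark setup, so the two percentages measure different things.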
