> On September 23, 2016, 2:40 a.m., Guangya Liu wrote:
> > It is really weird that the performance of
> > `SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7`
> > does not improve much for `addSlave`; we need to check why the
> > `addSlave` time is the same. Without the fix, `addSlave` calls `allocate`
> > for each agent, but with the fix, only one `allocate` is called.
> >
> > ```
> > without fix:
> > [==========] Running 1 test from 1 test case.
> > [----------] Global test environment set-up.
> > [----------] 1 test from
> > SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
> > [ RUN ]
> > SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
> > Using 1000 agents and 6000 frameworks
> > Added 6000 frameworks in 122268us
> > Added 1000 agents in 42.037104secs
> >
> > With fix:
> > [==========] Running 1 test from 1 test case.
> > [----------] Global test environment set-up.
> > [----------] 1 test from
> > SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
> > [ RUN ]
> > SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
> > Using 1000 agents and 6000 frameworks
> > Added 6000 frameworks in 116107us
> > Added 1000 agents in 41.615396secs
> > ```
>
> Guangya Liu wrote:
> Jacob, I ran more tests against the code from Aug 23, when I posted some
> results in this RR, and found that the test result is different. I did the
> following to get the Aug 23 code.
>
> ```
> LiuGuangyas-MacBook-Pro:build gyliu$ git checkout
> 2f78a440ef4201c5b11fb92c225694e84a60369c
>
> LiuGuangyas-MacBook-Pro:build gyliu$ git log -1
> commit 2f78a440ef4201c5b11fb92c225694e84a60369c
> Author: Gilbert Song <[email protected]>
> Date: Mon Aug 22 13:00:58 2016 -0700
>
> Fixed potential flakiness in ROOT_RecoverOrphanedPersistentVolume.
>
> Review: https://reviews.apache.org/r/51271/
> ```
>
> The test result still seems to be the same as now (without your patch,
> using the code from Aug 23):
>
> ```
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
> [ RUN ]
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
> Using 1000 agents and 6000 frameworks
> Added 6000 frameworks in 144272us
> Added 1000 agents in 43.107001secs
> ```
>
> But anyway, I think we need to find out why the performance of
> `addSlave` was not improved by your patch.
>
> Jacob Janco wrote:
> Yes agreed, per our Slack discussions, I'll look into this. Thanks for
> posting the followup.
>
> Benjamin Mahler wrote:
> `addSlave()` is asynchronous and we do not wait for all of the
> `addSlave()` futures to complete, so any speedup in `addSlave()` will only
> affect the next caller that waits for a result from the allocator.
>
> Benjamin Mahler wrote:
> Ah, I missed that we do a `Clock::settle()`, never mind :)
>
> Guangya Liu wrote:
> Some thoughts on why `addSlave` does not improve much...
>
> Without Jacob's patch, the logic would be:
>
> ```
> addSlave -> allocate the single slave
> addSlave -> allocate the single slave
> addSlave -> allocate the single slave
> ...
> addSlave -> allocate the single slave
> ```
>
> With Jacob's patch, the logic would be:
>
> ```
> addSlave
> addSlave
> addSlave
> ...
> addSlave -> allocate for **all** of the slaves
> ```
>
> The time taken to `allocate a single slave N times` versus `allocate N
> slaves in one allocate` request should not differ much. The only
> difference is that one loops over the event queue while the other loops
> inside the allocator, which is why the performance barely changes here.
>
> But this will matter a lot when adding frameworks or handling other
> allocator events that call `allocate(slaves)`. One proposal is to add new
> benchmark test cases that follow the order below. Without Jacob's patch,
> every `addFramework` operation calls `allocate(slaves)`; with Jacob's
> patch, `allocate(slaves)` is called only once.
>
> ```
> 1) Add slaves first
> 2) Add frameworks
> ```
>
> We may get some performance improvement with the above case.
>
> Currently, all of the benchmark tests use
>
> ```
> 1) Add frameworks
> 2) Add agents
> ```
>
> That's why there is not much performance improvement...
>
> Jacob Janco wrote:
> This makes sense Guangya, I'm in the process of creating a minimal
> benchmark adding a set of slaves then adding frameworks. I'll post here if
> the results are interesting.
Jacob, just FYI, your fix does help a lot for the case I listed above!
Here is a new simple benchmark test:
```
TEST_P(HierarchicalAllocator_BENCHMARK_Test, AddSlaveAndFrameworks)
{
  size_t slaveCount = std::tr1::get<0>(GetParam());
  size_t frameworkCount = std::tr1::get<1>(GetParam());

  // Pause the clock because we want to manually drive the allocations.
  Clock::pause();

  struct OfferedResources {
    FrameworkID frameworkId;
    SlaveID slaveId;
    Resources resources;
  };

  vector<OfferedResources> offers;

  auto offerCallback = [&offers](
      const FrameworkID& frameworkId,
      const hashmap<SlaveID, Resources>& resources_)
  {
    foreach (auto resources, resources_) {
      offers.push_back(
          OfferedResources{frameworkId, resources.first, resources.second});
    }
  };

  cout << "Using " << slaveCount << " agents and "
       << frameworkCount << " frameworks" << endl;

  vector<SlaveInfo> slaves;
  slaves.reserve(slaveCount);

  // Create the framework infos up front (without adding them to the
  // allocator yet) so that `frameworks[i % frameworkCount]` below indexes
  // a populated vector rather than merely reserved storage.
  vector<FrameworkInfo> frameworks;
  frameworks.reserve(frameworkCount);
  for (size_t i = 0; i < frameworkCount; i++) {
    frameworks.push_back(createFrameworkInfo("*"));
  }

  initialize(master::Flags(), offerCallback);

  Stopwatch watch;

  const Resources agentResources = Resources::parse(
      "cpus:24;mem:4096;disk:4096;ports:[31000-32000]").get();

  // Each agent has a portion of its resources allocated to a single
  // framework. We round-robin through the frameworks when allocating.
  Resources allocation = Resources::parse("cpus:16;mem:2014;disk:1024").get();

  Try<::mesos::Value::Ranges> ranges = fragment(createRange(31000, 32000), 16);
  ASSERT_SOME(ranges);
  ASSERT_EQ(16, ranges->range_size());

  allocation += createPorts(ranges.get());

  watch.start();

  for (size_t i = 0; i < slaveCount; i++) {
    slaves.push_back(createSlaveInfo(agentResources));

    // Add some used resources on each slave. Let's say there are 16 tasks,
    // each is allocated 1 cpu and a random port from the port range.
    hashmap<FrameworkID, Resources> used;
    used[frameworks[i % frameworkCount].id()] = allocation;
    allocator->addSlave(
        slaves[i].id(), slaves[i], None(), slaves[i].resources(), used);
  }

  // Wait for all the `addSlave` operations to be processed.
  Clock::settle();

  watch.stop();

  cout << "Added " << slaveCount << " agents in "
       << watch.elapsed() << endl;

  watch.start();

  for (size_t i = 0; i < frameworkCount; i++) {
    allocator->addFramework(frameworks[i].id(), frameworks[i], {});
  }

  // Wait for all the `addFramework` operations to be processed.
  Clock::settle();

  watch.stop();

  cout << "Added " << frameworkCount << " frameworks in "
       << watch.elapsed() << endl;
}
```
With your fix:
```
./bin/mesos-tests.sh --benchmark
--gtest_filter="SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddSlaveAndFrameworks/7"
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
[ RUN ]
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddSlaveAndFrameworks/7
Using 1000 agents and 6000 frameworks
Added 1000 agents in 141352us
Added 6000 frameworks in 12.710913secs
[ OK ]
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddSlaveAndFrameworks/7
(12902 ms)
[----------] 1 test from
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test (12902 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (12912 ms total)
[ PASSED ] 1 test.
```
Without your fix, I did not wait for the test run to finish: I waited
almost 10 minutes and the test still had not finished.
- Guangya
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review150123
-----------------------------------------------------------
On September 23, 2016, 4:32 p.m., Jacob Janco wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> -----------------------------------------------------------
>
> (Updated September 23, 2016, 4:32 p.m.)
>
>
> Review request for mesos, Benjamin Mahler, Guangya Liu, James Peach, Klaus
> Ma, and Jiang Yan Xu.
>
>
> Bugs: MESOS-3157
> https://issues.apache.org/jira/browse/MESOS-3157
>
>
> Repository: mesos
>
>
> Description
> -------
>
> - Triggered allocations dispatch allocate() only
> if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
> cleared when enqueued allocations are processed.
>
>
> Diffs
> -----
>
> src/master/allocator/mesos/hierarchical.hpp
> 2c31471ee0f5d6836393bf87ff9ecfd8df835013
> src/master/allocator/mesos/hierarchical.cpp
> 2d56bd011f2c87c67a02d0ae467a4a537d36867e
>
> Diff: https://reviews.apache.org/r/51027/diff/
>
>
> Testing
> -------
>
> make check
>
> note: check without filters depends on https://reviews.apache.org/r/51028
>
> With new benchmark https://reviews.apache.org/r/49617:
> Sample output without 51027:
> [ RUN ]
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 10000 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 10000 agents in 3.21345353333333mins
> allocator settled after 1.61236038333333mins
> [ OK ]
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> (290578 ms)
>
> Sample output with 51027:
> [ RUN ]
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 10000 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 10000 agents in 3.22860541666667mins
> allocator settled after 25.525654secs
> [ OK ]
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> (220137 ms)
>
>
> Thanks,
>
> Jacob Janco
>
>