Re: Review Request 35247: Fixed race between EXPECT_CALL(resourceOffers, _) and driver.start() in fetcher_cache_tests.cpp.

Bernd Mathiske Sun, 14 Jun 2015 07:10:15 -0700

Vinod,

I am working on it. I reckon this will work:


s/Return()/DeclineOffers()

I’ll prepare a RR with this, that also fixes the CHECK the way Jie proposed.

Bernd

> On Jun 12, 2015, at 8:59 PM, Vinod Kone <[email protected]> wrote:
> 
> This is an automatically generated e-mail. To reply, 
> visit:https://reviews.apache.org/r/35247/ 
> <https://reviews.apache.org/r/35247/>
> On June 9th, 2015, 6:11 p.m. UTC, Vinod Kone wrote:
> 
> src/tests/fetcher_cache_tests.cpp 
> <https://reviews.apache.org/r/35247/diff/1/?file=981340#file981340line201> 
> (Diff revision 1)
> void FetcherCacheTest::SetUp()
> 201   
> 202   
>   // This installs a temporary reaction to resourceOffers calls, which
> 203   
>   // must be in place BEFORE starting the scheduler driver. This
> 204   
>   // "cover" is necessary, because we only add relevant mock actions
> 205   
>   // in launchTask() and launchTasks() AFTER starting the driver.
> 206   
>   EXPECT_CALL(scheduler, resourceOffers(driver, _))
> 207   
>     .WillRepeatedly(Return());
> While this looks good as a temporary fix, what is the long term strategy here?
> I really don't like setting expectations in SetUp() or TearDown() because 
> it's really hard to reason about in the individual tests. For example, why 
> did you set expecations on registered and offers but not others? I prefer to 
> move these expectations to tests. 
> Also, this SetUp() is doing too much (starting slave, starting master, 
> constructing scheduler but not starting it, setting some expectations) and 
> there is no documentation for it!
> On June 9th, 2015, 8:07 p.m. UTC, Bernd Mathiske wrote:
> 
> Long term I am working on developing up stress tests for the fetcher. These 
> are still relatively basic functionality tests so far.
> Yes, SetUp() and TearDown() do a lot here. Would you prefer a) inlining them 
> or b) commenting what they do more or c) both? In my experience a) would be 
> most consistent with existing patterns, but it makes it harder to spot what 
> the different tests are doing besides all the code that is always the same. 
> And the general code blowup would be rather substantial in this test series.
> On June 10th, 2015, 5:36 a.m. UTC, Bernd Mathiske wrote:
> 
> Maybe you meant the long term plan wrt. the code structure here. In this 
> case, it is to create more generally applicable test pattern lego blocks that 
> aggregate such things as in the Setup() here. These will have to be carefully 
> selected/crafted then.
> On June 10th, 2015, 9:01 p.m. UTC, Vinod Kone wrote:
> 
> i would prefer to inline them as we do everywhere else in the code base. The 
> current abstractions are a bit hard to follow.
> Actually, looks like this fix is wrong! If this generic expectation catches a 
> resource offer, it is possible that the expectation in launchTask() wouldn't 
> get any offer, failing the CHECK.
> 18:26:48 DEBUG: [ RUN      ] FetcherCacheTest.CachedFallback
> ...
> ...
> ...
> 18:26:49 DEBUG: I0612 18:26:49.245573 13609 hierarchical.hpp:354] Added 
> framework 20150612-182649-1787367596-40122-13591-0000
> 18:26:49 DEBUG: I0612 18:26:49.245582 13610 sched.cpp:448] Framework 
> registered with 20150612-182649-1787367596-40122-13591-0000
> 18:26:49 DEBUG: I0612 18:26:49.245728 13609 master.cpp:4108] Sending 1 offers 
> to framework 20150612-182649-1787367596-40122-13591-0000 (default) at 
> scheduler-fed7aed8-916f-48c8-8b74-3f9b2e96bccc@<redacted>
> 18:26:49 DEBUG: I0612 18:26:49.263645 13622 leveldb.cpp:343] Persisting 
> action (18 bytes) to leveldb took 20.784933ms
> 18:26:49 DEBUG: I0612 18:26:49.263702 13622 leveldb.cpp:401] Deleting ~2 keys 
> from leveldb took 25193ns
> 18:26:49 DEBUG: I0612 18:26:49.263715 13622 replica.cpp:679] Persisted action 
> at 4
> 18:26:49 DEBUG: I0612 18:26:49.263722 13622 replica.cpp:664] Replica learned 
> TRUNCATE action at position 4
> 18:27:04 DEBUG: I0612 18:27:04.234977 13607 slave.cpp:4048] Querying resource 
> estimator for oversubscribable resources
> 18:27:04 DEBUG: I0612 18:27:04.235074 13607 slave.cpp:4069] Received 
> oversubscribable resources  from the resource estimator
> 18:27:04 DEBUG: F0612 18:27:04.264626 13591 fetcher_cache_tests.cpp:361] 
> CHECK_READY(offers): is PENDING Failed to wait for resource offers
> 18:27:04 DEBUG: *** Check failure stack trace: ***
> 18:27:04 DEBUG:     @     0x7f72fb2715fd  google::LogMessage::Fail()
> 18:27:04 DEBUG:     @     0x7f72fb27343d  google::LogMessage::SendToLog()
> 18:27:04 DEBUG:     @     0x7f72fb2711ec  google::LogMessage::Flush()
> 18:27:04 DEBUG:     @     0x7f72fb273d39  
> google::LogMessageFatal::~LogMessageFatal()
> 18:27:04 DEBUG:     @           0x53dab8  _CheckFatal::~_CheckFatal()
> 18:27:04 DEBUG:     @           0x66be2f  
> mesos::internal::tests::FetcherCacheTest::launchTask()
> 18:27:04 DEBUG:     @           0x66f6c9  
> mesos::internal::tests::FetcherCacheTest_CachedFallback_Test::TestBody()
> 18:27:04 DEBUG:     @           0xbb24e3  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> 18:27:04 DEBUG:     @           0xba9787  testing::Test::Run()
> 18:27:04 DEBUG:     @           0xba982e  testing::TestInfo::Run()
> 18:27:04 DEBUG:     @           0xba9935  testing::TestCase::Run()
> 18:27:04 DEBUG:     @           0xba9bd8  
> testing::internal::UnitTestImpl::RunAllTests()
> 18:27:04 DEBUG:     @           0xba9e77  testing::UnitTest::Run()
> 18:27:04 DEBUG:     @           0x4a2073  main
> 18:27:04 DEBUG:     @     0x7f72f90a2d5d  __libc_start_main
> 18:27:04 DEBUG:     @           0x4ad3f9  (unknown)
> 
> Also, as Jie mentioned below, we should never do CHECK inside tests because 
> it will crash the whole unix process running tests. We are currently having 
> trouble getting our internal RPM builds to succeed because of this. Can you 
> please fix this ASAP?
> 
> - Vinod
> 
> 
> On June 9th, 2015, 9:32 a.m. UTC, Bernd Mathiske wrote:
> 
> Review request for mesos, Benjamin Hindman, Till Toenshoff, and Vinod Kone.
> By Bernd Mathiske.
> Updated June 9, 2015, 9:32 a.m.
> 
> Bugs: MESOS-2815 <https://issues.apache.org/jira/browse/MESOS-2815>, 
> MESOS-2829 <https://issues.apache.org/jira/browse/MESOS-2829>, MESOS-2831 
> <https://issues.apache.org/jira/browse/MESOS-2831>
> Repository: mesos
> Description
> 
> Fixed race between EXPECT_CALL(resourceOffers, _) and driver.start() in 
> fetcher_cache_tests.cpp.
> Installed a default response that provides teporary cover for this mocked 
> method until we install more interesting callbacks later. This prevents gmock 
> from complaining about an "uninteresting gmock call", which led to a variety 
> of tests failing due to offers not getting through subsequently.
> All fetcher cache tests have been affected by this race and should be fixed 
> in this regard now.
> (Also fixed some typos.)
> Testing
> 
> make check
> Diffs
> 
> src/tests/fetcher_cache_tests.cpp (cbd44b98d19953d174fac977f509d4900a37481f)
> View Diff <https://reviews.apache.org/r/35247/diff/>

Re: Review Request 35247: Fixed race between EXPECT_CALL(resourceOffers, _) and driver.start() in fetcher_cache_tests.cpp.

Reply via email to