cf-natali opened a new pull request, #426:
URL: https://github.com/apache/mesos/pull/426

   `StorageLocalResourceProviderProcess::connected` checks that the
   current state is `DISCONNECTED` and crashes if the state is `READY`
   instead. This can happen when the periodic reconciliation runs after
   the disconnection but before `connected` is invoked.
   
   It can be reproduced by running
   `ContentType/AgentResourceProviderConfigApiTest.Add/0` in a loop,
   preferably with some CPU-intensive workload in the background to affect
   the timing.
   
   Update the check to allow `READY` as well.
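   
   As a rough illustration only (not the actual patch), the relaxed
   precondition could look like the sketch below. The `State` enum
   ordering is an assumption chosen to match the `1 vs. 4` in the check
   failure, `mayConnect` and `Provider` are hypothetical names, and
   `assert` stands in for glog's `CHECK`.
   
   ```cpp
   #include <cassert>

   // Simplified stand-in for the provider's state machine. The ordering is
   // an assumption chosen so that DISCONNECTED == 1 and READY == 4.
   enum class State { RECOVERING, DISCONNECTED, CONNECTED, SUBSCRIBED, READY };

   // Hypothetical helper: states from which `connected()` may legitimately
   // be invoked once `READY` is tolerated as well.
   inline bool mayConnect(State state)
   {
     return state == State::DISCONNECTED || state == State::READY;
   }

   // Hypothetical, stripped-down provider; `assert` stands in for CHECK.
   struct Provider
   {
     State state = State::DISCONNECTED;

     void connected()
     {
       // The stricter form, `assert(state == State::DISCONNECTED)`, aborts
       // if reconciliation has already moved the state to READY.
       assert(mayConnect(state));
       state = State::CONNECTED;
     }
   };
   ```
   
   With this shape, a provider whose state was set to `READY` by a
   reconciliation that ran after the disconnection can still go through
   `connected()` without aborting.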
   
   ```
   3: I0408 09:31:11.591161 19179 http_connection.hpp:338] Ignoring 
disconnection attempt from stale connection
   3: I0408 09:31:11.591224 19179 http_connection.hpp:338] Ignoring 
disconnection attempt from stale connection
   3: I0408 09:31:11.591305 19179 http_connection.hpp:227] New endpoint 
detected at http://172.17.0.3:45793/slave(1162)/api/v1/resource_provider
   3: I0408 09:31:11.593901 19174 http_connection.hpp:283] Connected with the 
remote endpoint at http://172.17.0.3:45793/slave(1162)/api/v1/resource_provider
   3: I0408 09:31:11.593940 19190 provider.cpp:488] Disconnected from resource 
provider manager
   3: I0408 09:31:11.594046 19190 provider.cpp:749] Resource provider 
5a147f6c-6be9-4c43-9a88-31528644efb9 is in READY state
   3: I0408 09:31:11.594060 19189 status_update_manager_process.hpp:379] 
Pausing operation status update manager
   3: I0408 09:31:11.594211 19189 status_update_manager_process.hpp:385] 
Resuming operation status update manager
   3: F0408 09:31:11.594637 19190 provider.cpp:474] Check failed: DISCONNECTED 
== state (1 vs. 4) 
   3: *** Check failure stack trace: ***
   3: I0408 09:31:11.636463 19191 hierarchical.cpp:1953] Performed allocation 
for 1 agents in 208808ns
   3:     @     0x7fb58f06191d  google::LogMessage::Fail()
   3:     @     0x7fb58f060ca7  google::LogMessage::SendToLog()
   3:     @     0x7fb58f0615e2  google::LogMessage::Flush()
   3:     @     0x7fb58f0650a8  google::LogMessageFatal::~LogMessageFatal()
   3: I0408 09:31:11.727972 19184 containerizer.cpp:3252] Container 
org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE
 has exited
   3: I0408 09:31:11.729707 19186 provisioner.cpp:652] Ignoring destroy request 
for unknown container 
org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE
   3: I0408 09:31:11.732404 19188 container_daemon.cpp:189] Invoking post-stop 
hook for container 
'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE'
   3: I0408 09:31:11.732600 19177 service_manager.cpp:815] Disconnected from 
endpoint 'unix:///tmp/mesos-csi-NLwX0Z/endpoint.sock' of CSI plugin container 
org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE
   3: I0408 09:31:11.732837 19176 container_daemon.cpp:121] Launching container 
'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE'
   3: I0408 09:31:11.735456 19197 process.cpp:2781] Returning '404 Not Found' 
for '/slave(1162)/api/v1'
   3: E0408 09:31:11.736846 19194 container_daemon.cpp:150] Failed to launch 
container 
'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE':
 Failed to launch container 
'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE':
 Unexpected response '404 Not Found' (404 Not Found.)
   3: E0408 09:31:11.737042 19186 service_manager.cpp:843] Container daemon for 
'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE'
 failed: Failed to launch container 
'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-local_17120eece4184cbe8473563ff2fdafa6--CONTROLLER_SERVICE-NODE_SERVICE':
 Unexpected response '404 Not Found' (404 Not Found.)
   3:     @     0x7fb599d7c896  
mesos::internal::StorageLocalResourceProviderProcess::connected()
   3:     @     0x7fb599df16de  
_ZZN7process8dispatchIN5mesos8internal35StorageLocalResourceProviderProcessEEEvRKNS_3PIDIT_EEMS5_FvvEENKUlPNS_11ProcessBaseEE_clESC_
   3:     @     0x7fb599df15a2  
_ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal35StorageLocalResourceProviderProcessEEEvRKNS1_3PIDIT_EEMS7_FvvEEUlPNS1_11ProcessBaseEE_JSE_EEEDTclclsr3stdE7forwardIS7_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOS7_DpOSG_
   3:     @     0x7fb599df1566  
_ZN6lambda8internal6InvokeIvEclIZN7process8dispatchIN5mesos8internal35StorageLocalResourceProviderProcessEEEvRKNS4_3PIDIT_EEMSA_FvvEEUlPNS4_11ProcessBaseEE_JSH_EEEvOSA_DpOT0_
   3:     @     0x7fb599df150a  
_ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnIZNS1_8dispatchIN5mesos8internal35StorageLocalResourceProviderProcessEEEvRKNS1_3PIDIT_EEMSC_FvvEEUlS3_E_EclEOS3_
   3:     @     0x7fb590239b3b  
_ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
   3:     @     0x7fb5901fb119  process::ProcessBase::consume()
   3:     @     0x7fb5902997f9  
_ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
   3:     @          0x120b9e4  process::ProcessBase::serve()
   3:     @     0x7fb5901f7c5f  process::ProcessManager::resume()
   3:     @     0x7fb59021fcdb  
process::ProcessManager::init_threads()::$_15::operator()()
   3:     @     0x7fb59021fb85  
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE4$_15vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
   3:     @     0x7fb59021fb55  std::_Bind_simple<>::operator()()
   3:     @     0x7fb59021fa49  std::thread::_Impl<>::_M_run()
   3:     @     0x7fb58965fc80  (unknown)
   3:     @     0x7fb58ebc86ba  start_thread
   3:     @     0x7fb588dc541d  clone
   3:     @              (nil)  (unknown)
   ```
   
   Originally seen in Jenkins: 
https://builds.apache.org/job/Mesos/job/Mesos-Buildbot/BUILDTOOL=cmake,COMPILER=clang,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1%20MESOS_TEST_AWAIT_TIMEOUT=60secs,OS=ubuntu%3A16.04,label_exp=ubuntu/140/console


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
