Will Berkeley created KUDU-2708:
-----------------------------------

             Summary: Possible contention creating temporary files while flushing cmeta during an election storm
                 Key: KUDU-2708
                 URL: https://issues.apache.org/jira/browse/KUDU-2708
             Project: Kudu
          Issue Type: Improvement
            Reporter: Will Berkeley


While investigating consensus queue overflows that happen under heavy write 
load, I noticed that 6 of the 10 service threads at the time of overflow had 
stacks like

{noformat}
0x3b6720f710 <unknown>
           0x1fb900a base::internal::SpinLockDelay()
           0x1fb8ea7 base::SpinLock::SlowLock()
            0xb82e25 kudu::consensus::RaftConsensus::RequestVote()
            0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
           0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
           0x1e2935a kudu::rpc::ServicePool::RunThread()
           0x1f9bd91 kudu::Thread::SuperviseThread()
        0x3b672079d1 start_thread
        0x3b66ee88fd clone
{noformat}

They are waiting on some tablet's Raft consensus instance's {{lock_}} in order 
to vote. Looking into what might be holding that lock, I see stacks like

{noformat}
0x3b6720f710 <unknown>
        0x3b66edb2ed __GI_open64
        0x3b66e63caa __gen_tempname
           0x1f1cf35 kudu::(anonymous namespace)::PosixEnv::MkTmpFile()
           0x1f1f662 kudu::(anonymous namespace)::PosixEnv::NewTempRWFile()
           0x1f8305e kudu::pb_util::WritePBContainerToPath()
            0xb47932 kudu::consensus::ConsensusMetadata::Flush()
            0xb74164 kudu::consensus::RaftConsensus::SetVotedForCurrentTermUnlocked()
            0xb783aa kudu::consensus::RaftConsensus::RequestVoteRespondVoteGranted()
            0xb836a1 kudu::consensus::RaftConsensus::RequestVote()
            0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
           0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
           0x1e2935a kudu::rpc::ServicePool::RunThread()
           0x1f9bd91 kudu::Thread::SuperviseThread()
        0x3b672079d1 start_thread
        0x3b66ee88fd clone
{noformat}

After some junior spelunking into the glibc code, one hypothesis is that we 
are generating many collisions among proposed temporary file names in the 
cmeta folder because many threads are attempting to flush cmeta at once. The 
relevant glibc code is in {{__gen_tempname}}, which appears in the stack above.

Maybe we could put the thread id into the temporary file name when a thread 
does a cmeta flush.
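
As a rough illustration of that idea (not Kudu's actual code), here is a 
minimal sketch that embeds a per-thread token in the template handed to 
{{mkstemp()}}. The helper names {{MakeThreadTaggedTemplate}} and 
{{OpenThreadTaggedTempFile}} and the ".tmp." infix are hypothetical, and 
hashing {{std::thread::id}} is just one assumed way to get a filename-safe 
token. Two threads can in principle hash to the same value, in which case 
they simply fall back to sharing a name space as they do today.

```cpp
#include <stdlib.h>   // mkstemp, abort
#include <unistd.h>   // close, unlink

#include <functional>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical helper: builds a mkstemp() template of the form
//   <path>.tmp.<thread-token>.XXXXXX
// so that different threads draw candidate names from (almost always)
// disjoint name spaces. The ".tmp." infix is an assumption for illustration.
std::string MakeThreadTaggedTemplate(const std::string& path) {
  std::ostringstream oss;
  // Hash the thread id to get a plain decimal token that is safe in a filename.
  oss << path << ".tmp."
      << std::hash<std::thread::id>()(std::this_thread::get_id())
      << ".XXXXXX";
  return oss.str();
}

// Hypothetical helper: creates and opens the temporary file.
// Returns the file descriptor (or -1 on error, with errno set by mkstemp)
// and stores the generated file name in *created on success.
int OpenThreadTaggedTempFile(const std::string& path, std::string* created) {
  std::string tmpl = MakeThreadTaggedTemplate(path);
  // mkstemp() rewrites the trailing XXXXXX in place and opens the file
  // exclusively, so correctness does not depend on the thread token.
  int fd = mkstemp(&tmpl[0]);
  if (fd >= 0) {
    *created = tmpl;
  }
  return fd;
}
```

A caller would write the new cmeta contents to the returned descriptor and 
then rename the file over the real path, as with any temporary file. The 
thread token only reduces the chance that concurrent flushes propose the same 
name; whether that is actually the bottleneck here is still a hypothesis.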



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
