[ https://issues.apache.org/jira/browse/KUDU-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808817#comment-17808817 ]
ASF subversion and git services commented on KUDU-3491:
-------------------------------------------------------

Commit 8af89ca5e2454600888873882ce543cdb0b7b3ab in kudu's branch refs/heads/branch-1.17.x from Ádám Bakai
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=8af89ca5e ]

KUDU-3491 Destruct master before creating a new one

The ServerBase constructor runs the MinidumpExceptionHandler constructor, which calls RegisterMinidumpExceptionHandler(). This function increments the static atomic variable current_num_instances_. When the ServerBase is destructed, a similar process happens in reverse and current_num_instances_ is decremented. If current_num_instances_ is not 0 before the increment, or not 1 before the decrement, it is treated as an error. This means that only one server instance can exist at any given time. In a multi-master configuration, however, the master server is replaced, and without this change the second server's constructor can run before the first server's destructor. This change ensures that the first server's destructor runs before the second server's constructor.
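
The counter check and the ordering problem described in the commit message can be illustrated with a minimal sketch. This is not Kudu's code: SingleInstanceGuard, ReplaceDestructFirst() and ReplaceConstructFirst() are hypothetical names, and the sketch throws an exception where the real handler fails a fatal check, so that both orderings can be exercised without aborting the process.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for a class that allows only one live instance,
// enforced with a static atomic counter (an assumption modeled on the
// description above, not Kudu's actual implementation).
class SingleInstanceGuard {
 public:
  SingleInstanceGuard() {
    // fetch_add returns the value held *before* the increment; it must be 0.
    if (current_num_instances_.fetch_add(1) != 0) {
      // Undo the increment: the destructor does not run when a constructor throws.
      current_num_instances_.fetch_sub(1);
      throw std::logic_error("another instance is already registered");
    }
  }

  ~SingleInstanceGuard() {
    // fetch_sub returns the value held *before* the decrement; it must be 1.
    const int prev = current_num_instances_.fetch_sub(1);
    assert(prev == 1);
    (void)prev;  // silence the unused-variable warning when NDEBUG is set
  }

  SingleInstanceGuard(const SingleInstanceGuard&) = delete;
  SingleInstanceGuard& operator=(const SingleInstanceGuard&) = delete;

  static int count() { return current_num_instances_.load(); }

 private:
  static std::atomic<int> current_num_instances_;
};

std::atomic<int> SingleInstanceGuard::current_num_instances_{0};

// Safe ordering: destroy the old instance, then construct the new one.
bool ReplaceDestructFirst() {
  auto server = std::make_unique<SingleInstanceGuard>();
  server.reset();  // the old instance's destructor runs here
  server = std::make_unique<SingleInstanceGuard>();
  return true;
}

// Problematic ordering: the new instance is constructed while the old one
// is still alive, so the constructor's check fails.
bool ReplaceConstructFirst() {
  auto server = std::make_unique<SingleInstanceGuard>();
  try {
    // make_unique constructs the new object before the assignment
    // releases the old one.
    server = std::make_unique<SingleInstanceGuard>();
  } catch (const std::logic_error&) {
    return false;
  }
  return true;
}
```

In this sketch ReplaceDestructFirst() succeeds, while ReplaceConstructFirst() trips the check because a plain assignment to the smart pointer builds the replacement before releasing the old object. Explicitly resetting the pointer first is one way to force the ordering the commit message describes.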

Change-Id: I3c1019d092bbf9e58f4fc33753a1218bc79735d3
Reviewed-on: http://gerrit.cloudera.org:8080/20913
Reviewed-by: Attila Bukor <abu...@apache.org>
Reviewed-by: Mahesh Reddy <mre...@cloudera.com>
Tested-by: Kudu Jenkins
(cherry picked from commit 7562277fc6f68b0dcab593d56de03bb344a95b3e)
Reviewed-on: http://gerrit.cloudera.org:8080/20917
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>

> MiniDumpExceptionHandler assert randomly fails
> -----------------------------------------------
>
>                 Key: KUDU-3491
>                 URL: https://issues.apache.org/jira/browse/KUDU-3491
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Bakai Ádám
>            Assignee: Bakai Ádám
>            Priority: Major
>             Fix For: 1.18.0
>
>
> When starting master, this error randomly happens on my system:
> {code:java}
> + exec /opt/cloudera/parcels/CDH-7.2.18-1.cdh7.2.18.p0.43161468/lib/kudu/sbin/kudu-master
>     --master_addresses=abakai-1.abakai.root.hwx.site,abakai-2.abakai.root.hwx.site,abakai-3.abakai.root.hwx.site
>     --location_mapping_cmd=/var/run/cloudera-scm-agent/process/126-kudu-KUDU_MASTER/topology.py
>     --flagfile=/var/run/cloudera-scm-agent/process/126-kudu-KUDU_MASTER/gflagfile
> F20230717 10:37:23.626719 101405 minidump.cc:273] Check failed: 0 == MinidumpExceptionHandler::current_num_instances_.fetch_add(1) (0 vs. 1)
> *** Check failure stack trace: ***
> Wrote minidump to /var/log/kudu/minidumps/kudu-master/1664cde8-f41a-4b7f-121578b0-55ba68a6.dmp
> *** Aborted at 1689590243 (unix time) try "date -d @1689590243" if you are using GNU date ***
> PC: @ 0x0 (unknown)
> *** SIGABRT (@0x9c2900018c1d) received by PID 101405 (TID 0x7f204a7cda00) from PID 101405; stack trace: ***
>     @ 0xe4df76 google::(anonymous namespace)::FailureSignalHandler()
>     @ 0x7f204a19e6d0 (unknown)
>     @ 0x7f20483a6277 __GI_raise
>     @ 0x7f20483a7968 __GI_abort
>     @ 0xcb50af kudu::AbortFailureFunction()
>     @ 0xe4296d google::LogMessage::Fail()
>     @ 0xe4584a google::LogMessage::SendToLog()
>     @ 0xe4249e google::LogMessage::Flush()
>     @ 0xe439d9 google::LogMessageFatal::~LogMessageFatal()
>     @ 0x319a0a1 kudu::MinidumpExceptionHandler::RegisterMinidumpExceptionHandler()
>     @ 0x319a107 kudu::MinidumpExceptionHandler::MinidumpExceptionHandler()
>     @ 0x12821f0 kudu::server::ServerBase::ServerBase()
>     @ 0x124577e kudu::kserver::KuduServer::KuduServer()
>     @ 0xddc9ce kudu::master::Master::Master()
>     @ 0xd5732b kudu::master::RunMasterServer()
>     @ 0xd51d8a kudu::master::MasterMain()
>     @ 0x7f2048392445 __libc_start_main
>     @ 0xd51b64 (unknown)
> {code}
> Version information:
> {code:java}
> [root@abakai-1 ~]# /opt/cloudera/parcels/CDH-7.2.18-1.cdh7.2.18.p0.43161468/lib/kudu/sbin-release/kudu-master -version
> kudu 1.15.0.7.2.18.0-205
> revision 7e222133b1a13ce6c212ffb32d8ceaa0c6a8545a
> build type RELEASE
> built by None at 14 Jul 2023 06:51:13 UTC on re-centos-slave-large-2wczs
> build id 1178567
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)