[ https://issues.apache.org/jira/browse/KUDU-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-2987:
--------------------------------
    Affects Version/s: 1.9.0
                       1.10.0
                       1.10.1

> Intra location rebalance will crash in special case
> ---------------------------------------------------
>
>                 Key: KUDU-2987
>                 URL: https://issues.apache.org/jira/browse/KUDU-2987
>             Project: Kudu
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.9.0, 1.10.0, 1.10.1, 1.11.0
>            Reporter: ZhangYao
>            Assignee: ZhangYao
>            Priority: Major
>             Fix For: 1.12.0, 1.11.1
>
>
> While doing a POC of rebalancing recently, I got a core dump when running an
> intra-location rebalance.
> Here is the log:
> {code:java}
> I2019-10-30 20:02:17.843044 40915 rebalancer_tool.cc:225] running rebalancer within location '/location/2044'
> F2019-10-30 20:02:17.884591 40915 map-util.h:109] Check failed: it != collection.end() Map key not found: a9119004b2d24f42a1acf09d142565fb
> *** Check failure stack trace: ***
>     @          0x111a75d  google::LogMessage::Fail()
>     @          0x111c6d3  google::LogMessage::SendToLog()
>     @          0x111a2b9  google::LogMessage::Flush()
>     @          0x111d0ef  google::LogMessageFatal::~LogMessageFatal()
>     @           0xe26da7  FindOrDie<>()
>     @           0xe1f204  kudu::tools::RebalancerTool::AlgoBasedRunner::GetNextMovesImpl()
>     @           0xe162e0  kudu::tools::RebalancerTool::BaseRunner::GetNextMoves()
>     @           0xe15bf5  kudu::tools::RebalancerTool::RunWith()
>     @           0xe1db0e  kudu::tools::RebalancerTool::Run()
>     @           0xb6fea1  kudu::tools::(anonymous namespace)::RunRebalance()
>     @           0xb70e14  std::_Function_handler<>::_M_invoke()
>     @          0x11714a2  kudu::tools::Action::Run()
>     @           0xc00587  kudu::tools::DispatchCommand()
>     @           0xc00f4b  kudu::tools::RunTool()
>     @           0xb0fd6d  main
>     @     0x7f37086a4b15  __libc_start_main
>     @           0xb6b399  (unknown)
> {code}
> The problem seems to be in
> {{RebalancerTool::AlgoBasedRunner::GetNextMovesImpl}}: when building
> extra_info_by_tablet_id, it checks that the table id of every tablet is
> present in the table info.
> But when we build ClusterRawInfo in
> {{RebalancerTool::KsckResultsToClusterRawInfo}}, we collect only the tables
> that occur in the location, yet all the tablets in the cluster.
> The crash therefore occurs when the location does not hold replicas of every
> table, which is likely when the number of locations is much larger than a
> table's replication factor.
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
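
The mismatch described in the report can be illustrated with a minimal C++ sketch. It is not the Kudu code and not necessarily the fix that was committed: the struct layouts, the `replication_factor` field and the function `BuildReplicationFactorByTablet` are hypothetical simplifications. It only assumes what the report states, namely that the table list is filtered by location while the tablet list is not, and shows a lookup that skips tablets of unknown tables instead of aborting the way `FindOrDie` does.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the ksck summaries (not Kudu types).
struct TableSummary {
  std::string id;
  int replication_factor;
};

struct TabletSummary {
  std::string id;
  std::string table_id;
};

// Models the situation from the report: tables are limited to those with a
// replica in the location, while tablets cover the whole cluster.
struct ClusterRawInfo {
  std::vector<TableSummary> table_summaries;
  std::vector<TabletSummary> tablet_summaries;
};

// Builds a per-tablet map from the per-table info. A tablet whose table is
// not in table_summaries is skipped; a FindOrDie-style lookup would hit a
// fatal check at that point instead.
std::map<std::string, int> BuildReplicationFactorByTablet(
    const ClusterRawInfo& info) {
  std::map<std::string, int> rf_by_table_id;
  for (const TableSummary& table : info.table_summaries) {
    rf_by_table_id[table.id] = table.replication_factor;
  }

  std::map<std::string, int> rf_by_tablet_id;
  for (const TabletSummary& tablet : info.tablet_summaries) {
    auto it = rf_by_table_id.find(tablet.table_id);
    if (it == rf_by_table_id.end()) {
      // The tablet belongs to a table with no replica in this location.
      continue;
    }
    rf_by_tablet_id.emplace(tablet.id, it->second);
  }
  return rf_by_tablet_id;
}
```

With one table in `table_summaries` and two tablets, one of which belongs to a different table, the returned map contains only the tablet of the known table. An alternative with the same effect would be to filter the tablet list by location when the raw info is built.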