I've been working on an HA setup and had a lot of difficulties. Most of
them self-induced. I worked through them all but 1. I seem to have a
database lock issue on the icinga_customvariables table.
I've got 2 machines in an HA pair. hostA is the "config master". When I
start hostA everything looks good. I give it about 5 minutes to
stabilize and then start hostB. I can see though the cluster check and
the logs that they are communicating. At some point into the "startup"
process of hostB ( usually a few minutes ) I start getting the following
error in the logs:
critical/IdoMysqlConnection: Error "Lock wait timeout exceeded; try
restarting transaction" when executing query "DELETE FROM
icinga_customvariables WHERE instance_id = 1"
When these messages start spamming the log the Icinga process on hostB
the icinga process basically stops trying to do anything else except for
this update. At some point the situation flip flops and somehow hostB
becomes the cluster idodb master and then hostA gets stuck on the
delete. It bounces back and forth for a while ( about 24hrs ) and then
finally they are both stuck. I figure I'm doing something wrong. I
don't see any recent posts or bugs about this issue.
icinga 2.3.9
mysql 5.1.73
hostA config bits:
constants.conf:
const NodeName = "hostA"
/* Our local zone name. */
const ZoneName = NodeName
zones.conf:
object Endpoint "hostA" {
host = "hostA"
}
object Endpoint "hostB" {
host = "hostB"
}
object Zone "dc-master" {
endpoints = [ "hostA", "hostB" ]
}
object Zone "global-config" {
global = true
}
ido-mysql.conf:
object IdoMysqlConnection "ido-mysql" {
user = "icinga2"
password = "icinga"
host = "hostA"
database = "icinga2"
}
hostB config bits:
constants.conf:
const NodeName = "hostA"
/* Our local zone name. */
const ZoneName = NodeName
zones.conf:
object Endpoint "hostA" {
host = "hostA"
}
object Endpoint "hostB" {
host = "hostB"
}
object Zone "dc-master" {
endpoints = [ "hostA", "hostB" ]
}
object Zone "global-config" {
global = true
}
id-mysql.conf:
object IdoMysqlConnection "ido-mysql" {
user = "icinga2"
password = "icinga"
host = "hostA"
database = "icinga2"
}
mysql instances on both hostA and hostB in a master-master replication
setup. Both hostA and hostB connect to the same database on the same
host ( hostA ). Eventually the mysql connection will live behind a load
balancer with affinity settings.
I can see that hostA has an open connection to the database after it
finishes it's startup process. I'm assuming it is electing itself
master of the ido-mysql role since the other node is not running. When
hostB starts up it also connects to the database and tries to go through
it's startup process. If I had to guess I would say that it doesn't
force an election for the ido-mysql role prior to doing its "startup"
database stuff. Or hostB doesn't honor the results of an election. The
other possibility is that hostA isn't properly releasing its locks on
the tables after it is done with them.
During this back and forth of who is the database master a show open
tables only shows the icinga_customvariables table in_use. However a
show innodb status shows 100+ lock structs, so something is creating
locks. If you let it run long enough mysql will detect deadlocks, but
for other tables then the one that starts this downward spiral:
------------------------
LATEST DETECTED DEADLOCK
------------------------
150908 16:33:17
*** (1) TRANSACTION:
TRANSACTION 0 212950, ACTIVE 23 sec, process no 7121, OS thread id
140350274770688 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 481 lock struct(s), heap size 63472, 4398 row lock(s), undo
log entries 2223
MySQL thread id 258, query id 7678772 hostA 192.168.162.50 icinga2 Updating
UPDATE icinga_endpointstatus SET is_connected = '1', status_update_time
= FROM_UNIXTIME(1441755026) WHERE endpoint_object_id = 51327 AND
instance_id = 1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 7180 n bits 72 index `PRIMARY` of table
`icinga2`.`icinga_endpointstatus` trx id 0 212950 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 9; compact format; info
bits 0
0: len 8; hex 00000000000009a9; asc ;; 1: len 6; hex
00000000038b; asc ;; 2: len 7; hex 800000002d0110; asc - ;; 3:
len 8; hex 0000000000000001; asc ;; 4: len 8; hex 0000000
000001c57; asc W;; 5: len 4; hex 55dc8422; asc U ";; 6: len 30;
hex 6672632d6e6574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc
hostA;; 7: len 30; hex 6672632d6e6
574746f6f6c7330312e74686566616365626f6f6b2e636f6d; aschostA;; 8: len 2;
hex 8001; asc ;;
*** (2) TRANSACTION:
TRANSACTION 0 212800, ACTIVE 67 sec, process no 7121, OS thread id
140350274238208 starting index read, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
72 lock struct(s), heap size 14320, 493 row lock(s), undo log entries 665
MySQL thread id 261, query id 7679077 hostB 192.168.40.50 icinga2 Updating
UPDATE icinga_hoststatus SET acknowledgement_type = '0',
active_checks_enabled = '1', check_command = 'hostalive', check_source
= 'hostA', check_timeperiod_object_id = 27 , check_type = '0',
current_check_attempt = '1', current_notification_number = '0',
current_state = '0', endpoint_object_id = 51327, event_handler = '',
event_handler_enabled = '1', execution_t ime = '4.2099208831787109',
flap_detection_enabled = '1', has_been_checked = '1', host_object_id =
11941, instance_id = 1, is_flapping = '0', is_reachable = '1',
last_check = FROM_UNIXTIME(14417 55129), last_hard_state = '0',
last_hard_state_change = FROM_UNIXTIME(1441753220), last_notification =
FROM_UNIXTIME(1441500834), last_state_change =
FROM_UNIXTIME(1441753212), last_time_down = FR
OM_UNIXTIME(1441500120), last_time_up = FROM_UNIXTIME(1441755144),
latency = '0', long_output = '', max_check_att
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 7180 n bits 72 index `PRIMARY` of table
`icinga2`.`icinga_endpointstatus` trx id 0 212800 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info
bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 2 PHYSICAL RECORD: n_fields 9; compact format; info
bits 0
0: len 8; hex 00000000000009a9; asc ;; 1: len 6; hex
00000000038b; asc ;; 2: len 7; hex 800000002d0110; asc - ;; 3:
len 8; hex 0000000000000001; asc ;; 4: len 8; hex 0000000
000001c57; asc W;; 5: len 4; hex 55dc8422; asc U ";; 6: len 30;
hex 6672632d6e6574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc
hostA;; 7: len 30; hex 6672632d6e6
574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc hostA;; 8: len 2;
hex 8001; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 9; compact format; info
bits 0
0: len 8; hex 00000000000009ab; asc ;; 1: len 6; hex
000000033f40; asc ?@;; 2: len 7; hex 000000002d01d0; asc - ;; 3:
len 8; hex 0000000000000001; asc ;; 4: len 8; hex 0000000
00000c87f; asc ;; 5: len 4; hex 55ef6ffa; asc U o ;; 6: len 30;
hex 70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc
hostB;...(truncated); 7: len 30; h ex
70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc
hostB;...(truncated); 8: len 2; hex 8001; asc ;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 9; compact format; info
bits 0
0: len 8; hex 00000000000009ad; asc ;; 1: len 6; hex
000000033f40; asc ?@;; 2: len 7; hex 000000002d014e; asc - N;;
3: len 8; hex 0000000000000001; asc ;; 4: len 8; hex 0000000
00000c881; asc ;; 5: len 4; hex 55ef6ffa; asc U o ;; 6: len 30;
hex 6672632d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc
hostA;...(truncated); 7: len 30; h ex
70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc
hostB;...(truncated); 8: len 2; hex 8001; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 4182 n bits 792 index `object_id` of
table `icinga2`.`icinga_hoststatus` trx id 0 212800 lock_mode X locks
rec but not gap waiting
Record lock, heap no 25 PHYSICAL RECORD: n_fields 2; compact format;
info bits 0
0: len 8; hex 0000000000002ea5; asc . ;; 1: len 8; hex
000000000000072e; asc .;;
_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users