On 09 Sep 2015, at 01:51, Barry Quiel <barry.qu...@gmail.com<mailto:barry.qu...@gmail.com>> wrote:
I've been working on an HA setup and had a lot of difficulties. Most of them self-induced. I worked through them all but 1. I seem to have a database lock issue on the icinga_customvariables table. I've got 2 machines in an HA pair. hostA is the "config master". When I start hostA everything looks good. I give it about 5 minutes to stabilize and then start hostB. I can see though the cluster check and the logs that they are communicating. At some point into the "startup" process of hostB ( usually a few minutes ) I start getting the following error in the logs: critical/IdoMysqlConnection: Error "Lock wait timeout exceeded; try restarting transaction" when executing query "DELETE FROM icinga_customvariables WHERE instance_id = 1" When these messages start spamming the log the Icinga process on hostB the icinga process basically stops trying to do anything else except for this update. At some point the situation flip flops and somehow hostB becomes the cluster idodb master and then hostA gets stuck on the delete. It bounces back and forth for a while ( about 24hrs ) and then finally they are both stuck. I figure I'm doing something wrong. I don't see any recent posts or bugs about this issue. icinga 2.3.9 mysql 5.1.73 hostA config bits: constants.conf: const NodeName = "hostA" /* Our local zone name. */ const ZoneName = NodeName zones.conf: object Endpoint "hostA" { host = "hostA" } object Endpoint "hostB" { host = "hostB" } object Zone "dc-master" { endpoints = [ "hostA", "hostB" ] } object Zone "global-config" { global = true } ido-mysql.conf: object IdoMysqlConnection "ido-mysql" { user = "icinga2" password = "icinga" host = "hostA" database = "icinga2" } hostB config bits: constants.conf: const NodeName = "hostA" /* Our local zone name. */ const ZoneName = NodeName zones.conf: object Endpoint "hostA" { host = "hostA" } object Endpoint "hostB" { host = "hostB" } object Zone "dc-master" { endpoints = [ "hostA", "hostB" ] } object Zone "global-config" { global = true } id-mysql.conf: object IdoMysqlConnection "ido-mysql" { user = "icinga2" password = "icinga" host = "hostA" database = "icinga2" } mysql instances on both hostA and hostB in a master-master replication setup. Both hostA and hostB connect to the same database on the same host ( hostA ). Eventually the mysql connection will live behind a load balancer with affinity settings. I can see that hostA has an open connection to the database after it finishes it's startup process. I'm assuming it is electing itself master of the ido-mysql role since the other node is not running. When hostB starts up it also connects to the database and tries to go through it's startup process. If I had to guess I would say that it doesn't force an election for the ido-mysql role prior to doing its "startup" database stuff. Or hostB doesn't honor the results of an election. The other possibility is that hostA isn't properly releasing its locks on the tables after it is done with them. Sounds a bit like this issue: https://dev.icinga.org/issues/8714 Kind regards, Michael During this back and forth of who is the database master a show open tables only shows the icinga_customvariables table in_use. However a show innodb status shows 100+ lock structs, so something is creating locks. If you let it run long enough mysql will detect deadlocks, but for other tables then the one that starts this downward spiral: ------------------------ LATEST DETECTED DEADLOCK ------------------------ 150908 16:33:17 *** (1) TRANSACTION: TRANSACTION 0 212950, ACTIVE 23 sec, process no 7121, OS thread id 140350274770688 starting index read mysql tables in use 1, locked 1 LOCK WAIT 481 lock struct(s), heap size 63472, 4398 row lock(s), undo log entries 2223 MySQL thread id 258, query id 7678772 hostA 192.168.162.50 icinga2 Updating UPDATE icinga_endpointstatus SET is_connected = '1', status_update_time = FROM_UNIXTIME(1441755026) WHERE endpoint_object_id = 51327 AND instance_id = 1 *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 7180 n bits 72 index `PRIMARY` of table `icinga2`.`icinga_endpointstatus` trx id 0 212950 lock_mode X waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 9; compact format; info bits 0 0: len 8; hex 00000000000009a9; asc ;; 1: len 6; hex 00000000038b; asc ;; 2: len 7; hex 800000002d0110; asc - ;; 3: len 8; hex 0000000000000001; asc ;; 4: len 8; hex 0000000 000001c57; asc W;; 5: len 4; hex 55dc8422; asc U ";; 6: len 30; hex 6672632d6e6574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc hostA;; 7: len 30; hex 6672632d6e6 574746f6f6c7330312e74686566616365626f6f6b2e636f6d; aschostA;; 8: len 2; hex 8001; asc ;; *** (2) TRANSACTION: TRANSACTION 0 212800, ACTIVE 67 sec, process no 7121, OS thread id 140350274238208 starting index read, thread declared inside InnoDB 500 mysql tables in use 1, locked 1 72 lock struct(s), heap size 14320, 493 row lock(s), undo log entries 665 MySQL thread id 261, query id 7679077 hostB 192.168.40.50 icinga2 Updating UPDATE icinga_hoststatus SET acknowledgement_type = '0', active_checks_enabled = '1', check_command = 'hostalive', check_source = 'hostA', check_timeperiod_object_id = 27 , check_type = '0', current_check_attempt = '1', current_notification_number = '0', current_state = '0', endpoint_object_id = 51327, event_handler = '', event_handler_enabled = '1', execution_t ime = '4.2099208831787109', flap_detection_enabled = '1', has_been_checked = '1', host_object_id = 11941, instance_id = 1, is_flapping = '0', is_reachable = '1', last_check = FROM_UNIXTIME(14417 55129), last_hard_state = '0', last_hard_state_change = FROM_UNIXTIME(1441753220), last_notification = FROM_UNIXTIME(1441500834), last_state_change = FROM_UNIXTIME(1441753212), last_time_down = FR OM_UNIXTIME(1441500120), last_time_up = FROM_UNIXTIME(1441755144), latency = '0', long_output = '', max_check_att *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 0 page no 7180 n bits 72 index `PRIMARY` of table `icinga2`.`icinga_endpointstatus` trx id 0 212800 lock_mode X Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;; Record lock, heap no 2 PHYSICAL RECORD: n_fields 9; compact format; info bits 0 0: len 8; hex 00000000000009a9; asc ;; 1: len 6; hex 00000000038b; asc ;; 2: len 7; hex 800000002d0110; asc - ;; 3: len 8; hex 0000000000000001; asc ;; 4: len 8; hex 0000000 000001c57; asc W;; 5: len 4; hex 55dc8422; asc U ";; 6: len 30; hex 6672632d6e6574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc hostA;; 7: len 30; hex 6672632d6e6 574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc hostA;; 8: len 2; hex 8001; asc ;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 9; compact format; info bits 0 0: len 8; hex 00000000000009ab; asc ;; 1: len 6; hex 000000033f40; asc ?@;; 2: len 7; hex 000000002d01d0; asc - ;; 3: len 8; hex 0000000000000001; asc ;; 4: len 8; hex 0000000 00000c87f; asc ;; 5: len 4; hex 55ef6ffa; asc U o ;; 6: len 30; hex 70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc hostB;...(truncated); 7: len 30; h ex 70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc hostB;...(truncated); 8: len 2; hex 8001; asc ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 9; compact format; info bits 0 0: len 8; hex 00000000000009ad; asc ;; 1: len 6; hex 000000033f40; asc ?@;; 2: len 7; hex 000000002d014e; asc - N;; 3: len 8; hex 0000000000000001; asc ;; 4: len 8; hex 0000000 00000c881; asc ;; 5: len 4; hex 55ef6ffa; asc U o ;; 6: len 30; hex 6672632d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc hostA;...(truncated); 7: len 30; h ex 70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc hostB;...(truncated); 8: len 2; hex 8001; asc ;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 4182 n bits 792 index `object_id` of table `icinga2`.`icinga_hoststatus` trx id 0 212800 lock_mode X locks rec but not gap waiting Record lock, heap no 25 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 8; hex 0000000000002ea5; asc . ;; 1: len 8; hex 000000000000072e; asc .;; _______________________________________________ icinga-users mailing list icinga-users@lists.icinga.org<mailto:icinga-users@lists.icinga.org> https://lists.icinga.org/mailman/listinfo/icinga-users -- Michael Friedrich, DI (FH) Senior Developer NETWAYS GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg Tel: +49 911 92885-0 | Fax: +49 911 92885-77 GF: Julian Hein, Bernd Erk | AG Nuernberg HRB18461 http://www.netways.de | michael.friedr...@netways.de ** OSBConf 2015 - September - osbconf.org ** ** OSMC 2015 - November - netways.de/osmc ** _______________________________________________ icinga-users mailing list icinga-users@lists.icinga.org https://lists.icinga.org/mailman/listinfo/icinga-users