On 09 Sep 2015, at 01:51, Barry Quiel 
<barry.qu...@gmail.com<mailto:barry.qu...@gmail.com>> wrote:

I've been working on an HA setup and had a lot of difficulties. Most of them 
self-induced.  I worked through them all but 1.  I seem to have a database lock 
issue on the icinga_customvariables table.

I've got 2 machines in an HA pair.  hostA is the "config master". When I start 
hostA everything looks good.  I give it about 5 minutes to stabilize and then 
start hostB.  I can see though the cluster check and the logs that they are 
communicating.  At some point into the "startup" process of hostB ( usually a 
few minutes ) I start getting the following error in the logs:
critical/IdoMysqlConnection: Error "Lock wait timeout exceeded; try restarting 
transaction" when executing query "DELETE FROM icinga_customvariables WHERE 
instance_id = 1"

When these messages start spamming the log the Icinga process on hostB the 
icinga process basically stops trying to do anything else except for this 
update.  At some point the situation flip flops and somehow hostB becomes the 
cluster idodb master and then hostA gets stuck on the delete.  It bounces back 
and forth for a while ( about 24hrs ) and then finally they are both stuck.  I 
figure I'm doing something wrong.  I don't see any recent posts or bugs about 
this issue.

icinga 2.3.9
mysql 5.1.73

hostA config bits:

constants.conf:
const NodeName = "hostA"

/* Our local zone name. */
const ZoneName = NodeName

zones.conf:
object Endpoint "hostA" {
  host = "hostA"
  }

object Endpoint "hostB" {
  host = "hostB"
  }

object Zone "dc-master" {
  endpoints = [ "hostA", "hostB" ]
  }

object Zone "global-config" {
 global = true
}

ido-mysql.conf:
object IdoMysqlConnection "ido-mysql" {
 user = "icinga2"
 password = "icinga"
 host = "hostA"
 database = "icinga2"
}

hostB config bits:

constants.conf:
const NodeName = "hostA"

/* Our local zone name. */
const ZoneName = NodeName

zones.conf:
object Endpoint "hostA" {
  host = "hostA"
  }

object Endpoint "hostB" {
  host = "hostB"
  }

object Zone "dc-master" {
  endpoints = [ "hostA", "hostB" ]
  }

object Zone "global-config" {
 global = true
}

id-mysql.conf:
object IdoMysqlConnection "ido-mysql" {
 user = "icinga2"
 password = "icinga"
 host = "hostA"
 database = "icinga2"
}


mysql instances on both hostA and hostB in a master-master replication setup.  
Both hostA and hostB connect to the same database on the same host ( hostA ).  
Eventually the mysql connection will live behind a load balancer with affinity 
settings.


I can see that hostA has an open connection to the database after it finishes 
it's startup process.  I'm assuming it is electing itself master of the 
ido-mysql role since the other node is not running. When hostB starts up it 
also connects to the database and tries to go through it's startup process.  If 
I had to guess I would say that it doesn't force an election for the ido-mysql 
role prior to doing its "startup" database stuff. Or hostB doesn't honor the 
results of an election.  The other possibility is that hostA isn't properly 
releasing its locks on the tables after it is done with them.

Sounds a bit like this issue: https://dev.icinga.org/issues/8714

Kind regards,
Michael


During this back and forth of who is the database master a show open tables 
only shows the icinga_customvariables table in_use.  However a show innodb 
status shows 100+ lock structs, so something is creating locks.  If you let it 
run long enough mysql will detect deadlocks, but for other tables then the one 
that starts this downward spiral:
------------------------
LATEST DETECTED DEADLOCK
------------------------
150908 16:33:17
*** (1) TRANSACTION:
TRANSACTION 0 212950, ACTIVE 23 sec, process no 7121, OS thread id 
140350274770688 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 481 lock struct(s), heap size 63472, 4398 row lock(s), undo log 
entries 2223
MySQL thread id 258, query id 7678772 hostA 192.168.162.50 icinga2 Updating
UPDATE icinga_endpointstatus SET is_connected = '1', status_update_time = 
FROM_UNIXTIME(1441755026) WHERE endpoint_object_id = 51327 AND instance_id = 1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 7180 n bits 72 index `PRIMARY` of table 
`icinga2`.`icinga_endpointstatus` trx id 0 212950 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 9; compact format; info bits 0
0: len 8; hex 00000000000009a9; asc         ;; 1: len 6; hex 00000000038b; asc  
     ;; 2: len 7; hex 800000002d0110; asc     - ;; 3: len 8; hex 
0000000000000001; asc         ;; 4: len 8; hex 0000000 000001c57; asc        
W;; 5: len 4; hex 55dc8422; asc U  ";; 6: len 30; hex 
6672632d6e6574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc hostA;; 7: 
len 30; hex 6672632d6e6 574746f6f6c7330312e74686566616365626f6f6b2e636f6d; 
aschostA;; 8: len 2; hex 8001; asc   ;;

*** (2) TRANSACTION:
TRANSACTION 0 212800, ACTIVE 67 sec, process no 7121, OS thread id 
140350274238208 starting index read, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
72 lock struct(s), heap size 14320, 493 row lock(s), undo log entries 665
MySQL thread id 261, query id 7679077 hostB 192.168.40.50 icinga2 Updating
UPDATE icinga_hoststatus SET acknowledgement_type = '0', active_checks_enabled 
= '1',  check_command = 'hostalive', check_source = 'hostA',  
check_timeperiod_object_id = 27 ,  check_type = '0',  current_check_attempt = 
'1', current_notification_number = '0',  current_state = '0', 
endpoint_object_id = 51327,  event_handler = '', event_handler_enabled = '1', 
execution_t ime = '4.2099208831787109',  flap_detection_enabled = '1', 
has_been_checked = '1',  host_object_id = 11941,  instance_id = 1, is_flapping 
= '0',  is_reachable = '1',  last_check = FROM_UNIXTIME(14417 55129),  
last_hard_state = '0',  last_hard_state_change = FROM_UNIXTIME(1441753220),  
last_notification = FROM_UNIXTIME(1441500834),  last_state_change = 
FROM_UNIXTIME(1441753212),  last_time_down = FR OM_UNIXTIME(1441500120),  
last_time_up = FROM_UNIXTIME(1441755144), latency = '0',  long_output = '',  
max_check_att
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 7180 n bits 72 index `PRIMARY` of table 
`icinga2`.`icinga_endpointstatus` trx id 0 212800 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 9; compact format; info bits 0
0: len 8; hex 00000000000009a9; asc         ;; 1: len 6; hex 00000000038b; asc  
     ;; 2: len 7; hex 800000002d0110; asc     - ;; 3: len 8; hex 
0000000000000001; asc         ;; 4: len 8; hex 0000000 000001c57; asc        
W;; 5: len 4; hex 55dc8422; asc U  ";; 6: len 30; hex 
6672632d6e6574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc hostA;; 7: 
len 30; hex 6672632d6e6 574746f6f6c7330312e74686566616365626f6f6b2e636f6d; asc 
hostA;; 8: len 2; hex 8001; asc   ;;

Record lock, heap no 3 PHYSICAL RECORD: n_fields 9; compact format; info bits 0
0: len 8; hex 00000000000009ab; asc         ;; 1: len 6; hex 000000033f40; asc  
   ?@;; 2: len 7; hex 000000002d01d0; asc     - ;; 3: len 8; hex 
0000000000000001; asc         ;; 4: len 8; hex 0000000 00000c87f; asc         
;; 5: len 4; hex 55ef6ffa; asc U o ;; 6: len 30; hex 
70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc 
hostB;...(truncated); 7: len 30; h ex 
70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc 
hostB;...(truncated); 8: len 2; hex 8001; asc   ;;

Record lock, heap no 4 PHYSICAL RECORD: n_fields 9; compact format; info bits 0
0: len 8; hex 00000000000009ad; asc         ;; 1: len 6; hex 000000033f40; asc  
   ?@;; 2: len 7; hex 000000002d014e; asc     - N;; 3: len 8; hex 
0000000000000001; asc         ;; 4: len 8; hex 0000000 00000c881; asc         
;; 5: len 4; hex 55ef6ffa; asc U o ;; 6: len 30; hex 
6672632d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc 
hostA;...(truncated); 7: len 30; h ex 
70726e2d6e65746d6f6e69746f7230312e74686566616365626f6f6b2e63; asc 
hostB;...(truncated); 8: len 2; hex 8001; asc   ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 4182 n bits 792 index `object_id` of table 
`icinga2`.`icinga_hoststatus` trx id 0 212800 lock_mode X locks rec but not gap 
waiting
Record lock, heap no 25 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 0000000000002ea5; asc       . ;; 1: len 8; hex 000000000000072e; 
asc        .;;

_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org<mailto:icinga-users@lists.icinga.org>
https://lists.icinga.org/mailman/listinfo/icinga-users


-- 
Michael Friedrich, DI (FH)
Senior Developer

NETWAYS GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
Tel: +49 911 92885-0 | Fax: +49 911 92885-77
GF: Julian Hein, Bernd Erk | AG Nuernberg HRB18461
http://www.netways.de | michael.friedr...@netways.de

** OSBConf 2015 - September - osbconf.org **
** OSMC 2015 - November - netways.de/osmc **
_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users

Reply via email to