I sent this last week via Nabble, but I don't think it got mailed out to anybody. If it did, I apologize for the spam.
We're developing a strategy for backups and HA. Ideally we'd like to use colocated backups to ensure data integrity and availability, with scale-down configured from the slave to the master host. We ran into an issue when bringing a server back up.

Consider this situation: servers 1 and 2 are brought up and create colocated backups 1b and 2b, with 1b living on server 2 and 2b living on server 1. If I take server 2 offline, 2b comes online and then scales down into server 1 as intended. When I bring server 2 back up, 2b does not fail back.

This leads to server 2 starting an infinite vote loop, trying to find another server to host a backup for it. Since server 1 still holds backup 2b and is configured for only 1 backup, it replies every time that it does not have space for another backup. In this state, if more messages are sent to server 2 and server 2 then crashes, those messages are lost.

I've created an example of this problem, based on one of the examples in the Artemis source, here: https://github.com/SethPyle376/colocated-scaledown-problem

I've tested this situation with both replication and shared-store, and the problem persists. Any help would be great; we need colocated scale-down failback working correctly.
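
To make the sequence easier to follow, here is a toy model of the state I'm describing. It is only a sketch of the behaviour I observed, not Artemis code: the `Server` class, `vote`, `request_backup`, and `simulate` are names I made up for illustration, and the retry cap of 5 stands in for what is an endless loop on the real broker.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Server:
    """Hypothetical stand-in for a broker that can host colocated backups."""
    name: str
    max_backups: int = 1  # mirrors "only configured for 1 backup"
    hosted_backups: List[str] = field(default_factory=list)

    def vote(self, requester: str) -> bool:
        """Accept a backup request only if a backup slot is still free."""
        if len(self.hosted_backups) >= self.max_backups:
            return False
        self.hosted_backups.append(requester + "b")
        return True


def request_backup(requester: str, peers: List[Server],
                   max_attempts: int) -> Optional[int]:
    """Ask peers to host a backup; return the attempt that succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        if any(peer.vote(requester) for peer in peers):
            return attempt
    return None


def simulate() -> Tuple[Optional[int], Optional[int]]:
    s1, s2 = Server("1"), Server("2")

    # Initial startup: each server finds a host for its backup.
    first = request_backup("2", [s1], max_attempts=5)  # s1 now holds "2b"
    request_backup("1", [s2], max_attempts=5)          # s2 now holds "1b"

    # Server 2 goes offline and 2b scales down into server 1.
    # Assumption matching what I observed: s1's slot stays counted as used.

    # Server 2 restarts with a clean state and asks for a backup again.
    s2 = Server("2")
    after_restart = request_backup("2", [s1], max_attempts=5)
    return first, after_restart


if __name__ == "__main__":
    print(simulate())
```

In this model the first request succeeds on the first attempt, while every request after the restart is refused because server 1 never frees its single backup slot. That is the state in which server 2 runs without a backup.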