Hi Aaron,

On 03/28/2017 10:03 PM, Aaron van Meerten wrote:
> Hi HAProxy List,
>
> I posted the following approximately 2 weeks ago and was hoping that someone
> else might have experienced these inconsistencies within the stick tables
> between peers. It seems to be an issue even in the latest release (HAProxy
> 1.7.4). I hope to get some guidance on what information I could collect
> which might be of interest to the developers or the community.
>
> Would a tcpdump of the chatter between peers (on TCP port 1024) be of use? I
> cannot always predict when the stick table corruption will occur, but I can
> try to collect some data about the traffic between the peers once the
> corruption has happened.
>
> Or is there anything else I could be doing to increase the logging with
> relation to peer connections and stick table updates? At the moment I don’t
> see anything in the HAProxy logs related to this feature.
>
> Thanks again for this amazing product, I’m still a very happy user!
>
> Cheers,
>
> -Aaron
>
>
>> On Mar 15, 2017, at 22:22, Aaron van Meerten <[email protected]>
>> wrote:
>>
>> Hi HAProxy List,
>>
>> I’ve run into an issue with stick tables/peering that may be of
>> interest to some of you.
>>
>> I’ve got a fleet of 10 proxy servers peering with each other, fronting
>> several backend servers. I have a very simple stick table setup which I’ve
>> pasted examples of below. Basically I use a URL parameter to control server
>> stickiness.
>>
>> This works great, and is an amazing solution to a sticky problem for our
>> BOSH-based XMPP messaging, as long as the stick table entries stay in sync.
>> However, sometimes one HAProxy instance will lose one or more entries which
>> are still present on the others.

Is it still the same instance?

>> This state persists for anywhere from minutes to hours, during which the
>> out-of-sync server continues to receive updates on some entries but is
>> missing others.

In the peers protocol, each peer is responsible for pushing its own local
updates to the other peers. But a peer won't 'forward' updates coming from
another peer (except when answering a startup resync request). So we could
reach your situation if communication failed between 2 peers: the affected
peer learns the updates from all the peers except one.

>> A restart of the server can resolve the issue by causing the table to
>> refresh, but this is less than ideal.

At restart, the node will ask any available peer for a resync.

>>
>> When it occurs, it appears that all the other servers continue to update the
>> “TTL” on the entry, but the errant server slowly allows the entry to expire
>> and be removed.
>> I have developed a tool which pulls the stick table from each proxy and
>> compares the entries. There’s obviously some room for expiry times to be
>> different on each proxy, but I’d expect that entries which are regularly
>> refreshed on all other peers should be propagated everywhere.
>>
>> I suspect either ephemeral network connectivity problems between the peers
>> or some other error, but I haven’t seen anything in the logs that seems
>> relevant.
>>
>> lsof analysis of open TCP sockets shows all peers connected on 1024 as
>> expected.
>>

When you are facing the issue, could you run a tcpdump between this instance
and ALL the other peers, to check whether they exchange any data?

>> I wondered if this list would have any ideas on further avenues for analysis
>> on this particular problem. I’ve seen this happen consistently on HAProxy
>> 1.6 and 1.7 through several point releases of each. If anything it seems
>> more frequent in 1.7.
>>
>> Please let me know if you have any good ideas or if anyone has seen behavior
>> like this before.
>>
>> Thanks,
>>
>> -Aaron van Meerten

R,
Emeric
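
To make the propagation rule above concrete, here is a toy Python model of it. It is not HAProxy code and not the peers wire protocol: the `Peer`/`Cluster` classes and their methods are hypothetical, each stick-table entry is reduced to a single expiry timestamp, and reconnection, acknowledgements and real timers are ignored. It only shows how one failed link (a to c) lets c miss the refreshes that a originates, even though b still has them, and how a startup resync repairs that.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """One HAProxy instance; `table` maps a stick-table key to its expiry time."""
    name: str
    table: dict = field(default_factory=dict)


class Cluster:
    """Toy model of the propagation rule only (hypothetical, not HAProxy code):
    a peer pushes its *local* updates directly to every peer it can reach,
    and receivers apply them without forwarding them any further."""

    def __init__(self, names, ttl):
        self.peers = {n: Peer(n) for n in names}
        self.ttl = ttl
        self.down = set()  # frozenset({a, b}) for each broken peer-to-peer link

    def break_link(self, a, b):
        self.down.add(frozenset((a, b)))

    def local_update(self, origin, key, now):
        """A request handled by `origin` creates or refreshes `key`."""
        exp = now + self.ttl
        self.peers[origin].table[key] = exp
        for name, peer in self.peers.items():
            # Direct push only: no relaying through a third peer.
            if name != origin and frozenset((origin, name)) not in self.down:
                peer.table[key] = exp

    def resync(self, name, source):
        """Startup resync: `source` sends its whole table, including entries
        it learned from other peers (the one case where data is relayed)."""
        self.peers[name].table = dict(self.peers[source].table)

    def expire(self, now):
        """Drop every entry whose expiry time is not after `now`."""
        for peer in self.peers.values():
            peer.table = {k: e for k, e in peer.table.items() if e > now}

    def holders(self, key):
        """Names of the peers that currently hold `key`, sorted."""
        return sorted(n for n, p in self.peers.items() if key in p.table)


if __name__ == "__main__":
    c = Cluster(["a", "b", "c"], ttl=30)
    c.local_update("a", "room1", now=0)
    print(c.holders("room1"))   # ['a', 'b', 'c']
    c.break_link("a", "c")      # only the a<->c link fails
    c.local_update("a", "room1", now=20)  # b gets the refresh, c does not
    c.expire(now=35)
    print(c.holders("room1"))   # ['a', 'b']  (b does not relay to c)
    c.resync("c", "b")          # roughly what a restart of c achieves
    print(c.holders("room1"))   # ['a', 'b', 'c']
```

In this model a key survives on c as long as some peer that is still connected to c keeps refreshing it locally, so only the keys refreshed exclusively through the broken link disappear. That would be consistent with the "updates on some entries but missing others" symptom, though it does not prove the link is the cause.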

