Hi Aaron,

On 03/28/2017 10:03 PM, Aaron van Meerten wrote:
> Hi HAProxy List,
> 
> I posted the following approximately 2 weeks ago and was hoping that someone 
> else might have experienced these inconsistencies within the stick tables 
> between peers.  It seems to be an issue even in the latest release (HAProxy 
> 1.7.4).  I hope to get some guidance on what information I could collect 
> which might be of interest to the developers or the community.
> 
> Would a tcpdump of the chatter between peers (on TCP port 1024) be of use?  I 
> cannot always predict when the stick table corruption will occur, but I can 
> try to collect some data about the traffic between the peers once the 
> corruption has happened.
> 
> Or is there anything else I could be doing to increase the logging with 
> relation to peer connections and stick table updates? At the moment I don’t 
> see anything in the HAProxy logs related to this feature.
> 
> Thanks again for this amazing product, I’m still a very happy user!
> 
> Cheers,
> 
> -Aaron
> 
> 
>> On Mar 15, 2017, at 22:22, Aaron van Meerten <[email protected]> 
>> wrote:
>>
>> Hi HAProxy List,
>>
>> I’ve run into an issue with stick tables/peering that may be of 
>> interest to some of you.
>>
>> I’ve got a fleet of 10 proxy servers peering with each other, fronting 
>> several backend servers.  I have a very simple stick table setup which I’ve 
>> pasted examples of below.  Basically I use a URL parameter to control server 
>> stickiness.
>>
>> This works great, and is an amazing solution to a sticky problem for our 
>> BOSH-based XMPP messaging, as long as the stick table entries stay in sync.  
>> However, sometimes one HAProxy instance will lose one or more entries which 
>> are still present on the others.

Is it always the same instance?

>> This state persists between minutes and hours, in which the out-of-sync 
>> server continues to receive updates on some entries but is missing others.

In the peers protocol, each peer is responsible for pushing its own local 
updates to the other peers.  A peer won't 'forward' updates coming from 
another peer (except when answering a startup resync request).

So your case could occur if communication failed between 2 peers: the affected 
peer still learns the updates from all the peers except one.
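
To illustrate the point above, here is a minimal Python sketch of the "push 
local updates only, never forward" rule.  It is a toy model, not HAProxy code: 
the peer names, the `broken_links` set and the `simulate` function are all 
made up for the example, and it ignores expiry and the startup resync.

```python
def simulate(peers, broken_links, updates):
    """Toy model of stick-table propagation between peers.

    peers        -- list of peer names (hypothetical)
    broken_links -- set of frozenset({a, b}) pairs that cannot talk
    updates      -- list of (origin_peer, key, server_id) local updates
    """
    tables = {p: {} for p in peers}
    for origin, key, server_id in updates:
        # The origin records its own local update...
        tables[origin][key] = server_id
        # ...and pushes it directly to every other peer it can reach.
        for other in peers:
            if other == origin:
                continue
            if frozenset((origin, other)) in broken_links:
                continue  # link down: this peer never sees the update
            # Received updates are stored but NOT forwarded any further.
            tables[other][key] = server_id
    return tables


peers = ["p1", "p2", "p3"]
broken = {frozenset(("p1", "p3"))}  # assumption: only this link is failing
updates = [("p1", "roomA", 1), ("p2", "roomB", 2), ("p3", "roomC", 3)]

tables = simulate(peers, broken, updates)
for name in peers:
    print(name, sorted(tables[name]))
```

In this model p2 ends up with every entry, while p1 and p3 each miss only the 
entries that originated on the other side of the broken link.  That would look 
like an instance that keeps receiving some updates but is missing others.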

>> A restart of the server can resolve the issue by causing the table to 
>> refresh, but this is less than ideal.

At restart, the node will request a resync from any available peer.
>>
>> When it occurs, it appears that all the other servers continue to update the 
>> “TTL” on the entry, but the errant server slowly allows the entry to expire 
>> and be removed.
>> I have developed a tool which pulls the stick table from each proxy and 
>> compares the entries.  There’s obviously some room for expiry times to be 
>> different on each proxy, but I’d expect that entries which are regularly 
>> refreshed on all other peers should be propagated everywhere.
>>
>> I suspect somehow either ephemeral network connectivity between the peers or 
>> some other error, but I haven’t seen anything in the logs that seems 
>> relevant.  
>>
>> lsof analysis of open TCP sockets shows all peers connected on 1024 as 
>> expected.
>>

When you are facing the issue, could you run a tcpdump between this instance 
and ALL the other peers, to check whether they are still exchanging data?
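
To decide which instance to capture on, a comparison like the one your tool 
does can be reduced to the sketch below.  It assumes each table has already 
been pulled and parsed into a plain dict of key -> server_id; the 
`missing_entries` name and the sample data are hypothetical, and expiry times 
are deliberately ignored.

```python
def missing_entries(tables):
    """Return, per peer, the keys that exist on some other peer but not locally.

    tables -- dict of peer name -> {key: server_id}, already parsed
              (how the table is fetched is outside this sketch)
    """
    all_keys = set()
    for entries in tables.values():
        all_keys.update(entries)
    return {
        peer: sorted(all_keys - set(entries))
        for peer, entries in tables.items()
    }


# Hypothetical snapshot taken from three proxies at about the same time.
snapshot = {
    "proxy1": {"roomA": 1, "roomB": 2, "roomC": 3},
    "proxy2": {"roomA": 1, "roomB": 2, "roomC": 3},
    "proxy3": {"roomB": 2, "roomC": 3},
}

report = missing_entries(snapshot)
for peer, keys in report.items():
    if keys:
        print(peer, "is missing", keys)
```

An instance that shows up with a non-empty list while the others are empty is 
the one where the capture is most likely to be useful.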

>> I wondered if this list would have any ideas on further avenues for analysis 
>> on this particular problem.  I’ve seen this happen consistently on HAProxy 
>> 1.6 and 1.7 through several point releases of each.  If anything it seems 
>> more frequent in 1.7.
>>
>> Please let me know if you have any good ideas or if anyone has seen behavior 
>> like this before. 
>>
>> Thanks,
>>
>> -Aaron van Meerten

R,
Emeric

