Re: [Linux-HA] drbd/pacemaker multiple tgt targets, portblock, and race conditions (long-ish)

Vladislav Bogdanov Tue, 12 Nov 2013 22:04:03 -0800

13.11.2013 04:46, Jefferson Ogata wrote:
...
> 
> In practice i ran into failover problems under load almost immediately.
> Under load, when i would initiate a failover, there was a race
> condition: the iSCSILogicalUnit RA will take down the LUNs one at a
> time, waiting for each connection to terminate, and if the initiators
> reconnect quickly enough, they get pissed off at finding that the target
> still exists but the LUN they were using no longer does, which is often
> the case during this transient takedown process. On the initiator, it
> looks something like this, and it's fatal (here LUN 4 has gone away but
> the target is still alive, maybe working on disconnecting LUN 3):
> 
> Nov  7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal
> Request [current]
> Nov  7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit
> not supported
> Nov  7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical
> block 16542656
> 
> One solution to this is using the portblock RA to block all initiator


In addition, I force the use of multipath on the initiators with no_path_retry=queue.
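
For reference, a sketch of how that setting is usually written in /etc/multipath.conf on the initiators. Placing it in the defaults section is an assumption here; it can equally be set per device or per multipath. With "queue", I/O is held until a path comes back instead of being failed up to the filesystem.

```
# /etc/multipath.conf on each initiator (sketch; section placement is an assumption)
defaults {
    # queue I/O while no path is usable instead of returning errors
    no_path_retry queue
}
```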

...

> 
> 1. Lack of support for multiple targets using the same tgt account. This
> is a problem because the iSCSITarget RA defines the user and the target
> at the same time. If it allowed multiple targets to use the same user,
> it wouldn't know when it is safe to delete the user in a stop operation,
> because some other target might still be using it.
> 
> To solve this i did two things: first i wrote a new RA that manages a
> tgt user; this is instantiated as a clone so it runs along with the tgtd
> clone. Second i tweaked the iSCSITarget RA so that on start, if
> incoming_username is defined but incoming_password is not, the RA skips
> the account creation step and simply binds the new target to
> incoming_username. On stop, it similarly no longer deletes the account
> if incoming_password is unset. I also had to relax the uniqueness
> constraint on incoming_username in the RA metadata.
> 
> 2. Disappearing LUNs during failover cause initiators to blow chunks.
> For this i used portblock, but had to modify it because the TCP Send-Q
> would never drain.
> 
> 3. portblock preventing TCP Send-Q from draining, causing tgtd
> connections to hang. I modified portblock to reverse the sense of the
> iptables rules it was adding: instead of blocking traffic from the
> initiator on the INPUT chain, it now blocks traffic from the target on
> the OUTPUT chain with a tcp-reset response. With this setup, as soon as
> portblock goes active, the next packet tgtd attempts to send to a given
> initiator will get a TCP RST response, causing tgtd to hang up the
> connection immediately. This configuration allows the connections to
> terminate promptly under load.
> 
> I'm not totally satisfied with this workaround. It means
> acknowledgements of operations tgtd has actually completed never make it
> back to the initiator. I suspect this could cause problems in some
> scenarios. I don't think it causes a problem the way i'm using it, with
> each LUN as backing store for a distinct VM--when the LUN is back up on
> the other node, the outstanding operations are re-sent by the initiator.
> Maybe with a clustered filesystem this would cause problems; it
> certainly would cause problems if the target device were, for example, a
> tape drive.
> 
> 4. "Insufficient privileges" faults in the portblock RA. This was
> another race condition that occurred because i was using multiple
> targets, meaning that without a mutex, multiple portblock invocations
> would be running in parallel during a failover. If you try to run
> iptables while another iptables is running, you get "Resource not
> available" and this was coming back to pacemaker as "insufficient
> privileges". This is simply a bug in the portblock RA; it should have a
> mutex to prevent parallel iptables invocations. I fixed this by adding
> an ocf_release_lock_on_exit at the top, and adding an ocf_take_lock for
> start, stop, monitor, and status operations.
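
The difference between the two rule directions described in point 3 can be sketched as follows. This only builds the iptables argument lists; it is not the portblock RA's code. The function name, the example address, and the exact option order are assumptions, and 3260 is simply the default iSCSI port.

```python
def portblock_rule(ip, ports, reversed_sense):
    """Return the iptables argv that blocks one target address.

    Hypothetical helper for illustration; it does not run iptables.
    """
    port_list = ",".join(str(p) for p in ports)
    if reversed_sense:
        # Modified behaviour from point 3: match what the target sends
        # and answer it with a TCP RST so tgtd drops the connection.
        return ["iptables", "-I", "OUTPUT", "-p", "tcp", "-s", ip,
                "-m", "multiport", "--sports", port_list,
                "-j", "REJECT", "--reject-with", "tcp-reset"]
    # Original sense: silently drop what the initiator sends.
    return ["iptables", "-I", "INPUT", "-p", "tcp", "-d", ip,
            "-m", "multiport", "--dports", port_list, "-j", "DROP"]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address, not one from this thread.
    print(" ".join(portblock_rule("192.0.2.10", [3260], reversed_sense=True)))
```

Note that the tcp-reset reject type is only valid together with "-p tcp", which is why the protocol match stays in both variants.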
> 
> I'm not sure why more people haven't run into these problems before. I
> hope it's not that i'm doing things wrong, but rather that few others
> have earnestly tried to build anything quite like this setup. If
> anyone out there has set up a similar cluster and *not* had these
> problems, i'd like to know about it. Meanwhile, if others *have* had
> these problems, i'd also like to know, especially if they've found
> alternate solutions.
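
The start/stop decision described in point 1 can be sketched like this. It is a minimal illustration of the logic, not the iSCSITarget RA itself: the function name is hypothetical, the commands are only built and never run, and target creation and deletion are left out.

```python
def account_commands(action, tid, username=None, password=None):
    """Plan the tgtadm account commands for one target (sketch only).

    If a username is given without a password, the account is assumed
    to be managed elsewhere (for example by a separate cloned RA), so
    it is neither created on start nor deleted on stop.
    """
    if not username:
        return []
    base = ["tgtadm", "--lld", "iscsi", "--mode", "account"]
    manage_account = bool(password)
    cmds = []
    if action == "start":
        if manage_account:
            cmds.append(base + ["--op", "new", "--user", username,
                                "--password", password])
        # Binding happens in both cases.
        cmds.append(base + ["--op", "bind", "--tid", str(tid),
                            "--user", username])
    elif action == "stop":
        if manage_account:
            cmds.append(base + ["--op", "delete", "--user", username])
    else:
        raise ValueError("unsupported action: %s" % action)
    return cmds
```

With this split, several targets can share one account, because only the resource that owns the password ever creates or deletes it.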

Can't say about 1: I use IET, which doesn't seem to have that limitation.
2 - I use an alternative home-brew ms RA which blocks (DROP) both input
and output for a specified VIP on demote (targets are configured to be
bound to those VIPs). I also export one big LUN per target and then set
up a clvm VG on top of it (all initiators are in the same, separate
cluster).
3 - Can't say either; IET is probably not affected.
4 - That is true: iptables doesn't have atomic rule management, so you
definitely need a mutex or a dispatcher like firewalld (I haven't tried
it, though).
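
The mutex idea from point 4 can be sketched as below. The fix in the thread used ocf_take_lock and ocf_release_lock_on_exit inside the shell RA; this example swaps in an flock-based lock to show the same serialization, and the helper names and lock path are assumptions.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path inside the with block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # waits until other holders release
        yield fd
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def run_serialized(argv, lock_path):
    """Run one command while holding the lock; return its exit status."""
    with exclusive_lock(lock_path):
        return subprocess.run(argv, check=False).returncode
```

Every invocation that touches iptables would go through run_serialized with the same lock file (for example a hypothetical /var/run/portblock.lock), so parallel start, stop, and monitor operations queue up instead of colliding.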

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
