13.11.2013 04:46, Jefferson Ogata wrote:
...
>
> In practice i ran into failover problems under load almost immediately.
> Under load, when i would initiate a failover, there was a race
> condition: the iSCSILogicalUnit RA will take down the LUNs one at a
> time, waiting for each connection to terminate, and if the initiators
> reconnect quickly enough, they get pissed off at finding that the target
> still exists but the LUN they were using no longer does, which is often
> the case during this transient takedown process. On the initiator, it
> looks something like this, and it's fatal (here LUN 4 has gone away but
> the target is still alive, maybe working on disconnecting LUN 3):
>
> Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal Request [current]
> Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit not supported
> Nov 7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical block 16542656
>
> One solution to this is using the portblock RA to block all initiator
In addition I force use of multipath on initiators with no_path_retry=queue

...
>
> 1. Lack of support for multiple targets using the same tgt account. This
> is a problem because the iSCSITarget RA defines the user and the target
> at the same time. If it allowed multiple targets to use the same user,
> it wouldn't know when it is safe to delete the user in a stop operation,
> because some other target might still be using it.
>
> To solve this i did two things: first i wrote a new RA that manages a
> tgt user; this is instantiated as a clone so it runs along with the tgtd
> clone. Second i tweaked the iSCSITarget RA so that on start, if
> incoming_username is defined but incoming_password is not, the RA skips
> the account creation step and simply binds the new target to
> incoming_username. On stop, it similarly no longer deletes the account
> if incoming_password is unset. I also had to relax the uniqueness
> constraint on incoming_username in the RA metadata.
>
> 2. Disappearing LUNs during failover cause initiators to blow chunks.
> For this i used portblock, but had to modify it because the TCP Send-Q
> would never drain.
>
> 3. portblock preventing TCP Send-Q from draining, causing tgtd
> connections to hang. I modified portblock to reverse the sense of the
> iptables rules it was adding: instead of blocking traffic from the
> initiator on the INPUT chain, it now blocks traffic from the target on
> the OUTPUT chain with a tcp-reset response. With this setup, as soon as
> portblock goes active, the next packet tgtd attempts to send to a given
> initiator will get a TCP RST response, causing tgtd to hang up the
> connection immediately. This configuration allows the connections to
> terminate promptly under load.
>
> I'm not totally satisfied with this workaround. It means
> acknowledgements of operations tgtd has actually completed never make it
> back to the initiator. I suspect this could cause problems in some
> scenarios.
> I don't think it causes a problem the way i'm using it, with
> each LUN as backing store for a distinct VM--when the LUN is back up on
> the other node, the outstanding operations are re-sent by the initiator.
> Maybe with a clustered filesystem this would cause problems; it
> certainly would cause problems if the target device were, for example, a
> tape drive.
>
> 4. "Insufficient privileges" faults in the portblock RA. This was
> another race condition that occurred because i was using multiple
> targets, meaning that without a mutex, multiple portblock invocations
> would be running in parallel during a failover. If you try to run
> iptables while another iptables is running, you get "Resource not
> available" and this was coming back to pacemaker as "insufficient
> privileges". This is simply a bug in the portblock RA; it should have a
> mutex to prevent parallel iptables invocations. I fixed this by adding
> an ocf_release_lock_on_exit at the top, and adding an ocf_take_lock for
> start, stop, monitor, and status operations.
>
> I'm not sure why more people haven't run into these problems before. I
> hope it's not that i'm doing things wrong, but rather that few others
> have earnestly tried to build anything quite like this setup. If
> anyone out there has set up a similar cluster and *not* had these
> problems, i'd like to know about it. Meanwhile, if others *have* had
> these problems, i'd also like to know, especially if they've found
> alternate solutions.

Can't say about 1, I use IET, it doesn't seem to have that limitation.

2 - I use an alternative home-brew ms RA which blocks (DROP) both input
and output for a specified VIP on demote (targets are configured to be
bound to those VIPs). I also export one big LUN per target and then set
up a clvm VG on top of it (all initiators are together in one other
cluster).

3 - can't say as well, IET is probably not affected.

4 - That is true, iptables doesn't have atomic rules management, so you definitely need mutex or dispatcher like firewalld (didn't try it though).
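
To make points 3 and 4 concrete, here is a rough Python sketch of the two ideas together: an OUTPUT-chain rule that rejects with tcp-reset, and a file lock so that parallel invocations never run iptables at the same time. It is only an illustration, not the actual portblock patch. The lock path, the function names and the exact rule layout are assumptions (the thread does not show the real rule), and the real RA is a shell script that uses ocf_take_lock rather than flock.

```python
import fcntl
import os
import subprocess
import tempfile
from contextlib import contextmanager

# Hypothetical lock file for this sketch; the real RA chooses its own.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "portblock-sketch.lock")
ISCSI_PORT = 3260  # IANA-registered iSCSI target port


@contextmanager
def exclusive_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` while the block runs."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # waits until other holders release
        yield
    finally:
        os.close(fd)  # closing the descriptor drops the lock


def reject_rule(vip, port=ISCSI_PORT, action="-I"):
    """Build (but do not run) an iptables command for the OUTPUT chain.

    Assumed rule shape: reject TCP segments leaving the target's VIP and
    port with a TCP RST. Use action="-D" to build the matching delete.
    """
    return [
        "iptables", action, "OUTPUT",
        "-p", "tcp", "-s", vip, "--sport", str(port),
        "-j", "REJECT", "--reject-with", "tcp-reset",
    ]


def run_serialized(argv, lock_path=LOCK_PATH):
    """Run `argv` under the lock and return its exit status."""
    with exclusive_lock(lock_path):
        return subprocess.run(argv, check=False).returncode
```

Blocking would then be something like run_serialized(reject_rule("192.0.2.10")) as root, and unblocking the same call with action="-D"; the address is a documentation placeholder. Since every caller goes through the same lock file, two failovers running side by side queue up instead of colliding inside iptables.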
