Persistent reservation behaviour/compliance with redundant controllers
Hi all,

I'm seeing behaviour that, from my point of view, doesn't comply with the
SPC-3/4 standards. I have read the T10 drafts to understand SCSI-3
persistent reservations (PR). I may simply have got the standard wrong,
but maybe somebody can shed some light on the situation.

My understanding of SPC-3/4 is that with PR, a registration is needed on
every I_T nexus that accesses a volume. To me, in a dm-multipath
environment, this translates to "register every single path". But that
doesn't work on our 3Par 7400. Now the question is, who is wrong?
Me (likely :-), or HP/3Par (unlikely).

Here's the dmmp map:

360002aca6e6b dm-6 3PARdata,VV
size=2.0T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 3:0:1:4 sdg  8:96   active ready running
  |- 3:0:3:4 sdl  8:176  active ready running
  |- 5:0:3:4 sdbg 67:160 active ready running
  `- 5:0:1:4 sdce 69:32  active ready running

Here are the commands:

1: starting with a clean state:

# sg_persist --in --read-keys /dev/sdg
  3PARdata  VV  3122
  Peripheral device type: disk
  PR generation=0x3a, there are NO registered reservation keys

2: first registration (sdg) works fine:

# sg_persist -d /dev/sdg --no-inquiry --out --register \
  --param-sark=0x420480a02967

3: however, registering sdl fails:

# sg_persist -d /dev/sdl --no-inquiry --out --register \
  --param-sark=0x420480a0296c
persistent reserve out: scsi status: Reservation Conflict

When I use --register-*ignore* on the second device, the command succeeds,
but the registration key for sdg is replaced by the new one for sdl. The
same thing happens the other way around when sdg is register-ignore'd
again.

There can only be two registrations at a time:
(sdg XOR sdl) and (sdbg XOR sdce)

Now my question is: does this comply with the standard?

My core problem is that I'd like to ensure that no registration is
missing by accident.

I hope that somebody on this list is kind enough to answer my question or
give me a hint. HP has not been able to direct it to a capable person in
the last 9 months. *sigh*

Any help is appreciated!
Thanks in advance,
Matthias

3Par specific information:
3Par systems have a transparent controller (node) failover feature. In
the example above, scsi host3 has two paths to the same volume. The paths
are provided by two different controller nodes. If one node fails, the
other node can take over the path transparently.
To me it looks like the implementation is too transparent when it comes
to SCSI-3 PR.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
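Since the goal stated above is to make sure no path is missed when
registering, a first step is to pull the path list out of the map
programmatically. The following is a minimal Python sketch, assuming the
`multipath -ll` layout shown in the message; the function name
`multipath_paths` and the regular expression are illustrative and not
part of any multipath-tools API.

```python
import re

# Matches path lines such as "  |- 3:0:1:4 sdg  8:96   active ready running".
# Assumption: every path line starts with "|-" or "`-" followed by H:C:T:L.
PATH_LINE = re.compile(r"[|`]-\s+(\d+):(\d+):(\d+):(\d+)\s+(\w+)")


def multipath_paths(text):
    """Return [(hctl_tuple, device_node), ...] for each path line found."""
    paths = []
    for line in text.splitlines():
        m = PATH_LINE.search(line)
        if m:
            hctl = tuple(int(g) for g in m.group(1, 2, 3, 4))
            paths.append((hctl, "/dev/" + m.group(5)))
    return paths


# The map from the message, used here as sample input.
SAMPLE = """\
360002aca6e6b dm-6 3PARdata,VV
size=2.0T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 3:0:1:4 sdg  8:96   active ready running
  |- 3:0:3:4 sdl  8:176  active ready running
  |- 5:0:3:4 sdbg 67:160 active ready running
  `- 5:0:1:4 sdce 69:32  active ready running
"""

if __name__ == "__main__":
    for hctl, dev in multipath_paths(SAMPLE):
        print(hctl, dev)
```

Each returned device node is one candidate for a separate
`sg_persist --out --register` call; whether the target accepts one
registration per path is exactly what this thread is about.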
Re: Persistent reservation behaviour/compliance with redundant controllers
2014/1/6 Lee Duncan:
> On 12/25/2013 03:00 PM, Matthias Eble wrote:
>> Here's the dmmp map
>> 360002aca6e6b dm-6 3PARdata,VV
>> size=2.0T features='0' hwhandler='0' wp=rw
>> `-+- policy='round-robin 0' prio=1 status=active
>>   |- 3:0:1:4 sdg  8:96   active ready running
>>   |- 3:0:3:4 sdl  8:176  active ready running
>>   |- 5:0:3:4 sdbg 67:160 active ready running
>>   `- 5:0:1:4 sdce 69:32  active ready running
>>
>> There can only be two registrations at a time: (sdg XOR sdl) and (sdbg XOR
>> sdce)
>> Now my question is: Does this comply to the standard?
>>
>
> I _believe_ the problem is that you are re-registering the same
> I_T_Nexus through /dev/sdl, your second attempt at registration, as you
> did when you used /dev/sdg, your original registration.

Can sdg and sdl be the same I_T nexus at the same time?
Right now, they are handled like that.
In my understanding, every SCSI disk device represents an I_T nexus.

# lsscsi -t | egrep '/dev/sd(g|l|bg|ce)'
[3:0:1:4]    disk    fc:0x20120002ac006e6b,0x14ad40  /dev/sdg
[3:0:3:4]    disk    fc:0x21120002ac006e6b,0x14ad80  /dev/sdl
[5:0:1:4]    disk    fc:0x22110002ac006e6b,0x0aad40  /dev/sdce
[5:0:3:4]    disk    fc:0x23110002ac006e6b,0x0aad80  /dev/sdbg

> What are you really trying to do? Are you testing that persistent
> reservations "work" or trying to figure them out?

I am testing PR on a specific storage system, which seems to behave
differently from the ones I used before.

> I have a "persistent reservations for dummies" document I wrote that I
> can send you off list, if you like.

I think I know how PRs work. Still, I'd be happy to get your document.
Re: Persistent reservation behaviour/compliance with redundant controllers
2014/1/7 James Bottomley:
> On Mon, 2014-01-06 at 23:53 +0100, Matthias Eble wrote:
>>
>> Can sdg and sdl be the same I_T_Nexus at a time?
>> Right now, they are handled like that.
>> In my understanding, every scsi disk device represents an I_T_Nexus.
>
> No, every SCSI disk is an I_T_L nexus. There's no actual device object
> in Linux for an I_T nexus.

So, PR registrations are made for an I_T nexus by way of an I_T_L nexus.
Probably my previous systems had a 1:1 relation between I_T and I_T_L
nexuses.

Is there a way to identify which I_T_L nexuses belong to the same I_T
nexus?
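One way to approach the question above is to group the `lsscsi -t` lines
from earlier in the thread by host number and transport target address.
The following Python sketch rests on an assumption that is not confirmed
anywhere in this thread: that each SCSI host number corresponds to one
initiator port, so that (host, target port) stands in for the I_T nexus.
The function name `group_by_it_nexus` and the extra `/dev/sdh` line are
hypothetical and only there to show two LUNs behind one nexus.

```python
import re
from collections import defaultdict

# "[H:C:T:L]  type  transport-address  device-node" as printed by lsscsi -t.
LSSCSI_T = re.compile(r"\[(\d+):(\d+):(\d+):(\d+)\]\s+(\S+)\s+(\S+)\s+(\S+)")


def group_by_it_nexus(lines):
    """Map (host, target transport address) to the device nodes behind it.

    Assumption: one SCSI host equals one initiator port, so the key
    approximates an I_T nexus and each value lists its I_T_L nexuses.
    """
    groups = defaultdict(list)
    for line in lines:
        m = LSSCSI_T.search(line)
        if not m:
            continue
        host = int(m.group(1))
        transport = m.group(6)
        device = m.group(7)
        groups[(host, transport)].append(device)
    return dict(groups)


SAMPLE_LINES = [
    "[3:0:1:4]    disk    fc:0x20120002ac006e6b,0x14ad40  /dev/sdg",
    "[3:0:3:4]    disk    fc:0x21120002ac006e6b,0x14ad80  /dev/sdl",
    "[5:0:1:4]    disk    fc:0x22110002ac006e6b,0x0aad40  /dev/sdce",
    "[5:0:3:4]    disk    fc:0x23110002ac006e6b,0x0aad80  /dev/sdbg",
    # Hypothetical second LUN on the same path as sdg, not from the thread.
    "[3:0:1:5]    disk    fc:0x20120002ac006e6b,0x14ad40  /dev/sdh",
]

if __name__ == "__main__":
    for nexus, devices in group_by_it_nexus(SAMPLE_LINES).items():
        print(nexus, devices)
```

With the four real lines, every device ends up in its own group, which
matches the view that sdg and sdl are reached through different target
ports.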
Re: Persistent reservation behaviour/compliance with redundant controllers
2014/1/7 James Bottomley
>
> On Mon, 2014-01-06 at 23:53 +0100, Matthias Eble wrote:
> > 2014/1/6 Lee Duncan :
> > > On 12/25/2013 03:00 PM, Matthias Eble wrote:
> > >> Here's the dmmp map
> > >> 360002aca6e6b dm-6 3PARdata,VV
> > >> size=2.0T features='0' hwhandler='0' wp=rw
> > >> `-+- policy='round-robin 0' prio=1 status=active
> > >>   |- 3:0:1:4 sdg  8:96   active ready running
> > >>   |- 3:0:3:4 sdl  8:176  active ready running
> > >>   |- 5:0:3:4 sdbg 67:160 active ready running
> > >>   `- 5:0:1:4 sdce 69:32  active ready running
> > >>
> > >> There can only be two registrations at a time: (sdg XOR sdl) and (sdbg
> > >> XOR sdce)
> > >> Now my question is: Does this comply to the standard?
> > >>
> > >
> > > I _believe_ the problem is that you are re-registering the same
> > > I_T_Nexus through /dev/sdl, your second attempt at registration, as you
> > > did when you used /dev/sdg, your original registration.
> >
> >
> > Can sdg and sdl be the same I_T_Nexus at a time?
> > Right now, they are handled like that.
> > In my understanding, every scsi disk device represents an I_T_Nexus.
>
> No, every SCSI disk is an I_T_L nexus. There's no actual device object
> in Linux for an I_T nexus.

Hi All,

I'd like to document the progress and findings from lots of off-list
emails with HP's T10 members. Maybe someone on the net will face the same
problem.

First of all, the SPC wording isn't 100% precise. For most commands, the
LUN context is implicit. So where the standards say "I_T nexus", I_T_L
nexuses are meant, as the reservation commands are always LUN specific.
That said, PR registrations need to be done for every I_T_L nexus, i.e.
for every single dmmp path (/dev/sdX).

So we started to test the behaviour of the 3Par system. It seems that
there are some quirks in the 3Par implementation. The error that led to
my initial question is that the target port identifier isn't included in
the target's reservation handling. Thus all PR commands from one host
port are treated as coming from the same nexus, regardless of the target
port over which they were received (as seen in commands #5 and #6 below,
issued after #2).

Note that the investigations haven't been finished.

For those who are interested, here are the findings (verbose output
stripped):

1.# sg_persist --in --read-keys /dev/sdl
  3PARdata  VV  3122
  Peripheral device type: disk
  PR generation=0x44, there are NO registered reservation keys

Register via sdl:

2.# sg_persist -vvv -d /dev/sdl --no-inquiry --out --register --param-sark=0x420480a0296c
PR out: command (Register) successful

Test for spc3r23 table 33 compliance (same key on a registered I_T nexus
should succeed): False

3.# sg_persist -vvv -d /dev/sdl --no-inquiry --out --register --param-sark=0x420480a0296c
persistent reserve out: scsi status: Reservation Conflict
PR out: command failed

Now with a *different key* (should conflict): True

4.# sg_persist -vvv -d /dev/sdl --no-inquiry --out --register --param-sark=0x420480a0296d
persistent reserve out: scsi status: Reservation Conflict
PR out: command failed

Same behaviour using another path/I_T_L nexus (should succeed in both
cases):

5.# sg_persist -vvv -d /dev/sdg --no-inquiry --out --register --param-sark=0x420480a0296c
persistent reserve out: scsi status: Reservation Conflict
PR out: command failed

6.# sg_persist -vvv -d /dev/sdg --no-inquiry --out --register --param-sark=0x420480a0296d
persistent reserve out: scsi status: Reservation Conflict
PR out: command failed

Unregister via sdg :-/

7.# sg_persist -vvv -d /dev/sdg --no-inquiry --out --register --param-rk=0x420480a0296c
PR out: command (Register) successful

Additionally, the read-full-status service action and ALL_TG_PT are not
supported right now.

That's it for now.
Thanks for your replies,
Matthias
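The quirk described above, and the "sdg XOR sdl" effect from the first
message, can be illustrated with a toy model of a registration table.
This is a Python sketch under stated assumptions, not the 3Par firmware
logic and not an SPC-3 implementation: it models only a
register-and-ignore style operation that always stores the key, and the
names `ToyTarget`, `register_and_ignore` and `demo` are made up for this
example. The only thing it varies is whether the target port is part of
the lookup key.

```python
class ToyTarget:
    """Toy registration table for one LUN. NOT an SPC-3 implementation."""

    def __init__(self, include_target_port):
        # True: key registrations per (initiator port, target port).
        # False: key them per initiator port only, as the thread suspects.
        self.include_target_port = include_target_port
        self.registrations = {}

    def _nexus(self, initiator_port, target_port):
        if self.include_target_port:
            return (initiator_port, target_port)
        return (initiator_port,)

    def register_and_ignore(self, initiator_port, target_port, key):
        # Always stores the key, replacing whatever the nexus had before.
        self.registrations[self._nexus(initiator_port, target_port)] = key

    def keys(self):
        return sorted(self.registrations.values())


HOST3 = "host3"                # label for the initiator port (assumption)
TP_SDG = "0x20120002ac006e6b"  # target port behind sdg, from lsscsi -t
TP_SDL = "0x21120002ac006e6b"  # target port behind sdl, from lsscsi -t


def demo(include_target_port):
    target = ToyTarget(include_target_port)
    target.register_and_ignore(HOST3, TP_SDG, 0x420480A02967)
    target.register_and_ignore(HOST3, TP_SDL, 0x420480A0296C)
    return target.keys()


if __name__ == "__main__":
    print([hex(k) for k in demo(include_target_port=True)])
    print([hex(k) for k in demo(include_target_port=False)])
```

With the target port in the lookup key, both keys coexist. Without it,
the second registration overwrites the first, which is the substitution
reported for sdg and sdl.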
Open/INQUIRY fails on RESERVE'd tape device
Hi list,

When a tape device is reserved with the old RESERVE/RELEASE commands, we
see that INQUIRY only works on the SCSI generic device. For SCSI tape
devices, open() already fails:

# lsscsi -g | grep st15
[2:0:6:0]    tape    HP       Ultrium 5-SCSI   I5DZ  /dev/st15  /dev/sg17

# sg_vpd -vvv /dev/st15
open /dev/st15 with flags=0x800
error opening file: /dev/st15: Input/output error

# sg_vpd -vvv /dev/nst15
open /dev/nst15 with flags=0x800
error opening file: /dev/nst15: Input/output error

# sg_vpd -vvv /dev/sg17
open /dev/sg17 with flags=0x800
Supported VPD pages VPD page:
    inquiry cdb: 12 01 00 00 fc 00
      duration=2 ms
    inquiry: requested 252 bytes but got 22 bytes
   [PQual=0  Peripheral device type: tape]
  Supported VPD pages [sv]
  Unit serial number [sn]
  ...

So: should open() fail on a reserved tape device? SPC-2 states that
INQUIRY should never conflict. Or does that only apply to the generic
device? Okay, the INQUIRY itself doesn't conflict, but the open fails.
A SunOS st man page I found states that INQUIRY shall be possible on
reserved devices.

Of course the inquiry succeeds once the reservation has been released.

Thanks in advance
Matthias
Re: Open/INQUIRY fails on RESERVE'd tape device
Hi all,

2014/1/24 Jeremy Linton:
> On 1/23/2014 4:02 PM, Matthias Eble wrote:
>> So: should open() fail on a reserved tape device?
>
> Yes, this is expected behavior for tape devices, reserve 6/release is
> sometimes used by backup applications in SAN environments as an
> arbitration mechanism across multiple machines.

You hit the nail on the head. The problem is that our backup application
does an inquiry on /dev/nst*, which breaks when the same application
uses RESERVE/RELEASE.

> Its not that the INQUIRY is failing, its that the st open sequence is
> doing a reserve/TUR/etc during the open.

This is exactly what I am facing. I just thought that it might not be OK
to issue these commands from st_open. But I guess there is no right or
wrong; it's just implemented that way, so applications need to deal with
it and use a generic device.

> If that fails then you can't open the drives sufficiently to send a
> inquiry via pass-through. In some environments you can bypass that
> processing with O_NDELAY/O_NONBLOCK. Or you just use the sg device which
> doesn't perform the tape open processing that st does.

I guess by "environments" you mean operating systems, as sg_vpd also
uses O_NONBLOCK, which doesn't help:

open("/dev/st15", O_RDONLY|O_NONBLOCK) = -1 EIO (Input/output error)

But as this behaviour has been there for a long time, the backup vendor
needs to fix it IMO.

Thanks to all of you
Matthias
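For an application that has to fall back to the generic device, the sg
node belonging to a tape device can be looked up from `lsscsi -g`
output like the line shown earlier in this thread. The following Python
sketch assumes that the last two whitespace-separated columns are the
primary device node and the sg node, and that a missing node is printed
as "-"; the function name `sg_node_for` is made up for this example.

```python
def sg_node_for(device, lsscsi_g_output):
    """Return the sg node listed next to `device`, or None if not found.

    Assumptions: the last two columns of each `lsscsi -g` line are the
    primary device node and the sg node, and "-" marks a missing node.
    """
    # lsscsi lists the rewinding node (/dev/stN); map /dev/nstN onto it.
    if device.startswith("/dev/nst"):
        device = "/dev/st" + device[len("/dev/nst"):]
    for line in lsscsi_g_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[-2] == device:
            sg = fields[-1]
            return None if sg == "-" else sg
    return None


# Sample line taken from the thread.
SAMPLE = "[2:0:6:0]    tape    HP       Ultrium 5-SCSI   I5DZ  /dev/st15  /dev/sg17"

if __name__ == "__main__":
    print(sg_node_for("/dev/nst15", SAMPLE))
```

The returned node could then be handed to a tool such as `sg_vpd`, which
in the earlier message worked on /dev/sg17 while the tape was reserved.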