On Wed, Mar 14, 2018 at 12:05 PM, Michael Christie <mchri...@redhat.com> wrote:
> On 03/14/2018 01:27 PM, Michael Christie wrote:
> > On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
> >> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdill...@redhat.com
> >> <mailto:jdill...@redhat.com>> wrote:
> >>
> >>     Maxim, can you provide steps for a reproducer?
> >>
> >>
> >> Yes, but it involves adding two artificial delays: one in tcmu-runner
> >> and another in kernel iscsi. If you're willing to take pains of
> >
> > Send the patches for the changes.
> >
> >> recompiling kernel and tcmu-runner on one of gateway nodes, I'll help to
> >> reproduce.
> >>
> >> Generally, the idea of reproducer is simple: let's model a situation
> >> when two stale requests got stuck in kernel mailbox waiting to be
> >> consumed by tcmu-runner, and another one got stuck in iscsi layer --
> >> immediately after reading iscsi request from the socket. If we unblock
> >> tcmu-runner after newer data went through another gateway, the first
> >> stale request will switch tcmu-runner state from LOCKED to UNLOCKED
> >> state, then the second stale request will trigger alua_thread to
> >> re-acquire the lock, so when the third request comes to tcmu-runner, the
>
> When you send the patches that add your delays could you send the
> target side /var/log/tcmu-runner.log with log_level = 4.
>
> For this test above you should see the second request will be sent to
> rbd's tcmu_rbd_aio_write function. That command should fail in
> rbd_finish_aio_generic and tcmu_rbd_handle_blacklisted_cmd will be
> called. We should then be blocking until IO in that iscsi connection is
> flushed in tgt_port_grp_recovery_thread_fn. That function will not
> return from the enable=0 until the iscsi connection is stopped and the
> commands in it have completed.
>
> Other commands you had in flight should eventually hit
> tcmur_cmd_handler's tcmu_dev_in_recovery check and be failed there or if
> they had already passed that check then the cmd would be sent to
> tcmu_rbd_aio_write and they should be getting the blacklisted error like
> above.
>

Mike,

In my scenario the second request is never sent to rbd's
tcmu_rbd_aio_write function. Instead it takes this path:

tcmur_cmd_handler -->
  tcmur_alua_implicit_transition -->
    alua_implicit_transition -->    // rdev->lock_state == UNLOCKED here
      tcmu_set_sense_data           // returns SAM_STAT_CHECK_CONDITION

Hence tcmur_cmd_handler jumps to its "untrack:" label and the command
never reaches the backend.

I'll send /var/log/tcmu-runner.log and the delay patches in about an hour.

Thanks,
Maxim
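To make the two competing explanations easier to compare, here is a toy Python model of the sequence Maxim describes. It is a hypothetical sketch, not tcmu-runner code: ToyDevice, handle_cmd, run_alua_thread, lock_stolen and the status strings are invented for illustration. It assumes (a) a write issued after another gateway has taken the lock fails and drops the device to UNLOCKED, and (b) the alua thread's re-acquisition always succeeds. It contrasts the backend-write path Mike expects (tcmu_rbd_aio_write) with the implicit-transition path Maxim observed when lock_state is UNLOCKED.

```python
from enum import Enum, auto


class LockState(Enum):
    LOCKED = auto()
    UNLOCKED = auto()


class ToyDevice:
    """Hypothetical model of one gateway's view of the exclusive lock.

    Not tcmu-runner code: names only echo identifiers from the thread.
    """

    def __init__(self):
        self.lock_state = LockState.LOCKED
        self.lock_stolen = False       # another gateway has taken the lock
        self.reacquire_queued = False  # work handed to the "alua thread"

    def handle_cmd(self, name):
        """Return (name, path, status) for one write command."""
        if self.lock_state is LockState.UNLOCKED:
            # Path Maxim reports: implicit transition sets sense data and
            # returns CHECK CONDITION; lock re-acquisition is queued.
            self.reacquire_queued = True
            return (name, "implicit_transition", "CHECK_CONDITION")
        if self.lock_stolen:
            # Assumption: a write issued without the lock fails and the
            # device drops to UNLOCKED.
            self.lock_state = LockState.UNLOCKED
            return (name, "backend_write", "FAILED")
        return (name, "backend_write", "GOOD")

    def run_alua_thread(self):
        """Assumption: re-acquisition succeeds and takes the lock back."""
        if self.reacquire_queued:
            self.lock_state = LockState.LOCKED
            self.lock_stolen = False
            self.reacquire_queued = False


def replay():
    """Replay the reproducer order: two mailbox commands, then one late one."""
    dev = ToyDevice()
    dev.lock_stolen = True                      # newer data via another gateway
    results = [dev.handle_cmd("stale-1"),       # both were stuck in the mailbox
               dev.handle_cmd("stale-2")]
    dev.run_alua_thread()                       # alua thread re-acquires lock
    results.append(dev.handle_cmd("stale-3"))   # was held in the iscsi layer
    return results, dev.lock_state


if __name__ == "__main__":
    for row in replay()[0]:
        print(row)
```

Under these assumptions stale-1 fails on the backend path, stale-2 gets CHECK_CONDITION from the implicit transition, and stale-3 arrives after the lock is held again and goes to the backend. The model deliberately leaves out blacklisting, tgt_port_grp_recovery_thread_fn and the tcmu_dev_in_recovery check, which is the recovery machinery Mike expects to intervene; whether the real code behaves like the model is what the delay patches and the log_level = 4 log are meant to show.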
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com