On Wed, Mar 14, 2018 at 12:05 PM, Michael Christie <mchri...@redhat.com> wrote:

> On 03/14/2018 01:27 PM, Michael Christie wrote:
> > On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
> >> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdill...@redhat.com
> >> <mailto:jdill...@redhat.com>> wrote:
> >>
> >>     Maxim, can you provide steps for a reproducer?
> >>
> >>
> >> Yes, but it involves adding two artificial delays: one in tcmu-runner
> >> and another in kernel iscsi. If you're willing to take pains of
> >
> > Send the patches for the changes.
> >
> >> recompiling the kernel and tcmu-runner on one of the gateway nodes, I'll help to
> >> reproduce.
> >>
> >> Generally, the idea of the reproducer is simple: let's model a situation
> >> when two stale requests got stuck in the kernel mailbox waiting to be
> >> consumed by tcmu-runner, and another one got stuck in the iscsi layer --
> >> immediately after reading the iscsi request from the socket. If we unblock
> >> tcmu-runner after newer data went through another gateway, the first
> >> stale request will switch tcmu-runner state from LOCKED to UNLOCKED
> >> state, then the second stale request will trigger alua_thread to
> >> re-acquire the lock, so when the third request comes to tcmu-runner, the
>
> When you send the patches that add your delays, could you also send the
> target side /var/log/tcmu-runner.log with log_level = 4?
>
> For the test above, you should see the second request being sent to
> rbd's tcmu_rbd_aio_write function. That command should fail in
> rbd_finish_aio_generic and tcmu_rbd_handle_blacklisted_cmd will be
> called. We should then be blocking until IO in that iscsi connection is
> flushed in tgt_port_grp_recovery_thread_fn. That function will not
> return from the enable=0 until the iscsi connection is stopped and the
> commands in it have completed.
> Other commands you had in flight should eventually hit
> tcmur_cmd_handler's tcmu_dev_in_recovery check and be failed there or if
> they had already passed that check then the cmd would be sent to
> tcmu_rbd_aio_write and they should be getting the blacklisted error like
> above.

In my scenario the second request is not sent to rbd's tcmu_rbd_aio_write:

tcmur_cmd_handler -->
  tcmur_alua_implicit_transition -->
    alua_implicit_transition --> // rdev->lock_state == UNLOCKED here
      tcmu_set_sense_data // returns SAM_STAT_CHECK_CONDITION

Hence tcmur_cmd_handler goes to "untrack:". I'll send
/var/log/tcmu-runner.log and the delay patches in an hour.
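
To make the call path above easier to follow, here is a toy model of it in
Python. It is only a sketch and not tcmu-runner code: ToyDevice,
toy_implicit_transition and toy_cmd_handler are hypothetical names that
loosely mirror rdev, alua_implicit_transition and tcmur_cmd_handler, and the
relock_requested flag stands in for waking alua_thread. The only thing it
tries to show is that a command arriving while the lock state is UNLOCKED
gets CHECK CONDITION and never reaches the backend write path.

```python
from enum import Enum

# Standard SCSI status codes (SAM).
SAM_STAT_GOOD = 0x00
SAM_STAT_CHECK_CONDITION = 0x02


class LockState(Enum):
    LOCKED = "locked"
    UNLOCKED = "unlocked"


class ToyDevice:
    """Hypothetical stand-in for the per-device state kept by the handler."""

    def __init__(self, lock_state):
        self.lock_state = lock_state
        self.sent_to_backend = []      # commands that reached the backend write path
        self.relock_requested = False  # stands in for "alua_thread was triggered"


def toy_implicit_transition(dev):
    """Return True if the command may proceed to the backend."""
    if dev.lock_state is LockState.LOCKED:
        return True
    # Lock is not held: request a background re-acquire and refuse the command.
    dev.relock_requested = True
    return False


def toy_cmd_handler(dev, cmd):
    """Model of the handler: fail early when the lock is not held."""
    if not toy_implicit_transition(dev):
        # Corresponds to the "untrack:" path; the backend never sees cmd.
        return SAM_STAT_CHECK_CONDITION
    dev.sent_to_backend.append(cmd)
    return SAM_STAT_GOOD


if __name__ == "__main__":
    dev = ToyDevice(LockState.UNLOCKED)
    status = toy_cmd_handler(dev, "second stale write")
    print(hex(status), dev.sent_to_backend, dev.relock_requested)
```

In this model the second stale request only sets relock_requested and returns
SAM_STAT_CHECK_CONDITION, so the backend never gets a chance to reject it.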

ceph-users mailing list
