Hi John,

Thanks for the log. Having looked through it, I think this was broken by the following commit:
```
commit 5a6ceb664f07812c351786c1043da71ff5027f8c
Author: Alex Zhuravlev <[email protected]>
Date:   Mon Sep 28 16:50:15 2015 +0300

    LU-7236 ptlrpc: idle connections can disconnect
```

In particular, the following change introduced the problem:

```
-	} else if (req->rq_no_delay) {
+	} else if (req->rq_no_delay &&
+		   imp->imp_generation != imp->imp_initiated_at) {
+		/* ignore nodelay for requests initiating connections */
 		*status = -EWOULDBLOCK;
```

With this change, an RPC request can be delayed even when `rq_no_delay` is set: it is no longer failed with `-EWOULDBLOCK` when `imp->imp_generation` equals `imp->imp_initiated_at`.

Jinshan

On Fri, May 24, 2019 at 6:29 AM John Doe <[email protected]> wrote:

> I have sent the log file to you in a separate email.
>
> Note - I read four 1MB blocks, the first two 1MB blocks were cached.
>
> On Fri, May 24, 2019 at 1:01 AM Jinshan Xiong <[email protected]>
> wrote:
>
>> hmm.. This definitely is not expected. As long as ost 1 is down, it
>> should be returned immediately from OSC layer and tries to read the 2nd
>> mirror that is located on ost 7. For the following blocks, it should not
>> even try ost1 but go to 7 directly.
>>
>> Would you please collect Lustre log and send it to me? You can collect
>> logs on client side as follows:
>> 0. create mirrored file
>> 1. lctl set_param debug=-1 && lctl clear
>> 2. lctl mark "======= start ========"
>> 3. read the file
>> 4. lctl dk > log.txt
>>
>> and send me the log.txt file. If you can reproduce this problem
>> consistently, please use a small file so that it would be easier to check
>> the log.
>>
>> Jinshan
>>
>> On Mon, May 20, 2019 at 6:20 AM John Doe <[email protected]> wrote:
>>
>>> It turns out that the read eventually finished and was 1/10th of the
>>> performance that I was expecting.
>>>
>>> As ost idx 1 is unavailable, the client read has to timeout on ost idx 1
>>> and then will read from ost idx 7. This happens for each 1MB block, as I am
>>> using that as the block size.
>>>
>>> Is there a tunable to avoid this issue?
>>>
>>> lfs check osts also takes about 30 seconds as it times out on the
>>> unavailable OST.
>>>
>>> Due to this issue, I am virtually unable to use the mirroring feature.
>>>
>>> I
>>>
>>> On Sun, May 19, 2019 at 4:27 PM John Doe <[email protected]> wrote:
>>>
>>>> After mirroring a file , when one mirror is down, any reads from a
>>>> client just hangs. Both server and client are running latest 2.12.1-1.
>>>> Client waits for ost idx 1 to come back online. I am only unmounting ost
>>>> idx1 not ost idx 7.
>>>>
>>>> Has anyone tried this feature?
>>>>
>>>> Thanks,
>>>> John.
>>>>
>>>> lfs getstripe mirror10
>>>> mirror10
>>>>   lcm_layout_gen: 5
>>>>   lcm_mirror_count: 2
>>>>   lcm_entry_count: 2
>>>>     lcme_id: 65537
>>>>     lcme_mirror_id: 1
>>>>     lcme_flags: init
>>>>     lcme_extent.e_start: 0
>>>>     lcme_extent.e_end: EOF
>>>>       lmm_stripe_count: 1
>>>>       lmm_stripe_size: 1048576
>>>>       lmm_pattern: raid0
>>>>       lmm_layout_gen: 0
>>>>       lmm_stripe_offset: 1
>>>>       lmm_pool: 01
>>>>       lmm_objects:
>>>>       - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x280a8:0x0] }
>>>>
>>>>     lcme_id: 131074
>>>>     lcme_mirror_id: 2
>>>>     lcme_flags: init
>>>>     lcme_extent.e_start: 0
>>>>     lcme_extent.e_end: EOF
>>>>       lmm_stripe_count: 1
>>>>       lmm_stripe_size: 1048576
>>>>       lmm_pattern: raid0
>>>>       lmm_layout_gen: 0
>>>>       lmm_stripe_offset: 7
>>>>       lmm_pool: 02
>>>>       lmm_objects:
>>>>       - 0: { l_ost_idx: 7, l_fid: [0x100070000:0x28066:0x0] }
>>>>
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>
