In my experience multipathd + ZFS works well, and it has usually behaved as expected. When a disk breaks I simply remove it, swap in the replacement, wait for the new multipathd device to appear, and then start the resilver. That said, I found this does not always work with every JBOD disk array/firmware version: a Proware controller I had did not recognize that a disk had been replaced. That was not a multipathd problem in my case, though. So my hint is to try it out with your hardware and see how it behaves.
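For reference, the sequence I end up running looks roughly like this (a sketch only; "ostpool", "mpathax" and "mpathbz" are made-up names, and whether the new disk comes back under the old map name depends on your multipath aliasing):

# mark the failed member offline (pool and device names are examples)
zpool offline ostpool mpathax
# flush the stale multipath map before pulling the disk
multipath -f mpathax
# physically swap the disk, then rescan so multipathd builds a map for it
multipath -v2
multipath -ll | grep mpath        # note the name of the new map, e.g. mpathbz
# rebuild onto the new multipath device and watch the resilver
zpool replace ostpool mpathax /dev/mapper/mpathbz
zpool status ostpool

If the replacement comes back under the same map name, a plain "zpool replace ostpool mpathax" without the second argument should also work.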
On 26/04/2019 16:57, Kurt Strosahl wrote:
>
> Hey, thanks!
>
> I tried the multipathing part you had down there and I couldn't get it
> to work... I did find that this worked though
>
> #I pick a victim device
> multipath -ll
> ...
> mpathax (35000cca2680a8194) dm-49 HGST ,HUH721010AL5200
> size=9.1T features='0' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=enabled
>   |- 1:0:10:0 sdj  8:144 active ready running
>   `- 11:0:9:0 sddy 128:0 active ready running
> #then I remove the device
> multipath -f mpathax
> #and verify that it is gone
> multipath -ll | grep mpathax
> #then I run the following, which seems to rescan for devices.
> multipath -v2
> Apr 26 10:49:06 | sdj: No SAS end device for 'end_device-1:1'
> Apr 26 10:49:06 | sddy: No SAS end device for 'end_device-11:1'
> create: mpathax (35000cca2680a8194) undef HGST ,HUH721010AL5200
> size=9.1T features='0' hwhandler='0' wp=undef
> `-+- policy='service-time 0' prio=1 status=undef
>   |- 1:0:10:0 sdj  8:144 undef ready running
>   `- 11:0:9:0 sddy 128:0 undef ready running
> #then it's back
> multipath -ll mpathax
> mpathax (35000cca2680a8194) dm-49 HGST ,HUH721010AL5200
> size=9.1T features='0' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=enabled
>   |- 1:0:10:0 sdj  8:144 active ready running
>   `- 11:0:9:0 sddy 128:0 active ready running
>
> I still need to test it fully once I get the whole stack up and
> running, but this seems to be a step in the right direction.
>
> w/r,
> Kurt
>
> ------------------------------------------------------------------------
> *From:* Jongwoo Han <[email protected]>
> *Sent:* Friday, April 26, 2019 6:28 AM
> *To:* Kurt Strosahl
> *Cc:* [email protected]
> *Subject:* Re: [lustre-discuss] ZFS and multipathing for OSTs
>
> Disk replacement with multipathd + zfs is somewhat inconvenient.
>
> step1: mark offline the disk you should replace with zpool command
> step2: remove disk from multipathd table with multipath -f <mpath id>
> step3: replace disk
> step4: add disk to multipath table with multipath -ll <mpath id>
> step5: replace disk in zpool with zpool replace
>
> try this in your test environment and tell us if you have found
> anything interesting in the syslog.
> In my case replacing a single disk in a multipathd+zfs pool triggered
> a massive udevd partition scan.
>
> Thanks
> Jongwoo Han
>
> On Fri, Apr 26, 2019 at 3:44 AM, Kurt Strosahl <[email protected]> wrote:
>
> Good Afternoon,
>
> As part of a new lustre deployment I've now got two disk
> shelves connected redundantly to two servers. Since each disk has
> two paths to the server I'd like to use multipathing for both
> redundancy and improved performance. I haven't found examples or
> discussion about such a setup, and was wondering if there are any
> resources out there that I could consult.
>
> Of particular interest would be examples of the
> /etc/zfs/vdev_id.conf and any tuning that was done. I'm also
> wondering about extra steps that may have to be taken when doing a
> disk replacement to account for the multipathing. I've got plenty
> of time to experiment with this process, but I'd rather not
> reinvent the wheel if I don't have to.
>
> w/r,
>
> Kurt J. Strosahl
> System Administrator: Lustre, HPC
> Scientific Computing Group, Thomas Jefferson National Accelerator
> Facility
>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> --
> Jongwoo Han
> +82-505-227-6108
>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
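Regarding the vdev_id.conf question quoted above: a minimal sketch that just tells vdev_id to prefer the multipath device and gives each drive a stable alias could look like the following (the alias names and WWNs are placeholders, not from a real shelf):

# /etc/zfs/vdev_id.conf -- example only, the WWNs below are placeholders
multipath  yes
alias  ost00_d01  wwn-0x5000cca2680a8194
alias  ost00_d02  wwn-0x5000cca2680a8195

With that in place udev should create /dev/disk/by-vdev/ost00_d01 style links that you can hand to zpool create, so the pool layout stays readable no matter which sdX names the individual paths get.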
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
