The following reply was made to PR kern/154299; it has been noted by GNATS.
From: Joshua Sirrine <jsirr...@gmail.com>
To: bug-follo...@freebsd.org, rincebr...@gmail.com
Cc:
Subject: Re: kern/154299: [arcmsr] arcmsr fails to detect all attached drives
Date: Wed, 23 Jan 2013 19:59:23 -0600

First I'd like to apologize right now if I am sending this email and it is not being routed correctly. This is not the same as the ticket system FreeNAS uses, so I'm in new territory. I've been using FreeNAS (FreeBSD) for about a year, but I am a quick learner. If I need to provide this information in a form other than email to fix this issue, please let me know.

I believe I have found the cause of disks not being usable as described in kern/154299 <http://www.freebsd.org/cgi/query-pr.cgi?pr=154299>. Here's what I see on my system. It uses an Areca 1280ML-24 with firmware 1.49 (latest) and runs FreeNAS 8.3.0 x64 (based on FreeBSD 8.3) with areca-cli Version 1.84, Arclib: 300, Date: Nov 9 2010 (FreeBSD). I found this issue while swapping out backplanes for my hard drives.

I had drives populating RAID controller ports 1 through 14. Due to a failed backplane I moved the two drives that were connected to ports 13 and 14 to ports 21 and 22 respectively. All of these disks are in a ZFS RAIDZ3 zpool. Note that I have not had any problems with ZFS scrubs or SMART long tests on these drives, and they have been running for more than a year, so infant mortality is not an issue. Also, the RAID controller is in Non-RAID mode, so all disks are JBOD by default.

Physical Drive Information
  # Ch# ModelName            Capacity  Usage
===============================================================================
  1  1  WDC WD20EARS-00S8B1  2000.4GB  JBOD
  2  2  WDC WD20EARS-00S8B1  2000.4GB  JBOD
  3  3  WDC WD20EARS-00S8B1  2000.4GB  JBOD
  4  4  WDC WD20EARS-00S8B1  2000.4GB  JBOD
  5  5  WDC WD20EARS-00S8B1  2000.4GB  JBOD
  6  6  WDC WD20EARS-00S8B1  2000.4GB  JBOD
  7  7  WDC WD20EARS-00S8B1  2000.4GB  JBOD
  8  8  WDC WD20EARS-00S8B1  2000.4GB  JBOD
  9  9  WDC WD20EARS-00S8B1  2000.4GB  JBOD
 10 10  WDC WD20EARS-00S8B1  2000.4GB  JBOD
 11 11  WDC WD20EARS-00S8B1  2000.4GB  JBOD
 12 12  WDC WD20EARS-00S8B1  2000.4GB  JBOD
 13 13  N.A.                    0.0GB  N.A.
 14 14  N.A.                    0.0GB  N.A.
 15 15  N.A.                    0.0GB  N.A.
 16 16  N.A.                    0.0GB  N.A.
 17 17  N.A.                    0.0GB  N.A.
 18 18  N.A.                    0.0GB  N.A.
 19 19  N.A.                    0.0GB  N.A.
 20 20  N.A.                    0.0GB  N.A.
 21 21  WDC WD20EARS-00S8B1  2000.4GB  JBOD
 22 22  WDC WD20EARS-00S8B1  2000.4GB  JBOD
 23 23  N.A.                    0.0GB  N.A.
 24 24  N.A.                    0.0GB  N.A.
===============================================================================

With this configuration disks 21 and 22 were not available to me (only 12 of the disks were available). Since I was using a ZFS RAIDZ3 across all of these disks, I immediately lost two disks' worth of redundancy. The disks showed up in the RAID controller BIOS as well as in areca-cli (as you can see above), but /dev was missing two disks and 'zpool status' showed two missing drives. As soon as I swapped the cables so that the disks were back in ports 13 and 14 on the RAID controller, everything went back to normal.

Knowing that something was wrong, I grabbed some spare drives and started experimenting. I wanted to know what was actually wrong, because I am trusting this system with my data in production use.
Please examine the following VolumeSet information:

VolumeSet Information
  # Name             Raid Name      Level  Capacity  Ch/Id/Lun  State
===============================================================================
  1 WD20EARS-00S8B1  Raid Set # 00  JBOD   2000.4GB  00/00/00   Normal
  2 WD20EARS-00S8B1  Raid Set # 01  JBOD   2000.4GB  00/00/01   Normal
  3 WD20EARS-00S8B1  Raid Set # 02  JBOD   2000.4GB  00/00/02   Normal
  4 WD20EARS-00S8B1  Raid Set # 03  JBOD   2000.4GB  00/00/03   Normal
  5 WD20EARS-00S8B1  Raid Set # 04  JBOD   2000.4GB  00/00/04   Normal
  6 WD20EARS-00S8B1  Raid Set # 05  JBOD   2000.4GB  00/00/05   Normal
  7 WD20EARS-00S8B1  Raid Set # 06  JBOD   2000.4GB  00/00/06   Normal
  8 WD20EARS-00S8B1  Raid Set # 07  JBOD   2000.4GB  00/00/07   Normal
  9 WD20EARS-00S8B1  Raid Set # 08  JBOD   2000.4GB  00/01/00   Normal
 10 WD20EARS-00S8B1  Raid Set # 09  JBOD   2000.4GB  00/01/01   Normal
 11 WD20EARS-00S8B1  Raid Set # 10  JBOD   2000.4GB  00/01/02   Normal
 12 WD20EARS-00S8B1  Raid Set # 11  JBOD   2000.4GB  00/01/03   Normal
 13 WD20EARS-00S8B1  Raid Set # 12  JBOD   2000.4GB  00/01/04   Normal
 14 WD20EARS-00S8B1  Raid Set # 13  JBOD   2000.4GB  00/01/05   Normal
===============================================================================
GuiErrMsg<0x00>: Success.

This is my normal configuration and all disks work. After experimenting, it turns out that if I want to use ports 1 through 8, I MUST have a disk in port 1. For ports 9 through 16, I MUST have a disk in port 9. For ports 17 through 24, I MUST have a disk in port 17. It appears there is something special about Ch/Id/Lun = XX/XX/00: if there is no disk at LUN 00, then that entire ID is not available for use by FreeBSD, despite areca-cli properly identifying the disks behind it.

Now look at kern/154299 itself:

> arcmsr fails to detect all attached drives. It may or may not have
> something to do with a failed device attached and e.g. PR 148502 or
> 150390.
>
> c.f.:
>
> [root@manticore ~]# areca-cli disk info;ls /dev/da* /dev/ad*;
>   # Ch# ModelName     Capacity  Usage
> ===============================================================================
>   1  1  N.A.             0.0GB  N.A.
>   2  2  N.A.             0.0GB  N.A.
>   3  3  N.A.             0.0GB  N.A.
>   4  4  N.A.             0.0GB  N.A.
>   5  5  N.A.             0.0GB  N.A.
>   6  6  N.A.             0.0GB  N.A.
>   7  7  N.A.             0.0GB  N.A.
>   8  8  N.A.             0.0GB  N.A.
>   9  9  ST31500341AS  1500.3GB  JBOD
>  10 10  N.A.             0.0GB  N.A.
>  11 11  ST31500341AS  1500.3GB  JBOD
>  12 12  ST31500341AS  1500.3GB  JBOD
>  13 13  ST31500341AS  1500.3GB  JBOD
>  14 14  N.A.             0.0GB  N.A.
>  15 15  ST31500341AS  1500.3GB  JBOD
>  16 16  ST31500341AS  1500.3GB  JBOD
>  17 17  N.A.             0.0GB  N.A.
>  18 18  N.A.             0.0GB  N.A.
>  19 19  ST31500341AS  1500.3GB  JBOD
>  20 20  ST31500341AS  1500.3GB  JBOD
>  21 21  ST31500341AS  1500.3GB  JBOD
>  22 22                   0.0GB  Failed
>  23 23  ST31500341AS  1500.3GB  JBOD
>  24 24  ST31500341AS  1500.3GB  JBOD
> ===============================================================================
> GuiErrMsg<0x00>: Success.
> /dev/ad4 /dev/ad4s1 /dev/ad4s1a /dev/ad4s1b /dev/ad4s1d /dev/da0
> /dev/da1 /dev/da1p1 /dev/da1p9 /dev/da2 /dev/da3 /dev/da4 /dev/da5
>
> I count 11 drives attached via the arc1280ml, not including the
> failed drive, and I see 6 appearing.
>
> camcontrol rescan all and reboots do not help the issue. I am running
> firmware 1.49.

If you take what I observed and apply it to that post, you will see that only disks 9, 11, 12, 13, 15, and 16 would be available to the system. This is in line with the poster, who says he has only 6 disks available.
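To make the mapping explicit, here is a small sketch in C of the rule I observed. This is only my own illustration of the behaviour, not code from the arcmsr driver or from areca-cli; the port_to_cil helper and the output format are names I made up for the example, and the port list is copied from the quoted areca-cli output.

#include <stdbool.h>
#include <stdio.h>

#define PORTS 24

/*
 * Port n (1-based) maps to Channel 0, ID (n-1)/8, LUN (n-1)%8,
 * matching the VolumeSet table from my controller above.
 */
static void
port_to_cil(int port, int *ch, int *id, int *lun)
{
    *ch  = 0;
    *id  = (port - 1) / 8;
    *lun = (port - 1) % 8;
}

int
main(void)
{
    /* Ports holding healthy disks in the quoted kern/154299 report. */
    static const int quoted[] = { 9, 11, 12, 13, 15, 16, 19, 20, 21, 23, 24 };
    bool populated[PORTS + 1] = { false };
    size_t i;
    int port;

    for (i = 0; i < sizeof(quoted) / sizeof(quoted[0]); i++)
        populated[quoted[i]] = true;

    for (port = 1; port <= PORTS; port++) {
        int ch, id, lun;

        if (!populated[port])
            continue;
        port_to_cil(port, &ch, &id, &lun);
        /*
         * Observed rule: the disks behind an ID only show up in /dev
         * when LUN 0 of that ID (port 8*id + 1, i.e. port 1, 9, or 17)
         * holds a disk.
         */
        printf("port %2d -> %02d/%02d/%02d  %s\n", port, ch, id, lun,
            populated[8 * id + 1] ? "visible" : "missing from /dev");
    }
    return (0);
}

Running that against the quoted configuration prints exactly six visible ports (9, 11, 12, 13, 15, and 16); the five healthy disks behind ID 2 (ports 19, 20, 21, 23, and 24) come out as missing from /dev, which matches what the original poster reports.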
I am writing this email in the hope that someone can find and fix the issue. I do not have any failed disks to experiment with, but based on four hours of experimenting last night I am convinced that the issue may only involve failed disks if a disk fails in port 1, 9, or 17.