Re: PERC2/Si won't failover
We had the same thing happen on a PE2500 with PERC3/DI, different drive configuration. Text from my tech "diary" of sorts that I keep regarding unique problems I run into:

"The system would not boot after all the replacements were there, claiming "no boot device." For one reason or another, the RAID controller detected the new drive, but rather than just add the drive into the container and begin rebuilding, it simply offlined the container, which resulted in the "no boot device" message.

I suspect this was because we did not properly inform the controller that we were taking a drive offline before we did so. We did it while the machine was turned off. We probably would have had better luck if we had booted the system, gone into afacli, prepared the enclosure slot for drive removal, removed the drive, inserted the new drive, and issued the proper commands to begin the rebuild if it didn't start automatically. The OS afacli is much more robust than the RAID BIOS utilities.

The solution to get the drive into the RAID container and get it to begin rebuilding was to go into the RAID BIOS and assign the new drive as the failover drive. As soon as I did that, the container started rebuilding. I exited the utility (which automatically saves any settings) and rebooted. The system came back up just fine, like normal, and is now happily rebuilding the RAID array. No data was lost and we now have as close to a completely new system as you can get short of replacing the entire thing."

It was just one part of a major system overhaul due to a "ghost" in the SCSI system that kept offlining our container in near-random, non-reproducible conditions, but I suspect a similar procedure may help in your instance.

Andrew

On 31 Jan 2005 at 14:49, Kit Gerrits wrote:

> Hey all!
>
> I have a PowerEdge 2400 with PERC2/Si with 4x9GB Drives with RedHat
> EL 3.0
> Container 0: plain 9GB drive (O/S)
> Container 1: 3x9GB in RAID5 (data)
>
> After getting I/O Errors (and getting a strange noise from drive
> 0:3:0), I did the unthinkable: I pulled the drive from the chassis
> without shutting it down. (oops)
> I have now verified the drive, cleaned off the partition and rescanned
> the bus.
> ...but the drive won't failover
>
> I have set it to failover, but the PERC won't failover the drive, even
> after a (warm) reboot.
>
> Did I forget anything?
>
> Thanks in advance,
>
> Kit Gerrits
> [EMAIL PROTECTED]
>
> ---
> Debugging info:
> ---
>
> AFA0> disk list
> Executing: disk list
>
> B:ID:L  Device Type  Blocks    Bytes/Block  Usage        Shared  Rate
> ------  -----------  --------  -----------  -----------  ------  ----
> 0:00:0  Disk         17783240  512          Initialized  NO      80
> 0:01:0  Disk         17783240  512          Initialized  NO      80
> 0:02:0  Disk         17783240  512          Initialized  NO      80
> 0:03:0  Disk         17783240  512          Initialized  NO      80
>
> AFA0> container show failover
> Executing: container show failover
>
> Container  Scsi B:ID:L
> ---------  -----------
> 0          --- No Devices Assigned ---
> 1          0:03:0
>
> AFA0> container list
> Executing: container list
> Num          Total  Oth Chunk          Scsi   Partition
> Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
> ----- ------ ------ --- ------ ------- ------ -------------
>  0    Volume 8.47GB            Open    0:00:0 64.0KB:8.47GB
>  /dev/sda             NT
>
>  1    RAID-5 16.9GB       32KB   Open    0:01:0 64.0KB:8.47GB
>  /dev/sdb             DATA               0:02:0 64.0KB:8.47GB
>                                          ?:??:? - Missing -
>
> AFA0> controller show au
> Executing: controller show automatic_failover
> Automatic failover ENABLED
>
> AFA0> container scrub 1
> Executing: container scrub 1
> Command Error: (consistency check)
> operation on the container because one or more of the container's
> partitions failed.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in the body of a message to [EMAIL PROTECTED] More majordomo
> info at http://vger.kernel.org/majordomo-info.html

Sincerely,
Andrew Kinney
President and Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net
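For anyone who wants to script a check for the degraded state shown in the quoted `container list` output, here is a minimal Python sketch. It only scans text that has already been captured from afacli; it does not run afacli itself. The sample text is copied from the quote in this thread, but its column spacing was reconstructed from a flattened archive copy, so treat the layout as an assumption. The function name `containers_with_missing_members` is made up for illustration.

```python
import re

# Sample copied from the quoted "container list" output in this thread.
# Column spacing is an assumption (reconstructed from a flattened copy).
CONTAINER_LIST = """\
 0    Volume 8.47GB            Open    0:00:0 64.0KB:8.47GB
 /dev/sda             NT

 1    RAID-5 16.9GB       32KB   Open    0:01:0 64.0KB:8.47GB
 /dev/sdb             DATA               0:02:0 64.0KB:8.47GB
                                         ?:??:? - Missing -
"""

# A container header row starts with: number, type, total size (e.g. "16.9GB").
HEADER_ROW = re.compile(r"\s*(\d+)\s+\S+\s+[\d.]+[KMG]B")


def containers_with_missing_members(text):
    """Return container numbers that have at least one '- Missing -' member row."""
    missing = []
    current = None  # number of the container whose rows we are reading
    for line in text.splitlines():
        header = HEADER_ROW.match(line)
        if header:
            current = int(header.group(1))
        elif "Missing" in line and current is not None:
            if current not in missing:
                missing.append(current)
    return missing


if __name__ == "__main__":
    print(containers_with_missing_members(CONTAINER_LIST))
```

On the sample above this reports container 1, the RAID-5 set whose third member shows as `?:??:?`. Real afacli output may differ in spacing between versions, so the header pattern would likely need adjusting.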
Re: aacraid on Dell PowerEdge 1800
On 11 Mar 2005 at 9:52, Ryan wrote:

> Hello,
> I've been through several of the news groups over the past few days and
> haven't found an exact answer to my question.
>
> I'm trying to install Fedora Core 3 on a Dell PowerEdge 1800 server that
> I just purchased, but the version of the aacraid driver for the SATA
> raid controller changed and I can't install.
>
> Just like the newsgroups state, Fedora Core 2 installs just fine. My
> problem is that I can't afford to run software that is outdated.
> There have been several security issues lately and FC3 seems to have
> fixes for them.
>
> Any help is greatly appreciated.
>
> Thank you.
>
> -Ryan

This is somewhat off-topic, so I'll be brief.

If you can install Fedora Core 2 and the only reason you want Fedora Core 3 is newer software, "man yum" after installing the 'yum' package (or choose a different updater to suit your prefs) will be your friend.

In other words, a newer OS isn't the only way to get newer software, especially on an OS that has a decent package management system (RPM in this instance).

FWIW.

Sincerely,
Andrew Kinney
President and Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net
Re: aacraid died on kernel 2.4.27
On 11 Mar 2005 at 20:31, Nic Ferrier wrote:

> The machine I am having trouble with has been running MS Windows for 2
> years.
>
> I just put linux on it (with no other changes) and we get regular
> (twice daily) catastrophic crashes.
>
> Can this be a controller problem? I'm not a hardware expert but it
> doesn't sound like one to me.

It can be a controller problem, but it can also be a drive problem, cable problem, firmware problem, or a backplane problem.

We had a similar instance that was resolved by replacing the drive with a different brand, replacing the backplane, replacing the cabling, replacing the ROMB, and getting the newest firmware. Now, drives fail gracefully instead of taking the whole container offline. Who knows what the actual cause was, but the problem is fixed and that's what I was looking for.

Like Mark S. said, many causes, one symptom.

That Dell trouble ticket is going to be the best way to get it solved. Their Linux guys have seen it all and can escalate it to an engineer if they haven't. They're going to ask you for the diagnostic output of afacli, so you'll want to get that installed if you haven't already. They can also swap in new components for you.

Sincerely,
Andrew Kinney
President and Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net
PERC3/DI aacraid failed disk detection slow
[57]: at 5082220 sec
[58]: ID(0:04:0) Cmd[0x28] Fail: Block Range 15544320 : 1557
[59]: at 5082220 sec
[60]: ID(0:04:0) Cmd[0x28] Fail: Block Range 5166793 : 5166794 at
[61]: 5082220 sec
[62]: RAID5 Container 0 Drive 0:4:0 Failure
[63]: ID(0:04:0): Timeout detected on cmd[0x28]
[64]: SCSI Channel[0]: Timeout Detected On 1 Command(s)
[65]: ID(0:04:0) Timeout detected on cmd[0x28]
[66]: SCSI Channel[0]: Timeout Detected On 1 Command(s)
[67]: ID(0:04:0): Timeout detected on cmd[0x28]
[68]: SCSI Channel[0]: Timeout Detected On 1 Command(s)
[69]: ID(0:04:0): Timeout detected on cmd[0x28]
[70]: SCSI Channel[0]: Timeout Detected On 1 Command(s)
[71]: ID(0:04:0): Timeout detected on cmd[0x28]
[72]: SCSI Channel[0]: Timeout Detected On 1 Command(s)
[73]: ID(0:04:0): Timeout detected on cmd[0x28]
[74]: SCSI Channel[0]: Timeout Detected On 1 Command(s)
[75]: ID(0:04:0): Timeout detected on cmd[0x28]
[76]: SCSI Channel[0]: Timeout Detected On 1 Command(s)
[77]: ID(0:04:0) Timeout detected on cmd[0x28]
[78]: SCSI Channel[0]: Timeout Detected On 1 Command(s)
[79]: ID(0:04:0) Cmd[0x28] Fail: Block Range 0 : 0 at 5082308 sec
[80]: 2 can't read mbr dev_t:4
[81]: <...repeats 1 more times>
[82]: can't read config from slice #[4]
[83]: 2 can't read mbr dev_t:4
[84]: can't read config from slice #[4]
[85]: CT_LogMissingEntry: Log missing entry, container 0, dev 4,
[86]: signature 0x8f950a4d, nvEntry 65
[87]: CtMarkDead: container 0, deadEntry 4, dev 4, signature 0x8f
[88]: 950a4d
[89]: CtMarkDead: container 0, deadEntry 4, dev 4, signature 0x8f
[90]: 950a4d
[91]: CtMarkDead: container 0, deadEntry 4, dev 4, signature 0x8f
[92]: 950a4d
[93]: CtMarkDead: container 0, deadEntry 4, dev 4, signature 0x8f
[94]: 950a4d
[95]: CtMarkDead: container 0, deadEntry 4, dev 4, signature 0x8f
[96]: 950a4d
[97]: RAID5 Failover Container 0 No Failover Assigned
[98]: Drive 0:4:0 returning error
[99]:
[/CODE]

88 seconds to determine the drive failed.
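As a rough cross-check of the 88-second figure, here is a minimal Python sketch that pulls the "at N sec" timestamps out of a few entries copied from the controller log above and subtracts the earliest from the latest. The `[NN]:` prefixes wrap long entries across lines, so the sketch strips them and rejoins the text before searching. The names `LOG`, `stall_seconds`, and `KERNEL_TIMEOUT_SEC` are made up for illustration, and the 60-second value is simply the kernel timeout quoted in this message, not something read from a running kernel.

```python
import re

# A few timestamped entries copied from the controller log above.
# Note that entry [60] wraps onto [61], as in the original output.
LOG = """\
[57]: at 5082220 sec
[60]: ID(0:04:0) Cmd[0x28] Fail: Block Range 5166793 : 5166794 at
[61]: 5082220 sec
[79]: ID(0:04:0) Cmd[0x28] Fail: Block Range 0 : 0 at 5082308 sec
"""

# Assumption: the kernel-side timeout mentioned in this message.
KERNEL_TIMEOUT_SEC = 60


def stall_seconds(log_text):
    """Return the span in seconds between the first and last 'at N sec' stamps."""
    # Drop the "[NN]:" line prefixes and rejoin wrapped entries with spaces.
    joined = " ".join(
        re.sub(r"^\[\d+\]:\s*", "", line) for line in log_text.splitlines()
    )
    stamps = [int(s) for s in re.findall(r"\bat\s+(\d+)\s+sec", joined)]
    if not stamps:
        raise ValueError("no timestamps found in log text")
    return max(stamps) - min(stamps)


if __name__ == "__main__":
    stall = stall_seconds(LOG)
    print(stall, "seconds; exceeds kernel timeout:", stall > KERNEL_TIMEOUT_SEC)
```

With the entries above, the span is 5082308 - 5082220 = 88 seconds, which is longer than the 60-second timeout.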
In other words, it took 88 seconds from the time it stopped processing commands from the OS until it was ready to continue processing commands from the OS. The kernel killed the storage at 60 seconds, thus hosing the OS since that was the only storage device. Though the controller came back, the OS had already given up and couldn't recover.

Am I correct in assessing that the controller's firmware is responsible for this extended delay in detecting the failed disk?

Here's the information on our setup:

PERC3/DI on Dell PowerEdge 2500
5 disk U160 RAID5

AFA0> controller details
Executing: controller details
Controller Information
----------------------
Device Name: AFA0
Controller Type: PERC 3/Di
Access Mode: READ-WRITE
Controller Serial Number: Last Six Digits = 4C20D2
Number of Buses: 2
Devices per Bus: 15
Controller CPU: i960 R series
Controller CPU Speed: 100 Mhz
Controller Memory: 128 Mbytes
Battery State: Ok

Component Revisions
-------------------
CLI: 2.8-0 (Build #6076)
API: 2.8-0 (Build #6076)
Miniport Driver: 1.1-4 (Build #)
Controller Software: 2.8-0 (Build #6092)
Controller BIOS: 2.8-0 (Build #6092)
Controller Firmware: (Build #6092)

Sincerely,
Andrew Kinney
President and Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net