Re: Spare disk could not sleep / standby
On 08.03.2005 14:13, Gordon Henderson wrote:
> On Tue, 8 Mar 2005, Tobias Hofmann wrote:
>> [...] I had found postings on the net claiming that doing so without unmounting the fs on the raid would lead to bad things happening - but your report seems to prove them wrong...
>
> I've been using something called noflushd on a couple of small "home servers" for a couple of years now to spin the disks down. I made a posting about it here some time back and the consensus seemed to be (at the time) that it should all "just work"... And indeed it has been just working.

Thanks for mentioning this...

> They are only running RAID-1 though, 2.4 and ext2. I understand that ext3 would force a spin-up every 5 seconds, which would sort of defeat it. There are other things to be aware of too (things that will defeat using hdparm) - making sure every entry in syslog.conf is -/var/log/whatever (ie. with the hyphen prepended) to stop it doing an fsync on every write, which would spin up the disks. They are on UPSs, but those have been known to run out in the past )-: so a long fsck and some data loss is to be expected. Essentially noflushd blocks the kernel from writing to disk until memory fills up... So most of the time the box sits with the disks spun down, and only spins up when we do some file reading/saving to them.

...and this is no problem for me, as my idea is to only spin down a raid used for data, not the OS...

> Noflushd is at http://noflushd.sourceforge.net/ and claims to work with 2.6, but says it will never work with journaling FSs like ext3 and XFS. (which is understandable)

...true, but that bites me. I'll still look into it, once I am free to fool around with the raid (which currently is a backup, so I'd hesitate to "kill" it... :)

greets, tobi... :)
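For anyone who wants to try the plain-hdparm route Gordon alludes to, it is just a per-drive standby timeout; a minimal sketch, with the device name and timeout illustrative only:

# hdparm -S 120 /dev/hdc    (spin down after 120 * 5 = 600 seconds of idle)
# hdparm -y /dev/hdc        (force an immediate spin-down, to check it stays down)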
now on to tuning....
Greetings All,

I have been lurking for a while. I recently put together a raid 5 system (Asus K8NE SIL 3114/2.6.8.1 kernel) with 4 300GB SATA Seagate drives (a lot smaller than the bulk of what seems to be on this list!). Currently this is used for video and mp3 storage, being Reiser on LVM2.

So a couple of questions: Bonnie++ to test, but with which parameters? Also, I have seen the mount option "nolargeio=1" for reiser, but not a lot of information on its impact in raid systems.

Any thoughts?

regards,
-Peter
Re: md Grow for Raid 5
Mike Hardy wrote:
> You're very correct about needing to grow the FS after growing the device though. Most FS's have tools for that, or there's LVM...
> -Mike

I've tested with LVM2 and found that resizing is not supported at the moment (newest kernel with FC3). LVM can only grow by adding PVs to the VG. Growing PVs is not supported. The only way is to export and reimport the VG configuration with vgcfgbackup and vgcfgrestore and to alter the number of physical extents of the grown PV manually.

Greetings,
Frank
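The manual workaround Frank describes would look roughly like this, assuming a volume group named vg0 (the VG name, file path, and the hand-edited extent count are placeholders):

# vgcfgbackup -f /tmp/vg0.conf vg0
(edit /tmp/vg0.conf by hand: raise pe_count on the grown PV to match its new size)
# vgcfgrestore -f /tmp/vg0.conf vg0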
Re: now on to tuning....
On Wed, 9 Mar 2005 [EMAIL PROTECTED] wrote:
> Greetings All,
>
> I have been lurking for a while. I recently put together a raid 5
> system (Asus K8NE SIL 3114/2.6.8.1 kernel) with 4 300GB SATA Seagate
> drives (a lot smaller than the bulk of what seems to be on this list!).

Size isn't important :)

> Currently this is used for video and mp3 storage, being Reiser on LVM2.
>
> So a couple of questions:
>
> Bonnie++ to test, but with which parameters ? Also, I have seen the
> mount option "nolargeio=1" for reiser, but not a lot of information on
> the impact in raid systems.
>
> Any thoughts ?

Why LVM2? Are you taking snapshots, or expecting to re-size the array? I just could never work out why I needed yet another layer of software between the application and the disk platter. There may well be a good reason for it - I did look into LVM some 18 months ago when I was looking at snapshot solutions, but it was hideously slow (or appeared to be) after I'd taken a snapshot. (My intention was to dump to tape from the snapshots, and provide 'yesterday' & 'the day before' type things. I now use rsync for that - it takes time to make the snapshot, but accesses to it are no slower than accessing the live array.)

I know nothing about reiser either, so can't help there, I'm afraid; however, XFS has some real-time facilities which might be useful for streaming data, but again, it's not something I've looked into.

Anyway - if you are streaming big files, you want nothing more than timing dd to test the speed of the thing. Bonnie++ will limit the size of a single file to 2GB (and write multiple 2GB files if you specify a bigger size) and you'll get an indication of how busy the CPU is when it's writing & reading. Your drives will have between 55 and 65MB/sec of head bandwidth, so anything better than that is a bonus, although the 3114 will have all 4 ports on the same PCI bus, so that's going to be your bottleneck. If you want more speed, you might want to try to reconfigure it in a RAID-10 way, but then you'll only get 2 disks' worth of storage rather than the 3 in a RAID-5 setup.

Also test streaming over the LAN if that's what you are doing - if it's just a 100Mb LAN, then all your disk system needs to be able to do is 12MB/sec and anything more is a bonus.

And do check your disks regularly, although I don't think the current version of smartmontools fully supports sata under the scsi subsystem yet...

Good luck,

Gordon
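For the dd test Gordon mentions, something along these lines is usually enough; the mount point and sizes are placeholders, and the test file should be a couple of times larger than RAM so the page cache doesn't flatter the numbers:

# time dd if=/dev/zero of=/mnt/raid/ddtest bs=1M count=2048
# time dd if=/mnt/raid/ddtest of=/dev/null bs=1M
# rm /mnt/raid/ddtest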
Re: now on to tuning....
hi peter

[EMAIL PROTECTED] wrote:
> I have been lurking for a while. I recently put together a raid 5
> system (Asus K8NE SIL 3114/2.6.8.1 kernel) with 4 300GB SATA Seagate
> drives (a lot smaller than the bulk of what seems to be on this
> list!). Currently this is used for video and mp3 storage, being
> Reiser on LVM2.

beware that LVM2 _can_ affect your performance. I too believed that the concept of dynamic drives is good, but I experienced a performance hit of about 50% (especially in sequential reads). see my blog entry describing how I built my 2TB file-server at http://variant.ch/phpwiki/WikiBlog/2005-02-27 for some numbers and more explanation.

the K8NE has the same SiI 3114 controller as the board I used; it is connected by a 33MHz 32bit PCI bus and maxes out at 133MiB/s, so for maximum performance you might want to connect only two drives to this controller, and the other two to the nforce3 chipset SATA ports.

> Bonnie++ to test, but with which parameters ?

normally it's enough to specify a test-file larger than (e.g. twice) the memory capacity of the machine you are testing. for a machine with 1GiB RAM:

# bonnie++ -s 2gb {other options}

you might also want to specify the "fast" option which skips the per-char operations (which are quite useless to test IMHO):

# bonnie++ -f {other options}

HTH
nicola
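Putting nicola's suggestions together, a typical single invocation on a 1GiB machine might look like this; the target directory and user are placeholders (bonnie++ insists on -u when run as root):

# bonnie++ -d /mnt/raid -s 2g -f -u nobody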
Size...
Interestingly enough, having just typed up that last post, it's gotten me thinking... I've just taken delivery of a lot of old PC bits & disks. Mostly 18Gb SCSI drives. So I've built up 2 boxes with 8 disks in each. Only old Xeon 500 processors, but all good stuff in its day. Now I'm thinking that what I have is effectively 2 x 100GB disk arrays... Consuming (let's say) 8W per disk, that's nearly 130W worth of energy, plus the processors, etc. I think that in the long run, I'd be better off just spending 100 quid on a couple of 120GB drives and be done with it... Not as much fun though!

Oh how technology moves on!

Gordon
Re: now on to tuning....
Gordon Henderson wrote:
> And do check your disks regularly, although I don't think the current version of smartmontools fully supports sata under the scsi subsystem yet...

Actually, if you are using a UP machine, the libata-dev tree has patches that make this work. I believe there may be races on SMP machines however.

All 29 drives get a short test every morning and a long test every Sunday morning. Odd results are immediately E-mailed to me by smartd.

storage1:/home/brad# smartctl -A -d ata /dev/sda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027 252   252   063    Pre-fail Always  -           5622
  4 Start_Stop_Count        0x0032 253   253   000    Old_age  Always  -           20
  5 Reallocated_Sector_Ct   0x0033 253   253   063    Pre-fail Always  -           0
  6 Read_Channel_Margin     0x0001 253   253   100    Pre-fail Offline -           0
  7 Seek_Error_Rate         0x000a 253   252   000    Old_age  Always  -           0
  8 Seek_Time_Performance   0x0027 250   248   187    Pre-fail Always  -           35232
  9 Power_On_Minutes        0x0032 252   252   000    Old_age  Always  -           457h+24m
 10 Spin_Retry_Count        0x002b 252   252   157    Pre-fail Always  -           0
 11 Calibration_Retry_Count 0x002b 253   252   223    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 253   253   000    Old_age  Always  -           34
192 Power-Off_Retract_Count 0x0032 253   253   000    Old_age  Always  -           0
193 Load_Cycle_Count        0x0032 253   253   000    Old_age  Always  -           0
194 Temperature_Celsius     0x0032 253   253   000    Old_age  Always  -           35
195 Hardware_ECC_Recovered  0x000a 253   252   000    Old_age  Always  -           1411
196 Reallocated_Event_Count 0x0008 253   253   000    Old_age  Offline -           0
197 Current_Pending_Sector  0x0008 253   253   000    Old_age  Offline -           0
198 Offline_Uncorrectable   0x0008 253   253   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0008 199   199   000    Old_age  Offline -           0
200 Multi_Zone_Error_Rate   0x000a 253   252   000    Old_age  Always  -           0
201 Soft_Read_Error_Rate    0x000a 253   252   000    Old_age  Always  -           3
202 TA_Increase_Count       0x000a 253   252   000    Old_age  Always  -           0
203 Run_Out_Cancel          0x000b 253   252   180    Pre-fail Always  -           0
204 Shock_Count_Write_Opern 0x000a 253   252   000    Old_age  Always  -           0
205 Shock_Rate_Write_Opern  0x000a 253   252   000    Old_age  Always  -           0
207 Spin_High_Current       0x002a 252   252   000    Old_age  Always  -           0
208 Spin_Buzz               0x002a 252   252   000    Old_age  Always  -           0
209 Offline_Seek_Performnce 0x0024 194   194   000    Old_age  Offline -           0
 99 Unknown_Attribute       0x0004 253   253   000    Old_age  Offline -           0
100 Unknown_Attribute       0x0004 253   253   000    Old_age  Offline -           0
101 Unknown_Attribute       0x0004 253   253   000    Old_age  Offline -           0

Regards,
Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
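For reference, the sort of smartd.conf entry that yields a schedule like Brad describes (short self-test daily, long self-test on Sundays) would be something like the line below; the device, test times, and mail address are illustrative, not taken from his setup:

/dev/sda -d ata -a -s (S/../.././02|L/../../7/03) -m root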
Re: now on to tuning....
On Wed, 9 Mar 2005, Brad Campbell wrote:
> Gordon Henderson wrote:
>
>> And do check your disks regularly, although I don't think the current version
>> of smartmontools fully supports sata under the scsi subsystem yet...
>
> Actually, if you are using a UP machine, the libata-dev tree has patches
> that make this work. I believe there may be races on SMP machines
> however.
>
> All 29 drives get a short test every morning and a long test every
> Sunday morning. Odd results are immediately E-mailed to me by smartd.
>
> storage1:/home/brad# smartctl -A -d ata /dev/sda

Ahhh... I've been waiting for the magical "-d libata" to make it work... I never thought to just try the -d ata! And hurrah:

Old:

skylla:/home/gordonh# smartctl -a /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA Maxtor 6Y080M0 Version: YAR5
SATA disks accessed via libata are not currently supported by
smartmontools. When libata is given an ATA pass-thru ioctl() then an
additional '-d libata' device type will be added to smartmontools.

New:

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 6Y080M0
Serial Number:    Y2ASRT5E
Firmware Version: YAR51HW0
User Capacity:    80,000,000,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Wed Mar  9 11:09:12 2005 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Cheers,

Gordon

Ps. This is a Dell 1U poweredge FWIW - Maxtor disks )-:
Problem with auto-assembly on Itanium
Hi,

I am trying to create a Raid 1 device from two partitions on an Itanium, but I can't get the raid to auto-assemble when rebooting. Since it uses the GPT partition scheme I have to use parted. I set the raid flag on the partitions with "set 1 raid on", with no luck. I've also tried the "md=0,/dev/sdb2,/dev/sdc2" kernel option and still no automatic assembly of the raid device. I tried a similar setup on an i386 box (but not using parted) and it worked fine. My suspicion is that either parted doesn't set the partition type to 0xFD, or the kernel code for auto-assembling raids doesn't look into GPT partitions.

Is there any way I can make this work? Could it be doable with mdadm in an initrd?

Many thanks in advance,
Jimmy Hedman
RE: now on to tuning....
Good point about maxing out the pci bus... - I already use the nForce ports for mirrored boot drives, so that's not an option. The IDE controllers are empty at the moment (save for a DVD drive); I will give this a thought.

Thanks for the feedback,
-P

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Nicola Fankhauser
Sent: Wednesday, March 09, 2005 10:48 AM
To: linux-raid@vger.kernel.org
Subject: Re: now on to tuning

hi peter

[EMAIL PROTECTED] wrote:
> I have been lurking for a while. I recently put together a raid 5
> system (Asus K8NE SIL 3114/2.6.8.1 kernel) with 4 300GB SATA Seagate
> drives (a lot smaller than the bulk of what seems to be on this
> list!). Currently this is used for video and mp3 storage, being
> Reiser on LVM2.

beware that LVM2 _can_ affect your performance. I too believed that the concept of dynamic drives is good, but I experienced a performance hit of about 50% (especially in sequential reads). see my blog entry describing how I built my 2TB file-server at http://variant.ch/phpwiki/WikiBlog/2005-02-27 for some numbers and more explanation.

the K8NE has the same SiI 3114 controller as the board I used; it is connected by a 33MHz 32bit PCI bus and maxes out at 133MiB/s, so for maximum performance you might want to connect only two drives to this controller, and the other two to the nforce3 chipset SATA ports.

[...]

HTH
nicola
Recovery after Partial Writes
Hi,

I have another question regarding how softraid handles simple system crashes (no disk crash) for Raid Levels 4-6. Let's say that in a system with 5 disks we write one block, and the writes for disks 1-3 go through but those for disks 4-5 do not. When the computer restarts and the softraid device is remounted, is there some kind of recovery done at the raid level, because the parity for that block will now be off? Or is this simply left to the FS to deal with?

Thank you very much for your help,
Can Sar
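As a toy illustration of why the on-disk parity goes stale in that scenario (single bytes standing in for whole blocks; all values made up):

# parity a completed full-stripe write would store: P = D1 ^ D2 ^ D3 ^ D4
$ echo $(( 0x11 ^ 0x22 ^ 0x33 ^ 0x44 ))
# after the partial write, D1-D3 are new on disk but D4 (and P) are old,
# so XOR of the blocks actually on the platters no longer matches P:
$ echo $(( 0x11 ^ 0x22 ^ 0x33 ^ 0xAA ))    (0xAA = the stale D4)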
Re: [PATCH md 0 of 4] Introduction
Hi Neil,

On Tue, 2005-03-08 at 21:17, Neil Brown wrote:
> On Monday March 7, [EMAIL PROTECTED] wrote:
>> NeilBrown <[EMAIL PROTECTED]> wrote:
>>>
>>> The first two are trivial and should apply equally to 2.6.11
>>>
>>> The second two fix bugs that were introduced by the recent
>>> bitmap-based-intent-logging patches and so are not relevant
>>> to 2.6.11 yet.
>>
>> The changelog for the "Fix typo in super_1_sync" patch doesn't actually say
>> what the patch does. What are the user-visible consequences of not fixing
>> this?
> ---
> This fixes possible inconsistencies that might arise in a version-1
> superblock when devices fail and are removed.
>
> Usage of version-1 superblocks is not yet widespread and no actual
> problems have been reported.

EVMS 2.5.1 (http://evms.sf.net) has provided support for creating MD arrays using the version-1 superblock, and some EVMS users have actually tried this new functionality. You probably remember I posted a problem and a patch to fix the version-1 superblock update code. We will continue to test and will report any problems.

--
Regards,
Mike T.
Re: [PATCH md 0 of 4] Introduction
Neil Brown <[EMAIL PROTECTED]> wrote:
> On Tuesday March 8, [EMAIL PROTECTED] wrote:
>> Have you remodelled the md/raid1 make_request() fn?
>
> Somewhat. Write requests are queued, and raid1d submits them when
> it is happy that all bitmap updates have been done.

OK - so a slight modification of the kernel generic_make_request (I haven't looked). Mind you, I think that Paul said that just before clearing bitmap entries, incoming requests were checked to see if a bitmap entry should be marked again... Perhaps both things happen: bitmap pages in memory are updated as clean after pending writes have finished and then marked as dirty as necessary, then flushed, and when the flush finishes the new accumulated requests are started.

> There is no '1/100th' second or anything like that.

I was trying in a way to give a definite image to what happens, rather than speak abstractly. I'm sure that the ordinary kernel mechanism for plugging and unplugging is used, as much as is possible. If you unplug when the request struct reservoir is exhausted, then it will be at 1K requests. If they are each 4KB, that will be every 4MB. At say 64MB/s, that will be every 1/16 s. And unplugging may happen more frequently because of other kernel magic mumble mumble...

> When a write request arrives, the queue is 'plugged', requests are
> queued, and bits in the in-memory bitmap are set.

OK.

> When the queue is unplugged (by the filesystem or timeout) the bitmap
> changes (if any) are flushed to disk, then the queued requests are
> submitted.

That accumulates bitmap markings into the minimum number of extra transactions. It does impose extra latency, however. I'm intrigued by exactly how you exert the memory pressure required to force just the dirty bitmap pages out. I'll have to look it up.

> Bits on disk are cleaned lazily.

OK - so the disk bitmap state is always pessimistic. That's fine. Very good.

> Note that for many applications, the bitmap does not need to be huge.
> 4K is enough for 1 bit per 2-3 megabytes on many large drives.
> Having to sync 3 meg when just one block might be out-of-sync may seem
> like a waste, but it is heaps better than syncing 100Gig!!

Yes - I used 1 bit per 1K, falling back to 1 bit per 2MB under memory pressure.

>> And if so, do you also aggregate them? And what steps are taken to
>> preserve write ordering constraints (do some overlying file systems
>> still require these)?
>
> filesystems have never had any write ordering constraints, except that
> IO must not be processed before it is requested, nor after it has been
> acknowledged. md continues to obey these restraints.

Out of curiosity, is aggregation done on the queued requests? Or are they all kept at 4KB? (or whatever - 1KB).

Thanks!

Peter
Re: Spare disk could not sleep / standby [probably dangerous PATCH]
This patch removes my problem. I hope it doesn't have any influence on the stability of the system.

It is simple: the update routine normally skips only "faulty" disks. Now it skips all disks that are not part of the working array (raid_disk == -1). I made some testing, but surely not all, so:

DON'T APPLY TO YOUR SYSTEM WITH IMPORTANT DATA!

Regards
Peter

--- md.c.orig	2005-01-14 16:33:49.0 +0100
+++ md.c	2005-03-09 15:27:23.0 +0100
@@ -1340,14 +1340,14 @@
 	ITERATE_RDEV(mddev,rdev,tmp) {
 		char b[BDEVNAME_SIZE];
 		dprintk(KERN_INFO "md: ");
-		if (rdev->faulty)
-			dprintk("(skipping faulty ");
+		if (rdev->faulty || rdev->raid_disk < 0)
+			dprintk("(skipping faulty/spare ");

 		dprintk("%s ", bdevname(rdev->bdev,b));
-		if (!rdev->faulty) {
+		if (!rdev->faulty && !rdev->raid_disk <0 ) {
 			err += write_disk_sb(rdev);
 		} else
-			dprintk(")\n");
+			dprintk("<%d>)\n",rdev->raid_disk);
 		if (!err && mddev->level == LEVEL_MULTIPATH)
 			/* only need to write one superblock... */
 			break;
Re: Spare disk could not sleep / standby [probably dangerous PATCH]
Hi Peter,

After applying this patch, have you tried to stop and restart the MD array? I believe the spares will be kicked out in the analyze_sbs() function (see the second ITERATE_RDEV).

--
Regards,
Mike T.

On Wed, 2005-03-09 at 09:53, Peter Evertz wrote:
> This patch removes my problem. I hope it doesn't have any influence on
> the stability of the system.
> It is simple: the update routine normally skips only "faulty" disks.
> Now it skips all disks that are not part of the working array
> (raid_disk == -1). I made some testing, but surely not all, so:
>
> DON'T APPLY TO YOUR SYSTEM WITH IMPORTANT DATA!
>
> Regards
> Peter
>
> [...]
Re: Problem with auto-assembly on Itanium
On Wed, Mar 09, 2005 at 11:28:48AM +0100, Jimmy Hedman wrote:
> Is there any way I can make this work? Could it be doable with mdadm in
> an initrd?

mdassemble was devised for this purpose. Create an /etc/mdadm.conf with:

echo "DEVICE partitions" >> /etc/mdadm.conf
/sbin/mdadm -D -b /dev/md0 | grep '^ARRAY' >> /etc/mdadm.conf

Copy the mdadm.conf and mdassemble to the initrd and make linuxrc run mdassemble.

L.
--
Luca Berra -- [EMAIL PROTECTED]
Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
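A linuxrc along those lines can stay very small; a sketch under the usual initrd assumptions (a busybox-style /bin/sh in the initrd, root=/dev/md0 passed on the kernel command line):

#!/bin/sh
# mount /proc so mdassemble can scan the component devices
mount -t proc proc /proc
# assemble every array listed in /etc/mdadm.conf
/sbin/mdassemble
umount /proc
# when linuxrc exits, the kernel mounts the real root (root=/dev/md0)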
Re: Spare disk could not sleep / standby [probably dangerous PATCH]
Mike Tran writes:
> Hi Peter,
>
> After applying this patch, have you tried to stop and restart the MD
> array? I believe the spares will be kicked out in the analyze_sbs()
> function (see the second ITERATE_RDEV)

mdadm (v1.6.0 - 4 June 2004) shows the arrays complete, including the spare. /proc/mdstat is ok.

I booted with my patched raid modules, so analyze_sbs() should have run. Maybe it works only for 0.90 superblocks; I haven't tried 1.00.

No problems yet. If it really fails the hard way, I will go to the next Internetcafe and tell you about it :)

Peter

> [...]
Re: Spare disk could not sleep / standby [probably dangerous PATCH]
I tried the patch and immediately found problems. On creation of a raid1 array, only the spare has an md superblock; the raid disks have no superblock. For instance:

mdadm -C /dev/md0 -l 1 -n 2 /dev/hdd1 /dev/hdd2 -x 1 /dev/hdd3
[wait for resync to finish if you want to...]
mdadm --stop /dev/md0
mdadm --examine /dev/hdd1 (no super block found)
mdadm --examine /dev/hdd2 (no super block found)
mdadm --examine /dev/hdd3 (nice output)

If you want to skip spares, you will need to alter the patch (see below)

On Wed, 2005-03-09 at 14:05, Peter Evertz wrote:
> Mike Tran writes:
> [...]
>> 	dprintk("%s ", bdevname(rdev->bdev,b));
>> -	if (!rdev->faulty) {
>> +	if (!rdev->faulty && !rdev->raid_disk <0 ) {

	if (!rdev->faulty && rdev->in_sync)
		err += write_disk_sb(rdev);
	else {
		if (rdev->faulty)
			dprintk(" faulty.\n");
		else
			dprintk(" spare.\n");
	}
	/*
	 * Don't try this :(
	 * because this still breaks creation of a new md array and..
	 * for existing arrays with spares, the spares will be kicked out when
	 * the arrays are re-assembled.
	 */

--
Regards,
Mike T.
Re: Spare disk could not sleep / standby [probably dangerous PATCH]
Mike Tran writes:
> I tried the patch and immediately found problems. On creation of a raid1
> array, only the spare has an md superblock; the raid disks have no
> superblock. For instance:
>
> mdadm -C /dev/md0 -l 1 -n 2 /dev/hdd1 /dev/hdd2 -x 1 /dev/hdd3
> [wait for resync to finish if you want to...]
> mdadm --stop /dev/md0
> mdadm --examine /dev/hdd1 (no super block found)
> mdadm --examine /dev/hdd2 (no super block found)
> mdadm --examine /dev/hdd3 (nice output)
>
> If you want to skip spares, you will need to alter the patch (see below)

Ooops! I shouldn't post patches for everyone. Nevertheless it works for me, but I see that your version is much better. Testing for raid_disk < 0 is not a good idea when you (re)create an array. The test for "in_sync" is better, but I still don't know if it works under all circumstances.

Thanks
Peter

> [...]
mdadm --dangerous-no-resync equivalent
Hi,

I have an installer (http://sourceforge.net/projects/terraformix/) that creates Raid 1 arrays; previously the arrays were created with mkraid using the --dangerous-no-resync option. I am now required to build the arrays with mdadm and have the following questions:

1) Is there an equivalent of --dangerous-no-resync in mdadm?

2) Can I just go ahead and install onto a newly created RAID 1 array without waiting for it to resync?

3) Can I just go ahead and install onto a newly created RAID 5 array without waiting for it to resync?

Thanks in advance.

--
flame <[EMAIL PROTECTED]>
Add a spare to raid5 array?
Can a spare be added to an existing raid 5 array? I do not see any way to do it.

John
Re: mdadm --dangerous-no-resync equivalent
On Thursday March 10, [EMAIL PROTECTED] wrote:
> Hi,
>
> I have an installer (http://sourceforge.net/projects/terraformix/) that
> creates Raid 1 arrays, previously the arrays were created with mkraid
> using the --dangerous-no-resync option. I am now required to build the
> arrays with mdadm and have the following questions ;
>
> 1) Is there an equivalent of --dangerous-no-resync in mdadm ?

No, though there might be one day, in which case it would be --assume-clean (which works with --build, but is currently ignored for --create).

> 2) Can I just go ahead and install onto a newly created RAID 1 array
> without waiting for it to resync ?

Yes.

> 3) Can I just go ahead and install onto a newly created RAID 5 array
> without waiting for it to resync ?

Yes.

> Thanks in advance.

You're welcome.

NeilBrown
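For completeness, the one place --assume-clean does take effect today is with --build, i.e. superblock-less arrays; a hypothetical invocation (device names are placeholders):

mdadm --build /dev/md0 --level=1 --raid-devices=2 --assume-clean /dev/sda1 /dev/sdb1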
Re: mdadm --dangerous-no-resync equivalent
On 10 Mar 2005, [EMAIL PROTECTED] wrote:
> Hi,
>
> I have an installer (http://sourceforge.net/projects/terraformix/) that
> creates Raid 1 arrays, previously the arrays were created with mkraid
> using the --dangerous-no-resync option. I am now required to build the
> arrays with mdadm and have the following questions ;
>
> 1) Is there an equivalent of --dangerous-no-resync in mdadm ?
>
> 2) Can I just go ahead and install onto a newly created RAID 1 array
> without waiting for it to resync ?

Yes.

> 3) Can I just go ahead and install onto a newly created RAID 5 array
> without waiting for it to resync ?

Yes.

Also, you can happily reboot while the resync is in progress with the full expectation that it will work correctly -- assemble, then take up the resync process again -- when your machine restarts. I have had no problems with this at all, despite doing a large number of installs onto both of those RAID levels.

Also, the Debian installer for testing/unstable makes the assumption that this will "just work" in the RAID component -- at least, in so far as not taking any special care to avoid rebooting during the resync process.

Regards,
Daniel
--
A large number of installed systems work by fiat. That is, they work by being declared to work. -- Anatol Holt
[PATCH 1/2] md bitmap bug fixes
Neil, here are a couple of patches -- this one for the kernel, the next for mdadm. They fix a few issues that I found while testing the new bitmap intent logging code. Briefly, the issues were:

kernel:
- added call to bitmap_daemon_work() from raid1d so that the bitmap would actually get cleared
- fixed the marking of pages with BITMAP_CLEAN so that the bitmap would get cleared correctly after resync and normal write I/O
- pass back errors from write_page() since it now does actual writes itself
- sync_size changed to sectors (was array_size, which was KB) -- some divisions by 2 were needed

mdadm:
- avoid setting sb->events_lo = 1 when creating a 0.90 superblock -- it doesn't seem to be necessary and it was causing the event counters to start at 4 billion+ (events_lo is actually the high part of the events counter, on little endian machines anyway)
- some sync_size changes, as in the kernel
- if'ed out the super1 definition which is now in the kernel headers
- included sys/time.h to avoid a compile error

Thanks,
Paul

Signed-Off-By: Paul Clements <[EMAIL PROTECTED]>

 bitmap.c |   58 +-
 raid1.c  |    1 +
 2 files changed, 34 insertions(+), 25 deletions(-)

diff -purN --exclude-from /export/public/clemep/tmp/dontdiff linux-2.6.11-rc3-mm2-patch-all/drivers/md/bitmap.c linux-2.6.11-rc3-mm2-patch-all-bitmap-bug-fix/drivers/md/bitmap.c
--- linux-2.6.11-rc3-mm2-patch-all/drivers/md/bitmap.c	Fri Feb 18 15:44:03 2005
+++ linux-2.6.11-rc3-mm2-patch-all-bitmap-bug-fix/drivers/md/bitmap.c	Wed Mar  9 14:55:03 2005
@@ -265,6 +265,7 @@ static int write_page(struct page *page,
 {
 	int ret = -ENOMEM;

+	PRINTK("bitmap write page %lu\n", page->index);
 	lock_page(page);

 	if (page->mapping == NULL)
@@ -350,8 +351,7 @@ int bitmap_update_sb(struct bitmap *bitm
 	if (!bitmap->mddev->degraded)
 		sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
 	kunmap(bitmap->sb_page);
-	write_page(bitmap->sb_page, 0);
-	return 0;
+	return write_page(bitmap->sb_page, 0);
 }

 /* print out the bitmap file superblock */
@@ -363,21 +363,22 @@ void bitmap_print_sb(struct bitmap *bitm
 		return;
 	sb = (bitmap_super_t *)kmap(bitmap->sb_page);
 	printk(KERN_DEBUG "%s: bitmap file superblock:\n", bmname(bitmap));
-	printk(KERN_DEBUG "         magic: %08x\n", le32_to_cpu(sb->magic));
-	printk(KERN_DEBUG "       version: %d\n", le32_to_cpu(sb->version));
-	printk(KERN_DEBUG "          uuid: %08x.%08x.%08x.%08x\n",
+	printk(KERN_DEBUG "          magic: %08x\n", le32_to_cpu(sb->magic));
+	printk(KERN_DEBUG "        version: %d\n", le32_to_cpu(sb->version));
+	printk(KERN_DEBUG "           uuid: %08x.%08x.%08x.%08x\n",
 		*(__u32 *)(sb->uuid+0),
 		*(__u32 *)(sb->uuid+4),
 		*(__u32 *)(sb->uuid+8),
 		*(__u32 *)(sb->uuid+12));
-	printk(KERN_DEBUG "        events: %llu\n",
+	printk(KERN_DEBUG "         events: %llu\n",
 		(unsigned long long) le64_to_cpu(sb->events));
-	printk(KERN_DEBUG "  events_clred: %llu\n",
+	printk(KERN_DEBUG " events cleared: %llu\n",
 		(unsigned long long) le64_to_cpu(sb->events_cleared));
-	printk(KERN_DEBUG "         state: %08x\n", le32_to_cpu(sb->state));
-	printk(KERN_DEBUG "     chunksize: %d B\n", le32_to_cpu(sb->chunksize));
-	printk(KERN_DEBUG "  daemon sleep: %ds\n", le32_to_cpu(sb->daemon_sleep));
-	printk(KERN_DEBUG "     sync size: %llu KB\n", le64_to_cpu(sb->sync_size));
+	printk(KERN_DEBUG "          state: %08x\n", le32_to_cpu(sb->state));
+	printk(KERN_DEBUG "      chunksize: %d B\n", le32_to_cpu(sb->chunksize));
+	printk(KERN_DEBUG "   daemon sleep: %ds\n", le32_to_cpu(sb->daemon_sleep));
+	printk(KERN_DEBUG "      sync size: %llu KB\n", (unsigned long long)
+		le64_to_cpu(sb->sync_size) / 2);
 	kunmap(bitmap->sb_page);
 }

@@ -734,7 +735,8 @@ int bitmap_unplug(struct bitmap *bitmap)
 		spin_unlock_irqrestore(&bitmap->lock, flags);
 		if (attr & (BITMAP_PAGE_DIRTY | BITMAP_PAGE_NEEDWRITE))
-			write_page(page, 0);
+			if (write_page(page, 0) != 0)
+				return 1;
 	}
 	if (wait) { /* if any writes were performed, we need to wait on them */
 		spin_lock_irq(&bitmap->write_lock);
@@ -795,7 +797,7 @@ static int bitmap_init_from_disk(struct
 			bytes + sizeof(bitmap_super_t));
 		goto out;
 	}
-	num_pages++;
+	// PRC: ???: num_pages++;
 	bitmap->filemap = kmalloc(sizeof(struct page *) * num_pages, GFP_KERNEL);
 	if (!bitmap->filemap) {
 		ret =
[PATCH 2/2] md bitmap bug fixes
Here's the mdadm patch...

Paul Clements wrote:
> Neil, here are a couple of patches -- this one for the kernel, the next
> for mdadm. They fix a few issues that I found while testing the new
> bitmap intent logging code.
> [...]

diff -purN --exclude makepkg --exclude rpm --exclude *.DIST --exclude md_u.h --exclude md_p.h --exclude bitmap.h --exclude mdadm.steeleye.spec --exclude-from /export/public/clemep/tmp/dontdiff mdadm-2.0-devel-1-PRISTINE/bitmap.c mdadm-2.0-devel-1-bitmap-bug-fix/bitmap.c
--- mdadm-2.0-devel-1-PRISTINE/bitmap.c	Sun Feb 13 22:00:00 2005
+++ mdadm-2.0-devel-1-bitmap-bug-fix/bitmap.c	Mon Mar  7 12:15:38 2005
@@ -168,7 +168,8 @@ bitmap_info_t *bitmap_fd_read(int fd, in
 	if (read_bits < total_bits) { /* file truncated... */
 		fprintf(stderr, Name ": WARNING: bitmap file is not large "
-			"enough for array size %llu!\n\n", info->sb.sync_size);
+			"enough for array size %lluKB (%llu/%llu)!\n\n",
+			info->sb.sync_size / 2, read_bits, total_bits);
 		total_bits = read_bits;
 	}
 out:
@@ -226,13 +227,16 @@ int ExamineBitmap(char *filename, int br
 		*(__u32 *)(sb->uuid+4),
 		*(__u32 *)(sb->uuid+8),
 		*(__u32 *)(sb->uuid+12));
-	printf("          Events : %llu\n", sb->events);
-	printf("  Events Cleared : %llu\n", sb->events_cleared);
+	printf("          Events : %llu (%d.%llu)\n", sb->events,
+		(__u32)sb->events, sb->events >> 32);
+	printf("  Events Cleared : %llu (%d.%llu)\n", sb->events_cleared,
+		(__u32)sb->events_cleared,
+		sb->events_cleared >> 32);
 	printf("           State : %s\n", bitmap_state(sb->state));
 	printf("       Chunksize : %s\n", human_chunksize(sb->chunksize));
 	printf("          Daemon : %ds flush period\n", sb->daemon_sleep);
-	printf("       Sync Size : %llu%s\n", sb->sync_size,
-		human_size(sb->sync_size * 1024));
+	printf("       Sync Size : %lluKB%s\n", sb->sync_size / 2,
+		human_size(sb->sync_size * 512));
 	if (brief)
 		goto free_info;
 	printf("          Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",

diff -purN --exclude makepkg --exclude rpm --exclude *.DIST --exclude md_u.h --exclude md_p.h --exclude bitmap.h --exclude mdadm.steeleye.spec --exclude-from /export/public/clemep/tmp/dontdiff mdadm-2.0-devel-1-PRISTINE/mdstat.c mdadm-2.0-devel-1-bitmap-bug-fix/mdstat.c
--- mdadm-2.0-devel-1-PRISTINE/mdstat.c	Tue Aug 10 21:28:50 2004
+++ mdadm-2.0-devel-1-bitmap-bug-fix/mdstat.c	Mon Mar  7 11:09:29 2005
@@ -86,6 +86,7 @@
 #include "mdadm.h"
 #include "dlink.h"
 #include
+#include <sys/time.h>

 void free_mdstat(struct mdstat_ent *ms)
 {

diff -purN --exclude makepkg --exclude rpm --exclude *.DIST --exclude md_u.h --exclude md_p.h --exclude bitmap.h --exclude mdadm.steeleye.spec --exclude-from /export/public/clemep/tmp/dontdiff mdadm-2.0-devel-1-PRISTINE/super0.c mdadm-2.0-devel-1-bitmap-bug-fix/super0.c
--- mdadm-2.0-devel-1-PRISTINE/super0.c	Sun Feb 13 21:59:45 2005
+++ mdadm-2.0-devel-1-bitmap-bug-fix/super0.c	Mon Mar  7 13:27:38 2005
@@ -364,7 +364,8 @@ static int init_super0(void **sbp, mdu_a
 	sb->failed_disks = info->failed_disks;
 	sb->spare_disks = info->spare_disks;
 	sb->events_hi = 0;
-	sb->events_lo = 1;
+	// PRC: why? sb->events_lo = 1;
+	sb->events_lo = 0;
 	sb->layout = info->layout;
 	sb->chunk_size = info->chunk_size;

diff -purN --exclude makepkg --exclude rpm --exclude *.DIST --exclude md_u.h --exclude md_p.h --exclude bitmap.h --exclude mdadm.steeleye.spec --exclude-from /export/public/clemep/tmp/dontdiff mdadm-2.0-devel-1-PRISTINE/super1.c mdadm-2.0-devel-1-bitmap-bug-fix/super1.c
--- mdadm-2.0-devel-1-PRISTINE/super1.c	Sun Feb 13 22:00:44 2005
+++ mdad