Another feature to look for is spin down of the dedicated hot spare.

Go Vikings :)

Patrick
> On Feb 21, 2016, at 7:23 AM, Marcus MERIGHI <mcmer-open...@tor.at> wrote:
>
> ti...@openmailbox.org (Tinker), 2016.02.20 (Sat) 21:05 (CET):
>> So glad to understand better what's in the box.
>>
>> Also please note that I'm not trying to suggest to implement lots of
>> crap, am perfectly clear that high security is correlated with low
>> complexity.
>>
>> On 2016-02-21 00:29, Marcus MERIGHI wrote:
>>> ti...@openmailbox.org (Tinker), 2016.02.20 (Sat) 16:43 (CET):
>> ..
>>> You appear to mean bioctl(8). That's the only place I could find the word
>>> 'patrol'. bioctl(8) can control more than softraid(4) devices.
>>>
>>> bio(4):
>>>      The following device drivers register with bio for volume
>>>      management:
>>>
>>>            ami(4)        American Megatrends Inc. MegaRAID
>>>                          PATA/SATA/SCSI RAID controller
>>>            arc(4)        Areca Technology Corporation SAS/SATA RAID
>>>                          controller
>>>            cac(4)        Compaq Smart Array 2/3/4 SCSI RAID controller
>>>            ciss(4)       Compaq Smart Array SAS/SATA/SCSI RAID
>>>                          controller
>>>            ips(4)        IBM SATA/SCSI ServeRAID controller
>>>            mfi(4)        LSI Logic & Dell MegaRAID SAS RAID controller
>>>            mpi(4)        LSI Logic Fusion-MPT Message Passing Interface
>>>            mpii(4)       LSI Logic Fusion-MPT Message Passing Interface
>>>                          II
>>>            softraid(4)   Software RAID
>>>
>>> It is talking about controlling a HW RAID controller, in that 'patrol'
>>> paragraph, isn't it?
>>
>> So by this you mean that patrolling is really implemented for
>> softraid??
>
> No, I said the opposite.
>
> I'm sure my English language capabilities are not perfect. But what you
> make of it is really surprising! (And even funny in the cabaret way.)
>
> I'll keep trying. But sooner or later we'll have to take this off list.
> Or to newbies. There you get help from the same people but without
> having your misinterpretations in the 'official' archives for other poor
> souls to find ;-)
>
> http://mailman.theapt.org/listinfo/openbsd-newbies
>
>> (Karel and Constantine don't agree??)
>>
>> So I just do.. "bioctl -t start sdX" where sdX is the name of my softraid
>> device, and it'll do the "scrub" as in reading through all underlying
>
> bioctl(8) is clear, I think:
>      -t patrol-function
>              Control the RAID card's patrol functionality, if
>              supported.  patrol-function may be one of:
>
> Why do you think it will work for softraid(4) when it says it does for
> hardware RAID?
>
> I have a theory: you have some experience with other operating systems
> and their built-in help systems that have led you to not fully read but
> just search/skim for keywords. Do yourself (and me) a favour and read
> them fully. Top to bottom. Take every word as put there thoughtfully,
> not in a hurry. You can find manpage content discussions all over the
> archives. Manpages are taken seriously.
>
> Please repeat: bio(4)/bioctl(8) controls RAID devices. These can be in
> hardware or software. Some functions (-a, -b, -H, -t, -u) are only
> usable/useful when controlling a hardware RAID. The manpage even gives
> direct clues on whether hardware or software RAID is the topic. First
> synopsis, second synopsis. 'The options for RAID controllers are as
> follows:' (=hardware) 'In addition to the relevant options listed above,
> the options for softraid(4) devices are as follows:' (=software).
> Did you note the 'relevant' part? That word is there on purpose, I
> suppose. It is there to tell you that not all, but the relevant parts of
> the hardware RAID parameters also apply to software RAID (that comes
> below).
> I would consider '-v' relevant, '-a' ('Control the RAID card's
> alarm functionality, if supported') not.
>
> (Example: what '-a' does for hardware RAID can be done with sensorsd(8)
> for software RAID (=softraid(4)). Once a softraid volume is configured,
> you get 'hw.sensors.softraid0.drive0=online (sd1), OK'.
> Try 'sysctl hw.sensors.softraid0'.)
>
>> physical media to check its internal integrity so for RAID1C that will be
>> data readability and that checksums are correct, and "doas bioctl softraid0"
>> will show me the % status, and if I don't get any errors before it goes back
>> to normal it means the patrol was successful right?
>
> No idea, never had a hardware RAID controller.
>
>> (And as usual patrol is implemented to have the lowest priority, so it
>> should not interfere extremely much with ordinary SSD softraid operation.)
>
> I think the patrolling is done by the hardware RAID controller.
> bioctl(8) just commands it to do so.
>
>>>> * Rebuild - I think I saw some console dump of the status of a rebuild
>>>> process on the net, so MAYBE or NO..?
>>>
>>> That's what it looks like:
>>>
>>> $ doas bioctl softraid0
>>>     Volume      Status                Size Device
>>>  softraid0 0 Rebuild      12002360033280 sd6     RAID5 35% done
>>>           0 Rebuild       4000786726912 0:0.0   noencl <sd2a>
>>>           1 Online        4000786726912 0:1.0   noencl <sd3a>
>>>           2 Online        4000786726912 0:2.0   noencl <sd4a>
>>>           3 Online        4000786726912 0:3.0   noencl <sd5a>
>>
>> Yey!!
>>
>> Wait, can you explain to me what I would write instead of "device" and
>> "channel:target[.lun]" in "bioctl -R device" and "bioctl -R
>> channel:target[.lun]", AND what effect those would have?
>
> The above rebuild was started with:
> $ bioctl -R /dev/sd2a sd6
>                       ^^^=RAID volume
>             ^^^^^^^^^=replacement chunk
>
> Sidenote:
> In fact it was started as 'bioctl -R /dev/sd3a sd7'; I did a reboot in
> between, ordering of the disk devices changed but the rebuild continued
> flawlessly.
>
>> Say that my sd0 and sd1 SSDs run a RAID1C already, can I then make softraid
>
> On an 'OpenBSD 5.9 (GENERIC.MP) #1870: Mon Feb  8 17:34:23 MST 2016',
> from snapshots, bioctl(8) says:
>      Valid raidlevels are:
>            0       RAID 0: A striping discipline.
>            1       RAID 1: A mirroring discipline.
>            5       RAID 5: A striping discipline with floating parity
>                    chunk.
>            C       CRYPTO: An encrypting discipline.
>            c       CONCAT: A concatenating discipline.
>
> What is that 'RAID1C' thing you keep talking about?
>
>> extend my RAID1C with my sd2 SSD by "rebuilding" it, as a way to live-copy
>> in all my data to sd2, so this would work as a kind of live attach even if
>> expensive?
>
> If your sd0 or sd1 fails you can replace them in hardware with sd2 or
> have sd2 already plugged in and start a rebuild as shown above.
>
> There are no bioctl(8) parameters for modifying an existing volume.
> Just for rebuilding (-R) and failing (-O).
>
>> Does it work for a softraid that's live already?
>
> A softraid(4) disk is 'just another disk'(tm). Nothing special. You can
> growfs(8) and tunefs(8), I suppose. And you can restore from backups
> after you had to do bigger changes than can be done with these utilities.
>
> No parameters that indicate 'modify' or 'edit' or 'append'. Just that
> '-l' to list the chunks for creating a volume.
>
> I'd suggest just playing with it. If you have no real disks for that,
> take a look at vnconfig(8) and vnd(4); use two of these as chunks for
> a softraid volume. Warning: I have not tested this (vnd+softraid).
>
> Then try to extend, append, enlarge, shrink, whatever.
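>
> (A rough, equally untested sketch of that vnd(4) experiment; image
> files, sizes and device names below are just examples:)
>
> # dd if=/dev/zero of=/var/tmp/chunk0.img bs=1m count=512
> # dd if=/dev/zero of=/var/tmp/chunk1.img bs=1m count=512
> # vnconfig vnd0 /var/tmp/chunk0.img
> # vnconfig vnd1 /var/tmp/chunk1.img
> # fdisk -iy vnd0 && fdisk -iy vnd1
> # disklabel -E vnd0       (add an 'a' partition with FS type "RAID")
> # disklabel -E vnd1       (same here)
> # bioctl -c 1 -l /dev/vnd0a,/dev/vnd1a softraid0
>
> A new sdN volume should show up in dmesg; then you can practise
> 'bioctl -O', 'bioctl -R' and 'bioctl -d' on it without risking real
> disks.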
>
>>>> * Hotspare - MAYBE, "man softraid" says "Currently there is no automated
>>>> mechanism to recover from failed disks.", but that is not so specific
>>>> wording, and I think I read a hint somewhere that there is hotspare
>>>> functionality.
>>>
>>> bioctl(8)
>>>      -H channel:target[.lun]
>>>              If the device at channel:target[.lun] is currently marked
>>>              ``Unused'', promote it to being a ``Hot Spare''.
>>>
>>> That's the only mention of 'hot spare'. And again talking about
>>> controlling a hardware RAID controller, isn't it?
>>>
>>> What is 'not so specific' about 'no' (as in "Currently there is *no*
>>> automated mechanism to recover from failed disks")?
>>
>> Awesome.
>>
>> I guess "bioctl softraid0" will list which hotspares there are currently,
>> and that "-d" will drop a hotspare.
>
> There are no hot spares as seen from bioctl(8). You, the operator, know
> that disk sdXYZ is a hot spare, sitting there plugged in but idle. Then,
> when one of your chunks fails, you do bioctl -R sdXYZa sdABC.
> sdXYZa = your "hot spare": just a disk that is already connected to the
> system but not used
> sdABC = your softraid volume (your RAID1C, whatever that is)
>
>> The fact that there is hotspare functionality,
>
> How come you think so?
>
>> means that there are cases when softraid will take a disk out of use.
>
> I do not get the connection from a wrong assumption to the above
> statement, but:
>
> Yes, there are cases when softraid will take a disk out of use. It looks
> somewhat like this:
>
> $ doas bioctl softraid0
>     Volume      Status                Size Device
>  softraid0 0 Degraded     12002360033280 sd6     RAID5
>           0 Offline       4000786726912 0:0.0
>           1 Online        4000786726912 0:1.0   noencl <sd3a>
>           2 Online        4000786726912 0:2.0   noencl <sd4a>
>           3 Online        4000786726912 0:3.0   noencl <sd5a>
>
>> That will be when that disk reports itself as COMPLETELY out of use ALL BY
>> ITSELF, such as self-detaching itself on the level of the SATA controller or
>> reporting failure via some SMART command?
>
> The reasons why *my* softraid RAID5 went degraded are not clear to me but
> it is documented on bugs@. A block could not be read, kernel panic.
> Reboot, rebuild, ...
>
>> A disk just half-breaking with broken sectors and 99% IO slowdown will not
>> cause it to go offline though so I guess I should buy enterprise drives with
>> IO access time guarantees then.
>
> Listen to nick@ (Nick Holland). Search for his older posts, too!
>
> Are you, for instance, sure your motherboard and all other parts in the
> way can handle a disk that just spins down and disconnects from the
> SATA/whatever bus?
>
> Or are you going to have to deal with a kernel panic anyways?
>
>>>> * Hotswap - MAYBE, this would depend on if there's rebuild. Only disconnect
>>>> ("bioctl -O" I think; "bioctl -d" is to.. unmount or self-destruct a
>>>> softraid?)
>>>
>>> bioctl -O should fail the chunk specified, simulating hardware failure.
>>> After this command you have an 'Offline' chunk in the 'bioctl' output.
>>>
>>> bioctl -d 'detach', not 'destroy'; just as sdX appears when you assemble
>>> a softraid volume, this makes it go away. Better unmount before...
>>
>> So "-d" is to take down a whole softraid. "-O" could work to take out a
>> single physical disk but it's unclean.
>
> Please get used to - and use - the terms: -d takes the *volume* down. -O fails
> a *chunk*.
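>
> (To make that distinction concrete, a hedged example with placeholder
> names only: sd5 being the softraid volume, sd2a one of its chunks.)
>
> # bioctl -O sd2a sd5     <- fail one *chunk*; the volume goes 'Degraded'
> # bioctl -d sd5          <- detach the whole *volume* (unmount it first)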
>
>> So then, there is a very unrefined hotswapping functionality in that "-O"
>> can be used to take out a single physical drive, and "-R" (if I understood
>> it correctly above) can be used to plug in a drive.
>
> You are using unusual terms ('take out a single physical drive' vs. 'Set
> the state of device or channel:target[.lun] to offline') but basically
> '-O' => offline, '-R' => rebuild => online.
>
>> Preferable would be to "hotswap" the whole softraid by simply taking it
>> offline altogether ("bioctl -d [raid dev]") and then taking it online
>                                  ^^^^^^^^^^ = volume
>> altogether ("bioctl -c 1 -l [comma-separated devs] softraid0")
>                              ^^^^ = chunks
>
> What has happened after taking it offline and bringing it back up? Have
> you swapped chunks? Broken one taken out, good replacement shoved in?
> How is bio(4) supposed to know?
>
> What I think should happen:
> - you have a RAID1 volume sd3
> - assembled from the chunks sd1a and sd2a
> - you notice 'something is wrong', e.g. clicking sounds coming from sd2
> - you do 'bioctl -O sd2a sd3' or, if your hardware allows, just pull out
>   sd2.
> - you replace the failed chunk: either you have a 'hot spare' already
>   plugged in and waiting; or if your hardware allows, you just shove it
>   in; or, as in my case, you shut the system down, replace the disk and
>   restart.
> - In case of reboot your RAID1 volume comes up degraded. In all other
>   cases it just stays degraded. In any case you pray your remaining disk
>   keeps working.
> - the replacement disk shows up as sd4 (for whatever reason, maybe you
>   left the failed one connected)
> - you do all the setup for the new disk (see softraid(4) -> EXAMPLES)
> - you do 'bioctl -R sd4a sd3', rebuild starts.
>
>>>> The man pages are sometimes over-minimalistic with respect to an individual
>>>> user who's trying to learn, this is why I'm asking for your clarification.
>>>
>>> I am quite sure the man pages are kept as condensed as they are on
>>> purpose.
>>>
>>> You can always read mplayer(1) if you want something lengthy ;-)
>>>
>>>> So your clarifications would still be much appreciated.
>>>
>>> Nothing authoritative from me!
>>> I am just trying to flatten your learning curve.
>>
>> Awesome. Thank you so much!
>
> It's taken me over an hour on a rainy Sunday to answer; please use at
> least the same amount of time on investigating before answering.
>
> Bye, Marcus
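
For the 'setup for the new disk' step in that walkthrough, the softraid(4)
EXAMPLES section shows the usual pattern; roughly (untested here, device
names are simply the ones from the walkthrough above):

  # fdisk -iy sd4
  # disklabel -E sd4      (add an 'a' partition with FS type "RAID")
  # bioctl -R sd4a sd3

After that, 'bioctl softraid0' should show the rebuild progress ('% done')
until the volume is Online again.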