ZFS w/failing drives - any equivalent of Solaris FMA?
Hi,

Recently, a ZFS pool on my FreeBSD box started showing lots of errors on one drive in a mirrored pair.

The pool consists of around 14 drives (as 7 mirrored pairs), hung off of a couple of SuperMicro 8 port SATA controllers (1 drive of each pair is on each controller).

One of the drives started picking up a lot of errors (by the end of things it was returning errors pretty much for any reads/writes issued) - and taking ages to complete the I/Os.

However, ZFS kept trying to use the drive - e.g. as I attached another drive to the remaining 'good' drive in the mirrored pair, ZFS was still trying to read data off the failed drive (and the remaining good one) in order to complete its re-silver to the newly attached drive.

Having posted on the Open Solaris ZFS list - it appears that under Solaris there's an 'FMA Engine' which communicates drive failures and the like to ZFS - advising ZFS when a drive should be marked as 'failed'.

Is there anything similar to this on FreeBSD yet? - i.e. does/can anything on the system tell ZFS "This drive's experiencing failures" rather than ZFS just seeing lots of timed out I/O 'errors'? (as appears to be the case).

In the end, the failing drive was timing out literally every I/O - I did recover the situation by detaching it from the pool (which hung the machine - probably caused by ZFS having to update the meta-data on all drives, including the failed one). A reboot brought the pool back, minus the 'failed' drive, so enough of the 'detach' must have completed.

The newly attached drive completed the re-silver in half an hour (as opposed to an estimated 755 hours and climbing with the other drive still in the pool, limping along).

-Kp
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: loading multi threaded library into executable enabled for single thread
Do you know if this is documented in Release Notes or Known Issues or somewhere?

thanks,
B

Daniel Eischen wrote:
> On Thu, 11 Sep 2008, Barry Andrews wrote:
>
>> Hi All,
>>
>> I have a multi-threaded library that is linked against libpthread.
>> When I load this lib into a tclsh process on FreeBSD, I get this error,
>> "Recurse on private mutex", and crash. I understand that I can have this
>> issue when the executable is not linked against libpthread but one of the
>> loaded libs is. Basically, it thinks it's in single threaded mode.
>
> This must be an older version of FreeBSD. I think you must
> link your application (tclsh or whatever) against libpthread
> in order for this to work. The libc functions won't get properly
> overloaded by their equivalents in libpthread unless you do
> this.
Re: loading multi threaded library into executable enabled for single thread
On Fri, Sep 12, 2008 at 07:41:14AM -0400, Barry Andrews wrote:
> Do you know if this is documented in Release Notes or Known Issues or
> somewhere?

Why would it be an "issue"? gcc -pthread and libpthread linking is documented pretty much everywhere on the web. There isn't anything broken about it, it's how it's done on older FreeBSD.

Note that all of this has significantly changed in later FreeBSD versions, and that the 5.x series was deprecated a very long time ago.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On Fri, Sep 12, 2008 at 10:45:24AM +0100, Karl Pielorz wrote:
> Having posted on the Open Solaris ZFS list - it appears, under Solaris
> there's an 'FMA Engine' which communicates drive failures and the like to
> ZFS - advising ZFS when a drive should be marked as 'failed'.
>
> Is there anything similar to this on FreeBSD yet? - i.e. Does/can
> anything on the system tell ZFS "This drive's experiencing failures"
> rather than ZFS just seeing lots of timed out I/O 'errors'? (as appears
> to be the case).

As far as I know, there is no such "standard" mechanism in FreeBSD. If the drive falls off the bus entirely (e.g. detached), I would hope ZFS would notice that. I can imagine it might also depend on whether the disk subsystem you're using utilises CAM or not (e.g. disks should be daX not adX); Scott Long might know if something like this is implemented in CAM. I'm fairly certain nothing like this is implemented in ata(4).

Ideally, it would be the job of the controller and controller driver to announce I/O operation failure/success to the underlying pieces. Do you agree?

I hope this "FMA Engine" on Solaris only *tells* underlying pieces of I/O errors, rather than acting on them (e.g. automatically yanking the disk off the bus for you). I'm in no way shunning Solaris, I'm simply saying such a mechanism could be as risky/deadly as it could be useful.

--
| Jeremy Chadwick                                jdc at parodius.com |
Re: loading multi threaded library into executable enabled for single thread
I don't understand. If it was not broken, then why did it change in later FreeBSD versions?

On Fri, Sep 12, 2008 at 9:10 AM, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
> Why would it be an "issue"? gcc -pthread and libpthread linking is
> documented pretty much everywhere on the web. There isn't anything
> broken about it, it's how it's done on older FreeBSD.
>
> Note that all of this has significantly changed in later FreeBSD
> versions, and that the 5.x series was deprecated a very long time ago.
Re: loading multi threaded library into executable enabled for single thread
On Fri, Sep 12, 2008 at 09:26:37AM -0400, Barry Andrews wrote:
> I don't understand. If it was not broken, then why did it change in later
> FreeBSD versions?

I should be more explicit: the threading library and implementations have changed over time. There was libc_r, then there was libthr, then there was libkse. This is what we call "evolution". :-)

http://www.unobvious.com/bsd/freebsd-threads.html
http://kerneltrap.org/node/624
http://www.freebsd.org/kse/

The gcc -pthread flag is still there on present-day FreeBSD (6 through HEAD), and *should* be used. You can choose not to use it, but then you must ensure at link time that you explicitly link with -lpthread.

--
| Jeremy Chadwick                                jdc at parodius.com |
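[Editor's sketch] The two link options mentioned in the message above can be written out roughly as follows. This is only an illustration: the file names app.c and app are hypothetical, and the script prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: app.c and app are hypothetical names, not taken from the thread.
CC="${CC:-cc}"

# Print the link command for one of the two options discussed above.
link_cmd() {
    case "$1" in
        flag)     echo "$CC -pthread -o app app.c" ;;    # compiler driver adds thread support
        explicit) echo "$CC -o app app.c -lpthread" ;;   # name the library yourself at link time
        *)        echo "usage: link_cmd flag|explicit" >&2; return 1 ;;
    esac
}

link_cmd flag
link_cmd explicit
```

Either form is meant for the executable that will later load the threaded library, which is the case raised in this thread.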
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
--On 12 September 2008 06:21 -0700 Jeremy Chadwick <[EMAIL PROTECTED]> wrote:

> As far as I know, there is no such "standard" mechanism in FreeBSD. If
> the drive falls off the bus entirely (e.g. detached), I would hope ZFS
> would notice that. I can imagine it (might) also depend on if the disk
> subsystem you're using is utilising CAM or not (e.g. disks should be daX
> not adX); Scott Long might know if something like this is implemented in
> CAM. I'm fairly certain nothing like this is implemented in ata(4).

For ATA, at the moment - I don't think it'll notice even if a drive detaches. I think like my system the other day, it'll just keep issuing I/O commands to the drive, even if it's disappeared (it might get much 'quicker failures' if the device has 'gone' to the point of FreeBSD just quickly returning 'fail' for every request).

> Ideally, it would be the job of the controller and controller driver to
> announce to underlying I/O operations fail/success. Do you agree?
>
> I hope this "FMA Engine" on Solaris only *tells* underlying pieces of
> I/O errors, rather than acting on them (e.g. automatically yanking the
> disk off the bus for you). I'm in no way shunning Solaris, I'm simply
> saying such a mechanism could be as risky/deadly as it could be useful.

Yeah, I guess so - I think the way it's meant to happen (and this is only AFAIK) is that FMA 'detects' a failing drive by applying some configurable policy to it. That policy would also include notifying ZFS, so that ZFS could then decide to stop issuing I/O commands to that device.

None of this seems to be in place, at least for ATA under FreeBSD - when a drive goes bad, you can just end up with 'hours' worth of I/O timeouts, until someone intervenes.

I did enquire on the Open Solaris list about setting limits for 'errors' in ZFS, which netted me a reply that it's FMA (at least in Solaris) that's responsible for this - it just then informs ZFS of the condition. We don't appear (again at least for ATA) to have anything similar for FreeBSD yet :(

-Kp
Re: loading multi threaded library into executable enabled for single thread
Thanks for the links! But I'm not sure what any of this has to do with this particular issue. I have an exe that does not use threads that loads a lib that is linked with libpthread. Why do different threading implementations affect what I am seeing here? Is there no way for this to work in FreeBSD v5.5? Why would this go away if I upgraded to 6.x or better?

thanks,
B
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
Karl Pielorz wrote:
> In the end, the failing drive was timing out literally every I/O - I did
> recover the situation by detaching it from the pool (which hung the machine
> - probably caused by ZFS having to update the meta-data on all drives,
> including the failed one). A reboot brought the pool back, minus the
> 'failed' drive, so enough of the 'detach' must have completed.

Did you try "atacontrol detach" to remove the disk from the bus? I haven't tried that with ZFS, but gmirror automatically detects when a disk has gone away, and doesn't try to do anything with it anymore. It certainly should not hang the machine. After all, what's the purpose of a RAID when you have to reboot upon drive failure? ;-)

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht München, HRA 74606, Management: secnetix Verwaltungsgesellsch. mbH, Commercial register: Registergericht München, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"C++ is over-complicated nonsense. And Bjorn Shoestrap's book a danger to public health. I tried reading it once, I was in recovery for months."
        -- Cliff Sarginson
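[Editor's sketch] The "atacontrol detach" suggestion above might look something like the following. The channel name ata2 is a hypothetical example, and the script only prints the commands unless DRY_RUN=0 is set, because detaching a channel removes every device attached to it.

```shell
#!/bin/sh
# Sketch only: the channel name (ata2) is a hypothetical example.
# Commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

detach_channel() {
    channel="$1"
    run atacontrol list                 # check which disks sit on which channel first
    run atacontrol detach "$channel"    # stop the ATA layer talking to that channel
}

detach_channel ata2
```

Whether ZFS reacts cleanly to the channel disappearing is exactly what the thread leaves untested, so this is a starting point for an experiment rather than a known fix.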
Re: loading multi threaded library into executable enabled for single thread
On Fri, Sep 12, 2008 at 11:00:18AM -0400, Barry Andrews wrote:
> Thanks for the links! But I'm not sure what any of this has to do with
> this particular issue. I have an exe that does not use threads that
> loads a lib that is linked with libpthread. Why does different threading
> implementations affect what I am seeing here? Is there no way for this
> to work in FreeBSD v5.5? Why would this go away if I upgraded to 6.x or
> better?

You're confusing me. Earlier you said:

>>> I have a multi-threaded library that is linked against libpthread.
>>> When I
>>> load this lib into a tclsh process on FreeBSD, I get this error,

So what is "the exe"? Are you referring to tclsh? If so, you need to rebuild tclsh from source to link with libpthread. If not, you need to contact whoever provided the binary and ask them to rebuild it from source.

Additionally, please ensure that the tclsh binary is linked to the same version of the libpthread library as your own library. You want to make sure they're both built and linked on the same machine (from the same source code) if possible; the simple ".so.X" versioning method works great for major changes, but there are often minor changes that don't result in "X" being increased.

I'm getting the impression that the tclsh binary you have was not built on the same machine / from the same source as your library (the one linked with libpthread) was.

--
| Jeremy Chadwick                                jdc at parodius.com |
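[Editor's sketch] One way to check what a binary is actually linked against, as the message above asks, is to look through its ldd output for a threading library. The helper name has_thread_lib and the tclsh path in the comment are hypothetical; the library names matched are the ones mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: reads ldd-style output on stdin and succeeds if a threading
# library (libpthread, libthr, libc_r or libkse) appears in it.
has_thread_lib() {
    grep -E -q 'lib(pthread|thr|c_r|kse)\.so'
}

# Typical use (the binary path is a hypothetical example):
#   ldd /usr/local/bin/tclsh8.4 | has_thread_lib && echo "linked with a thread library"
```

Running the same check on both tclsh and the library makes it easier to compare the ".so.X" versions each one pulls in.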
Re: loading multi threaded library into executable enabled for single thread
Yes, the exe is tclsh. I understand that linking tclsh with libpthread is what would work. However, this is very impractical. A user of my library shouldn't have to rebuild their tclsh to match my library specs. Another option would be to ship tclsh with my lib, but that also is a little weird. It seems like the only somewhat practical option I have is to use LD_PRELOAD, which is also weird but better than nothing.
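[Editor's sketch] The LD_PRELOAD option mentioned above could be wrapped along these lines. The library path and the script name are assumptions for illustration only (the real libpthread file name on the target system should be checked first), and the function prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: the library path and script name are hypothetical.
PTHREAD_LIB="${PTHREAD_LIB:-/usr/lib/libpthread.so}"

# Print the command a wrapper script would exec: the threading library is
# preloaded so it is already in place when tclsh loads the threaded library.
preload_cmd() {
    echo "env LD_PRELOAD=$PTHREAD_LIB tclsh $*"
}

preload_cmd myscript.tcl
```

A wrapper like this keeps the stock tclsh binary untouched, at the cost of every user having to start tclsh through it.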
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On September 12, 2008 02:45 am Karl Pielorz wrote:
> However, ZFS kept trying to use the drive - e.g. as I attached another
> drive to the remaining 'good' drive in the mirrored pair, ZFS was still
> trying to read data off the failed drive (and remaining good one) in
> order to complete its re-silver to the newly attached drive.

For the one time I've had a drive fail, and the three times I've replaced drives for larger ones, the process used was:

  zpool offline
  zpool replace

For one machine, I had to shut it off after the offline, as it didn't have hot-swappable drive bays. For the other machine, it did everything while online and running.

IOW, the old device never had a chance to interfere with anything. Same process we've used with hardware RAID setups in the past.

> Is there anything similar to this on FreeBSD yet? - i.e. Does/can
> anything on the system tell ZFS "This drive's experiencing failures"
> rather than ZFS just seeing lots of timed out I/O 'errors'? (as appears
> to be the case).

Beyond the periodic script that checks for things like this, and sends root an e-mail, I haven't seen anything.

--
Freddie Cash
[EMAIL PROTECTED]
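[Editor's sketch] The offline-then-replace sequence described above might be scripted roughly as follows. The pool and device names (tank, ad4, ad6) are hypothetical, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: pool and device names (tank, ad4, ad6) are hypothetical.
# Commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

swap_disk() {
    pool="$1"; old="$2"; new="$3"
    run zpool offline "$pool" "$old"          # stop ZFS issuing I/O to the old disk
    run zpool replace "$pool" "$old" "$new"   # resilver onto the replacement
    run zpool status "$pool"                  # check resilver progress
}

swap_disk tank ad4 ad6
```

Taking the old device offline first is the point of the sequence: the resilver then reads only from the healthy side of the mirror.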
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On Fri, Sep 12, 2008 at 03:34:30PM +0100, Karl Pielorz wrote: > --On 12 September 2008 06:21 -0700 Jeremy Chadwick <[EMAIL PROTECTED]> > wrote: > >> As far as I know, there is no such "standard" mechanism in FreeBSD. If >> the drive falls off the bus entirely (e.g. detached), I would hope ZFS >> would notice that. I can imagine it (might) also depend on if the disk >> subsystem you're using is utilising CAM or not (e.g. disks should be daX >> not adX); Scott Long might know if something like this is implemented in >> CAM. I'm fairly certain nothing like this is implemented in ata(4). > > For ATA, at the moment - I don't think it'll notice even if a drive > detaches. I think like my system the other day, it'll just keep issuing > I/O commands to the drive, even if it's disappeared (it might get much > 'quicker failures' if the device has 'gone' to the point of FreeBSD just > quickly returning 'fail' for every request). I know ATA will notice a detached channel, because I myself have done it: administratively, that is -- atacontrol detach ataX. But the only time that can happen "automatically" is if the actual controller does so itself, or if FreeBSD is told to do it administratively. What this does to other parts of the kernel and userland applications is something I haven't tested. I *can* tell you that there are major, major problems with detach/reattach/reinit on ata(4) causing kernel panics and other such things. I've documented this quite thoroughly in my "Common FreeBSD issues" wiki: http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues I am also very curious to know the exact brand/model of 8-port SATA controller from Supermicro you are using, *especially* if it uses ata(4) rather than CAM and da(4). Such Supermicro controllers were recently discussed on freebsd-stable (or was it -hardware?), and no one was able to come to a concise decision as to whether or not they were decent or even remotely trusted. 
Supermicro provides a few different SATA HBAs. >> Ideally, it would be the job of the controller and controller driver to >> announce to underlying I/O operations fail/success. Do you agree? >> >> I hope this "FMA Engine" on Solaris only *tells* underlying pieces of >> I/O errors, rather than acting on them (e.g. automatically yanking the >> disk off the bus for you). I'm in no way shunning Solaris, I'm simply >> saying such a mechanism could be as risky/deadly as it could be useful. > > Yeah, I guess so - I think the way it's meant to happen (and this is only > AFAIK) is that FMA 'detects' a failing drive by applying some > configurable policy to it. That policy would also include notifying ZFS, > so that ZFS could then decide to stop issuing I/O commands to that > device. It sounds like that is done very differently than on FreeBSD. If such a condition happens on FreeBSD (disk errors scrolling by, etc.), the only way I know of to get FreeBSD to stop sending commands through the ATA subsystem is to detach the channel (atacontrol detach ataX). > None of this seems to be in place, at least for ATA under FreeBSD - when > a drive goes bad, you can just end up with 'hours' worth of I/O timeouts, > until someone intervenes. I can see the usefulness in Solaris's FMA thing. My big concern is whether or not FMA actually pulls the disk off the channel, or if it just leaves the disk/channel connected and simply informs kernel pieces not to use it. If it pulls the disk off the channel, I have serious qualms with it. There are also chips on SATA and SCSI controllers which can cause chaos as well -- specifically, SES/SES2 chips (I'm looking at you, QLogic). These are supposed to be "smart chips" that detect when there are a large number of transport or hardware errors (implying cabling issues, etc.) and *automatically* yank the disk off the bus. 
Sounds great on paper, but in the field, I see these chips start pulling disks off the bus, changing SCSI IDs on devices, or inducing what appear to be full SCSI subsystem timeouts (e.g. the SES/SES2 chip has locked up/crashed in some way, and now your entire bus is dead in the water). I have seen all of the above bugs with onboard Adaptec 320 controllers, on systems running Solaris 8, 9, and OpenSolaris. Most times it turns out to be the SES/SES2 chip getting in the way.

> I did enquire on the Open Solaris list about setting limits for 'errors' in ZFS, which netted me a reply that it's FMA (at least in Solaris) that's responsible for this - it just then informs ZFS of the condition. We don't appear (again at least for ATA) to have anything similar for FreeBSD yet :(

My recommendation to people these days is to avoid ata(4) on FreeBSD at all costs if they expect to encounter disk or hardware failures. The ata(4) layer is in no way, shape, or form reliable in the case of transport or disk failures, and even sometimes in the case of hot-swapping. Try your hardest to find a physical controller that supports SATA d
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On Fri, Sep 12, 2008 at 11:44 AM, Oliver Fromme <[EMAIL PROTECTED]> wrote:
> Did you try "atacontrol detach" to remove the disk from the bus? I haven't tried that with ZFS, but gmirror automatically detects when a disk has gone away, and doesn't try to do anything with it anymore. It certainly should not hang the machine. After all, what's the purpose of a RAID when you have to reboot upon drive failure. ;-)

To be fair, many "home" users run RAID without the expectation of being able to hot swap the drives. While RAID can provide high availability, it can also provide simple data security. In my home environment, I have a number of machines running. I have a few things on non-redundant disks --- mostly operating systems or local archives of internet data (like a cvsup server, for instance). Those disks can be lost, and while it's a nuisance, it's not catastrophic. Other things (from family photos to mp3s to other media) I keep on home RAID arrays. They're not hot swap... but I've had quite a few disks go bad over the years. I actually welcome ZFS for this --- the idea that checksums are kept makes me feel a lot more secure about my data. I have observed some bitrot over time on some data.

To your point... I suppose you have to reboot at some point after the drive failure, but my experience has been that the reboot has been under my control some time after the failure (usually when I have the replacement drive).

For the home user, this can be quite inexpensive, too. I've found a case that can take 19 drives internally (and has good cooling) for about $125. If you used some of the 5-to-3 drive bays, that number would increase to 25. About the only real improvement I'd like to see in this setup is the ability to spin down idle drives. That would be an ideal setup for the home RAID array.

___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: loading multi threaded library into executable enabled for single thread
On Fri, Sep 12, 2008 at 11:55:01AM -0400, Barry Andrews wrote:
> Yes, the exe is tclsh. I understand that linking tclsh with libpthread is what would work. However this is very impractical. A user of my library shouldn't have to rebuild their tclsh to match my library specs. Another option would be to ship tclsh with my lib, but that also is a little weird. It seems like the only somewhat practical option I have is to use LD_PRELOAD, which is also weird but better than nothing.

This really isn't a "FreeBSD problem", as the same sort of issue plagues other operating systems. When it comes to threading, you want *everything* threaded as much as possible -- mix-matching usually does not work. The only OS I have seen where that kind of environment works reliably is Solaris. I still feel threading is "too new" of a technology on UNIX.

Your options as I see them:

1) Require your users to ensure they have a threaded TCL installation, and do not promise support in the case they try to use your library on a non-threaded installation,
2) Provide two versions of your library -- a threaded and non-threaded version. This may be impractical for performance reasons,
3) Require LD_PRELOAD, which is ugly, agreed.

I think those are pretty much the only options you have at this point. Not a great set, I know, but it's reality.

> On Fri, Sep 12, 2008 at 11:45 AM, Jeremy Chadwick <[EMAIL PROTECTED]>wrote: > > > On Fri, Sep 12, 2008 at 11:00:18AM -0400, Barry Andrews wrote: > > > Thanks for the links! But I'm not sure what any of this has to do with > > > this particular issue. I have an exe that does not use threads that > > > loads a lib that is linked with libpthread. Why does different threading > > > implementations affect what I am seeing here? Is there no way for this > > > to work in FreeBSD v5.5? Why would this go away if I upgraded to 6.x or > > > better? > > > > You're confusing me.
Earlier you said: > > > > >>> I have a multi-threaded library that is linked against libpthread. > > >>> When I > > >>> load this lib into a tclsh process on FreeBSD, I get this error, > > > > So what is "the exe?" Are you referring to tclsh? If so, you need to > > rebuild tclsh from source to link with libpthread. If not, you need to > > contact whoever provided the binary and ask them to rebuild it from > > source. > > > > Additionally, please ensure that the tclsh binary is linked to the same > > version of libpthread library as your own library. You want to make > > sure they're both built and linked on the same machine (from the same > > source code) if possible; the simple ".so.X" versioning method works > > great for major changes, but there are often minor changes that don't > > result in "X" being increased. > > > > I'm getting the impression that the tclsh binary you have was not built > > on the same machine / from the same source as what your library (the one > > linked with libpthread) was. > > > > > Jeremy Chadwick wrote: > > >> On Fri, Sep 12, 2008 at 09:26:37AM -0400, Barry Andrews wrote: > > >> > > >>> I don't understand. If it was not broken, then why did it change in > > later > > >>> FreeBSD versions? > > >>> > > >> > > >> I should be more explicit: the threading library and implementations > > >> have changed over time. There was libc_r, then there was libthr, then > > >> there was libkse. This is what we call "evolution". :-) > > >> > > >> http://www.unobvious.com/bsd/freebsd-threads.html > > >> http://kerneltrap.org/node/624 > > >> http://www.freebsd.org/kse/ > > >> > > >> The gcc -pthread flag is still there on present-day FreeBSD (6 through > > >> HEAD), and *should* be used. You can choose not to use it but you must > > >> ensure during linktime that you explicitly link to -lpthread. 
> > >> > > >> > > >>> On Fri, Sep 12, 2008 at 9:10 AM, Jeremy Chadwick <[EMAIL PROTECTED]> > > wrote: > > >>> > > >>> > > On Fri, Sep 12, 2008 at 07:41:14AM -0400, Barry Andrews wrote: > > > > > Do you know if this is documented in Release Notes or Known Issues or > > > somewhere? > > > > > Why would it be an "issue"? gcc -pthread and libpthread linking is > > documented pretty much everywhere on the web. There isn't anything > > broken about it, it's how it's done on older FreeBSD. > > > > Note that all of this has significantly changed in later FreeBSD > > versions, and that the 5.x series was deprecated a very long time ago. > > > > > > >> On Thu, 11 Sep 2008, Barry Andrews wrote: > > >> > > >> > > >>> Hi All, > > >>> > > >>> I have a multi-threaded library that is linked against libpthread. > > >>> When I > > >>> load this lib into a tclsh process on FreeBSD, I get this error, > > >>> "Recurse on > > >>> private mutex". and crash. I understand that I can have this issue > > >>> when the > > >>> executable is not linked against libpthread but one of the loaded > > >>>
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On Fri, Sep 12, 2008 at 09:04:22AM -0700, Jeremy Chadwick wrote:
> What this does to other parts of the kernel and userland applications is something I haven't tested. I *can* tell you that there are major, major problems with detach/reattach/reinit on ata(4) causing kernel panics and other such things. I've documented this quite thoroughly in my "Common FreeBSD issues" wiki:
>
> http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

This should have read: "... in my ATA/SATA issues and troubleshooting methods page":

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
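For reference, the administrative detach discussed in this thread could be sketched as a small dry-run script. This is only a sketch under assumptions, not a tested procedure: the channel name ata2 and the helper names run and detach_channel are made up for illustration, and the script prints the commands instead of running them unless DRY_RUN=0 is set. As noted above, detach/reattach on ata(4) has been reported to cause kernel panics, so treat it with care.

```shell
#!/bin/sh
# Dry-run sketch: detach an ATA channel so the kernel stops issuing I/O to it.
# "ata2" is a hypothetical channel name; check "atacontrol list" for yours.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

# Print a command, and execute it only when dry-run mode is switched off.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

detach_channel() {
    chan=$1
    run atacontrol list            # show channels and the devices on them
    run atacontrol detach "$chan"  # administratively detach the channel
}

detach_channel ata2
```

Once the disk has been dealt with, "atacontrol attach ata2" is the matching command to bring the channel back.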
Re: loading multi threaded library into executable enabled for single thread
On Fri, Sep 12, 2008 at 09:09:00AM -0700, Jeremy Chadwick wrote: > On Fri, Sep 12, 2008 at 11:55:01AM -0400, Barry Andrews wrote: > > Yes, the exe is tclsh. I understand that linking tclsh with libpthread is > > what would work. However this is very impractical. A user of my library > > shouldn't have to rebuild their tclsh to match my library specs. Another > > option would be to ship tclsh with my lib, but that also is a little weird. > > It seems like the only somewhat practical option I have is to use > > LD_PRELOAD, which is also weird but better than nothing. > > This really isn't a "FreeBSD problem", as the same sort of issue plagues > other operating systems. When it comes to threading, you want > *everything* threaded as much as possible -- mix-matching usually does > not work. The only OS I have seen where that kind of environment works > reliably is Solaris. I still feel threading is "too new" of a Yeah, they merged libpthread and libc some time ago, AFAIR :).
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On Fri, Sep 12, 2008 at 10:34 AM, Karl Pielorz <[EMAIL PROTECTED]>wrote: > --On 12 September 2008 06:21 -0700 Jeremy Chadwick <[EMAIL PROTECTED]> > wrote: > > As far as I know, there is no such "standard" mechanism in FreeBSD. If >> the drive falls off the bus entirely (e.g. detached), I would hope ZFS >> would notice that. I can imagine it (might) also depend on if the disk >> subsystem you're using is utilising CAM or not (e.g. disks should be daX >> not adX); Scott Long might know if something like this is implemented in >> CAM. I'm fairly certain nothing like this is implemented in ata(4). >> > > For ATA, at the moment - I don't think it'll notice even if a drive > detaches. I think like my system the other day, it'll just keep issuing I/O > commands to the drive, even if it's disappeared (it might get much 'quicker > failures' if the device has 'gone' to the point of FreeBSD just quickly > returning 'fail' for every request). Since I had the opportunity, I tested this recently for both CAM and ATA. Now the RAID engine was gmirror in both cases (my production hardware doesn't do ZFS yet), but I expect the reaction to be somewhat the same. Both systems were Dell 1U's. One, an R200, had SATA disks attached to a plain SATA controller. I believe it may have supported RAID1, but I didn't use that functionality. When a drive was removed from it, it stalled for some time (30 minutes?) and then resumed working. By the time I could type on the machine again, gmirror had decided that the drive was gone and marked the mirror as degraded. The other system was a 1950-III with a SCSI SAS controller attached to an SAS hot-swap backplane. The drives themselves were 750G SATA drives. Yanking one of them resulted in about 5 seconds of disruption followed by gmirror realizing the problem and marking the mirror degraded. Neither system was heavily loaded during the test.
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On Fri, Sep 12, 2008 at 12:04:27PM -0400, Zaphod Beeblebrox wrote:
> On Fri, Sep 12, 2008 at 11:44 AM, Oliver Fromme <[EMAIL PROTECTED]>wrote:
> > Did you try "atacontrol detach" to remove the disk from the bus? I haven't tried that with ZFS, but gmirror automatically detects when a disk has gone away, and doesn't try to do anything with it anymore. It certainly should not hang the machine. After all, what's the purpose of a RAID when you have to reboot upon drive failure. ;-)
>
> To be fair, many "home" users run RAID without the expectation of being able to hot swap the drives. While RAID can provide high availability, but it can also provide simple data security.

RAID only ensures a very, very tiny part of "data security", and it depends greatly on what RAID implementation you use. No RAID implementation I know of protects against transparent data corruption ("bit-rot"), and many RAID controllers and RAID drivers have bugs that induce corruption (to date, that's (very old ATA) Highpoint chips, nVidia/nForce chips, JMicron or Silicon Image chips -- all of these are used on consumer boards). A big problem is also that end-users *still* think RAID is a replacement for doing backups. :-(

> To your point... I suppose you have to reboot at some point after the drive failure, but my experience has been that the reboot has been under my control some time after the failure (usually when I have the replacement drive).

For home use, sure. Since most home/consumer systems do not include hot-swappable drive bays, rebooting is required. Although more and more consumer motherboards are offering AHCI -- which is the only reliable way you'll get that capability with SATA.

In my case with servers in a co-lo, it's not acceptable. Our systems contain SATA backplanes that support hot-swapping, and it works how it should (yank the disk, replace with a new one) on Linux -- there is no need to do a bunch of hoopla like on FreeBSD. On FreeBSD, with that hoopla, you also take the risk of inducing a kernel panic. That risk does not sit well with me, but thankfully I've only been in that situation (replacing a bad disk + using hot-swapping) once -- and it did work.

At my home, I have a pseudo-NAS system running FreeBSD. The case is from Supermicro, a mid-tower, and has a SATA backplane that supports hot-swapping. I use ZFS on this system, sporting 3 disks and one (non-ZFS) for boot/OS. But because I'm using ata(4) -- see above.

Individuals on -stable and other lists using ZFS have posted their experiences with disk failures. I believe to date I've seen one which worked flawlessly, while the others reported strange issues with resilvering or, in a couple of cases, lost all their zpools permanently. Of course, it's very rare in this day and age for people to mail a mailing list reporting *successes* with something -- people usually only mail if something *fails*. :-)

That said, pjd@'s dedication to getting ZFS working reliably on FreeBSD is outstanding. It's a great filesystem replacement, and even the Linux folks are a bit jealous over how simple and painless it is. I can share their jealousy -- I've looked at the LVM docs... never again.

> About the only real improvement I'd like to see in this setup is the ability to spin down idle drives. That would be an ideal setup for the home RAID array.

There is a FreeBSD port which handles this, although such a feature should ideally be part of the ata(4) system (as should TCQ/NCQ and a slew of other things -- some of those are being worked on).

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: loading multi threaded library into executable enabled for single thread
On Fri, 12 Sep 2008, Barry Andrews wrote:
> Do you know if this is documented in Release Notes or Known Issues or somewhere?

No, but it's certainly in the -threads or -ports mailing list archives from a few years ago ;-)

--
DE
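The LD_PRELOAD workaround mentioned earlier in this thread could look roughly like the sketch below. It is an illustration under assumptions only: the interpreter name tclsh8.4 and the path /usr/lib/libpthread.so are guesses that must be adjusted for the actual system, and the helper name preload_command is made up for this example. The helper reads ldd output and prints the command line to start tclsh with, preloading libpthread only when the binary is not already linked against it.

```shell
#!/bin/sh
# Sketch: decide whether tclsh needs libpthread preloaded.
# Assumptions (adjust for your system): the interpreter is called "tclsh8.4"
# and the thread library is installed as /usr/lib/libpthread.so.
TCLSH=${TCLSH:-tclsh8.4}
LIBPTHREAD=${LIBPTHREAD:-/usr/lib/libpthread.so}

# Reads "ldd" output on stdin and prints the command line to start tclsh with.
preload_command() {
    if grep -q 'libpthread'; then
        # Already linked against libpthread: no preload needed.
        echo "$TCLSH"
    else
        # Not linked against libpthread: preload it before anything else.
        echo "env LD_PRELOAD=$LIBPTHREAD $TCLSH"
    fi
}

# Example with canned ldd output for a binary linked only against libc:
printf '\tlibc.so.5 => /lib/libc.so.5\n' | preload_command
```

On a real system the input would come from something like ldd "$(which tclsh8.4)" | preload_command, and the printed command could then be placed in a wrapper script shipped with the library.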
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On Fri, Sep 12, 2008 at 12:32 PM, Jeremy Chadwick <[EMAIL PROTECTED]>wrote:
> On Fri, Sep 12, 2008 at 12:04:27PM -0400, Zaphod Beeblebrox wrote:
> > On Fri, Sep 12, 2008 at 11:44 AM, Oliver Fromme <[EMAIL PROTECTED]>wrote:
> > > Did you try "atacontrol detach" to remove the disk from the bus? I haven't tried that with ZFS, but gmirror automatically detects when a disk has gone away, and doesn't try to do anything with it anymore. It certainly should not hang the machine. After all, what's the purpose of a RAID when you have to reboot upon drive failure. ;-)
> >
> > To be fair, many "home" users run RAID without the expectation of being able to hot swap the drives. While RAID can provide high availability, but it can also provide simple data security.
>
> RAID only ensures a very, very tiny part of "data security", and it depends greatly on what RAID implementation you use. No RAID implementation I know of provides against transparent data corruption ("bit-rot"), and many RAID controllers and RAID drivers have bugs that

Well... this is/was a thread about ZFS. ZFS does detect that bitrot _and_ correct it if it is possible.

> A big problem is also that end-users *still* think RAID is a replacement for doing backups

Well... this comment seems a bit off topic, but maybe (in some cases) RAID is a substitute for doing backups. I suppose it depends on your tolerance and data value. The sheer size of some datasets these days makes backup prohibitively time consuming and/or expensive. Then again (this is a ZFS thread), ZFS helps with this: the ability to export snapshots to other spinning spool makes a lot of sense.
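Exporting a snapshot to a second pool, as suggested above, might be sketched like this. It is a minimal sketch under assumptions: the dataset names tank/photos and backup/photos and the helper name copy_snapshot are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. It takes a snapshot and streams it with zfs send into zfs receive on the other pool.

```shell
#!/bin/sh
# Dry-run sketch: copy one ZFS snapshot to a dataset on another pool.
# Dataset and snapshot names below are hypothetical examples.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

copy_snapshot() {
    snap=$1   # snapshot to create, e.g. tank/photos@2008-09-12
    dst=$2    # target dataset on the second pool, e.g. backup/photos
    echo "+ zfs snapshot $snap"
    echo "+ zfs send $snap | zfs receive $dst"
    if [ "$DRY_RUN" = "0" ]; then
        # Create the snapshot, then stream it into the target dataset.
        zfs snapshot "$snap" && zfs send "$snap" | zfs receive "$dst"
    fi
}

copy_snapshot tank/photos@2008-09-12 backup/photos
```

This copies a full snapshot each time; whether that is an acceptable substitute for a conventional backup depends, as discussed above, on the value of the data.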
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On September 12, 2008 09:32 am Jeremy Chadwick wrote:
> For home use, sure. Since most home/consumer systems do not include hot-swappable drive bays, rebooting is required. Although more and more consumer motherboards are offering AHCI -- which is the only reliable way you'll get that capability with SATA.
>
> In my case with servers in a co-lo, it's not acceptable. Our systems contain SATA backplanes that support hot-swapping, and it works how it should (yank the disk, replace with a new one) on Linux -- there is no need to do a bunch of hoopla like on FreeBSD. On FreeBSD, with that hoopla, also take the risk of inducing a kernel panic. That risk does not sit well with me, but thankfully I've only been in that situation (replacing a bad disk + using hot-swapping) once -- and it did work.

Hrm, is this with software RAID or hardware RAID?

With our hardware RAID systems, the process has always been the same, regardless of which OS (Windows 2003 Servers, Debian Linux, FreeBSD) is on the system:
- go into RAID management GUI, remove drive
- pull dead drive from system
- insert new drive into system
- go into RAID management GUI, make sure it picked up new drive and started the rebuild

We've been lucky so far, and not had to do any drive replacements on our non-ZFS software RAID systems (md on Debian, gmirror on FreeBSD). I'm not looking forward to a drive failing, as these systems have non-hot-pluggable SATA setups.

On the ZFS systems, we just "zpool offline" the drive, physically replace the drive, and "zpool replace" the drive. On one system, this was done via hot-pluggable SATA backplane, on another, it required a reboot.

--
Freddie Cash [EMAIL PROTECTED]
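The ZFS replacement sequence described above ("zpool offline", swap the disk, "zpool replace") could be sketched as follows. This is a sketch under assumptions rather than the exact procedure used on those systems: the pool name tank, the device name ad8, and the helper names run and replace_disk are hypothetical, and nothing is executed unless DRY_RUN=0 is set. It assumes the new disk appears under the same device name as the old one.

```shell
#!/bin/sh
# Dry-run sketch of the offline / swap / replace sequence for a ZFS mirror.
# "tank" and "ad8" are hypothetical pool and device names.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

# Print a command, and execute it only when dry-run mode is switched off.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

replace_disk() {
    pool=$1
    disk=$2
    run zpool offline "$pool" "$disk"   # stop ZFS from using the failing disk
    echo "# swap $disk now (reboot first if the bay is not hot-pluggable)"
    if [ "$DRY_RUN" = "0" ]; then
        printf 'Press Enter once the new disk is in place: '
        read answer
    fi
    run zpool replace "$pool" "$disk"   # resilver onto the new disk, same name
    run zpool status "$pool"            # check that the resilver has started
}

replace_disk tank ad8
```

If the replacement shows up under a different device name, "zpool replace" also accepts the new device as an additional argument after the old one.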
Re: ZFS w/failing drives - any equivalent of Solaris FMA?
On Fri, Sep 12, 2008 at 10:12:09AM -0700, Freddie Cash wrote:
> On September 12, 2008 09:32 am Jeremy Chadwick wrote:
> > For home use, sure. Since most home/consumer systems do not include hot-swappable drive bays, rebooting is required. Although more and more consumer motherboards are offering AHCI -- which is the only reliable way you'll get that capability with SATA.
> >
> > In my case with servers in a co-lo, it's not acceptable. Our systems contain SATA backplanes that support hot-swapping, and it works how it should (yank the disk, replace with a new one) on Linux -- there is no need to do a bunch of hoopla like on FreeBSD. On FreeBSD, with that hoopla, also take the risk of inducing a kernel panic. That risk does not sit well with me, but thankfully I've only been in that situation (replacing a bad disk + using hot-swapping) once -- and it did work.
>
> Hrm, is this with software RAID or hardware RAID?

I do not use either, but have tried software RAID (Intel MatrixRAID) in the past (and major, MAJOR bugs are why I do not any longer). Speaking (mostly) strictly of FreeBSD, let me list off the problems with both:

Software RAID:

1) Buggy as hell. Using Intel MatrixRAID as an example, even with RAID 1, due to ata(4) driver bugs, you are practically guaranteed to lose your data,
2) Limited userland interface to RAID BIOS; many operations do not work with atacontrol, requiring a system reboot + entering BIOS to do things like add/remove disks or rebuild an array
3) SMART monitoring lost; if the card or BIOS supports passthrough (basically ATA version of pass(4)), FreeBSD will see the disks natively (e.g. arX for the RAID, ad4 and ad8 for the disks), and you can use smartmontools. Otherwise, you're screwed
4) Support is questionable; numerous mainstream chips unsupported, including Adaptec HostRAID

Hardware RAID:

1) You are "locked in" to that controller. Your data is at the mercy of the company who makes the HBA; if your controller dies and is no longer made, your data is dead in the water. Chances are a newer model/revision of controller will not understand the disk metadata from the previous controller
2) Performance problems as a result of excessive caching levels; onboard hardware cache vs. system memory cache vs. disk layer cache in OS vs. other kernel caching mechanisms
3) Controller firmware upgrades are risky -- 3Ware has a very nasty history of this, for sake of example. I've heard of some upgrades changing the metadata format, requiring complete array re-creation

I can pull Ade Lovett <[EMAIL PROTECTED]> into this conversation if you think any of the above is exaggerated. :-)

The only hardware RAID controller I'd trust at this point would be Areca -- but hardware RAID is not what I want. On the other hand, I really want Areca to make a standard 4 or 8-port SATA controller -- no RAID, but full driver support under arcmsr(4) (which uses CAM and da(4)). This would be perfect.

> With our hardware RAID systems, the process has always been the same, regardless of which OS (Windows 2003 Servers, Debian Linux, FreeBSD) is on the system:
> - go into RAID management GUI, remove drive
> - pull dead drive from system
> - insert new drive into system
> - go into RAID management GUI, make sure it picked up new drive and started the rebuild

The simplicity there is correct -- that's really how simple it should be. But a GUI? What card is this that requires a GUI? Does it require a reboot? No command-line support?

> We've been lucky so far, and not had to do any drive replacements on our non-ZFS software RAID systems (md on Debian, gmirror on FreeBSD). I'm not looking forward to a drive failing, as these systems have non-hot-pluggable SATA setups.

I'm hearing you loud and clear. :-)

> On the ZFS systems, we just "zpool offline" the drive, physically replace the drive, and "zpool replace" the drive. On one system, this was done via hot-pluggable SATA backplane, on another, it required a reboot.

If this was done on the hardware RAID controller (presuming it uses CAM and da(4)), I'm not surprised it worked perfectly.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: Increasing KVM on amd64
By the way, this part of Alan's patch fixes a bug in RELENG7 where mapbase is passed to vm_map_find uninitialized. -CURRENT already has this change applied. Perhaps it's worth committing in RELENG7, too.

--- ./kern/link_elf_obj.c.orig 2008-09-01 11:06:44.0 -0700
+++ ./kern/link_elf_obj.c 2008-09-10 13:07:54.793310216 -0700
@@ -667,6 +667,7 @@
 		goto out;
 	}
 	ef->address = (caddr_t) vm_map_min(kernel_map);
+	mapbase = KERNBASE;
 	error = vm_map_find(kernel_map, ef->object, 0, &mapbase,
 	    round_page(mapsize), TRUE, VM_PROT_ALL, VM_PROT_ALL, FALSE);
 	if (error) {

--Artem