Re: Stable SATA pci card for FreeBSD 6.x/7.0
Hi,

Just an update on this issue.

Quick summary: I fixed the BIOS issues, the hardware monitor issues, and (it seems) the rl0/rl1 watchdog timeout issues. However, I'm still having problems with my SATA drives (or at least one of them). More info below.

BIOS:

I flashed my BIOS to the latest version about a year ago and never noticed any problem, but it turns out there was one. I never reset the BIOS to the default factory settings after the upgrade, and it seems the settings were corrupt. After I reset the BIOS to the "default optimized factory settings", it stopped crashing when I go into the H/W monitor and also when using healthd -d (output below):

Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
Vcore = 1.44, 3.12; Volt. = 3.34, 5.00, 1.95, -0.11, -1.54
Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
Vcore = 1.44, 3.14; Volt. = 3.33, 4.97, 1.95, -0.11, -1.54
Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
Vcore = 1.44, 3.12; Volt. = 3.34, 4.97, 1.95, -0.11, -1.54
Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
Vcore = 1.44, 3.12; Volt. = 3.34, 5.00, 1.95, -0.11, -1.54
Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
Vcore = 1.44, 3.12; Volt. = 3.34, 5.00, 1.95, -0.11, -1.54

This also seems to have fixed the rl0 watchdog timeout problems. I no longer see those in my logs.

SATA DRIVES:

I'm still having problems with the SATA drives. I tried connecting the 1TB Samsung drives to my mainboard, but then the box hangs during boot at the "Detecting IDE drives" message. The regular (PATA) IDE drives are detected first; then it repeats the "Detecting IDE drives" message to detect the SATA drives, and hangs. When I connect my 250GB SATA drives to my mainboard, they are detected fine and the box boots normally.

I did another rsync of my old mirror (the 250GB disks) to the new mirror (1TB disks), but again one of the disks got detached. This time there are no other messages in the log; the only thing I see is the following:

Aug 13 14:35:27 piglet su: sebster to root on /dev/ttyp5
Aug 13 14:55:38 piglet kernel: ad6: FAILURE - device detached
Aug 13 14:55:38 piglet kernel: subdisk6: detached
Aug 13 14:55:38 piglet kernel: ad6: detached
Aug 13 14:55:38 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6 disconnected.
Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K

(It is unfortunate that the log file just got rotated, but the new log file contains nothing except the one expected line:)

Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K

So, nothing after the disconnect...

The questions I have now are:

1) Could an upgrade to FreeBSD 7-STABLE fix the issue? (It's a LOT of work for me, but I'll do it if there are SATA driver issues fixed.)
2) What is the next step? Should I repeat the tests to see if it's always the same drive that disconnects?
3) Is there any way to get more info about what is causing the disconnect?

Regards,
Sebastiaan

Jeremy Chadwick wrote:
> On Wed, Aug 06, 2008 at 02:57:48AM -0700, Jeremy Chadwick wrote:
> > vmstat -i output should help clear that up, or dmesg output.
>
> Sebastiaan has included vmstat -i output in another part of this thread, as well as dmesg output for the ATA disks and controllers:
>
> atapci0: port 0xd200-0xd207,0xd300-0xd303,0xd400-0xd407,0xd500-0xd503,0xd600-0xd60f mem 0xf6081000-0xf60811ff irq 18 at device 10.0 on pci0
> ata2: on atapci0
> ata3: on atapci0
> atapci1: port 0xd700-0xd707,0xd800-0xd803,0xd900-0xd907,0xda00-0xda03,0xdb00-0xdb0f,0xdc00-0xdcff irq 20 at device 15.0 on pci0
> ata4: on atapci1
> ata5: on atapci1
> atapci2: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xdd00-0xdd0f at device 15.1 on pci0
> ata0: on atapci2
> ata1: on atapci2
> ad0: 286188MB at ata0-master UDMA133
> ad1: 239372MB at ata0-slave UDMA133
> acd0: DVDR at ata1-master UDMA33
> ad4: 953869MB at ata2-master SATA150
> ad6: 953869MB at ata3-master SATA150
> ad8: 239372MB at ata4-master SATA150
> ad10: 239372MB at ata5-master SATA150
>
> interrupt                   total   rate
> irq6: fdc0                     10      0
> irq14: ata0                645057      7
> irq15: ata1                    58      0
> irq16: rl0                7168276     82
> irq17: rl1                 914667     10
> irq18: atapci0           30072876    347
> irq20: atapci1            1126099     12
> irq21: uhci0 uhci*            308      0
> irq23: vr0                3265771     37
> cpu0: timer             173289011   1999
> Total                   216482133   2498
>
> Here's a breakdown, so no one gets confused:
>
> ad0 = 300GB Maxtor disk, attached to on-board VIA IDE controller
> ad1 = 250GB Maxtor disk, attached to on-board VIA IDE controller
> ad4 = 1TB Samsung disk, attached to Silicon Image SATA controller
> ad6 = 1TB Samsung disk, attached to Silicon Image SATA controller
> ad8 = 250GB Maxtor disk, attached to on-board VI
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Thanks Jonathan,

I'm starting to suspect it has to be the controller as well. About 20 minutes after I posted this message yesterday (and thus 20 minutes after ad6 got disconnected; atacontrol list showed "no device present" for it) the machine crashed with a kernel panic while writing to the remaining ad4 drive. I attached the logs below.

I also ran the long SMART self-test on both drives, and no errors were found on either drive (logs also attached).

Unfortunately I could not attach the new disks to my mainboard SATA, because it somehow hangs trying to detect them. So I cannot test whether *not* using the controller solves the problems, though at the moment it seems logical that it has to be the controller, especially since other people have had similar issues.

I guess I'll be buying another controller.

Regards,
Sebastiaan

Jonathan Groll wrote:
> On Wed, Aug 13, 2008 at 03:10:56PM +0200, Sebastiaan van Erk wrote:
> > 1) Could an upgrade to FreeBSD 7-STABLE fix the issue (it's a LOT of work for me, but I'll do it if there are SATA driver issues fixed).
>
> I suspect the problem may be the SiI driver in FreeBSD. As a reference point, I've had a similar problem, even on 7-STABLE, but with sparc64 hardware (see earlier post in this thread).
>
> It'll probably be simplest for you to just buy another controller of another brand. On the other hand, it'll be worth knowing exactly what is wrong with the SiI driver...
>
> Cheers,
> Jonathan

Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K
Aug 13 15:11:26 piglet su: sebster to root on /dev/ttyp4
Aug 13 15:34:55 piglet kernel: mirror/gm1s1e[WRITE(offset=875450693632, length=2048)]error = 6
Aug 13 15:34:55 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450695680, length=2048)]error = 6
[snip 335750 similar lines]
Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450931200, length=2048)]error = 6
Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450933248, length=2048)]error = 6
Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450935296, length=2048)]error = 6
Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450937
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Hi,

Cian Hughes wrote:
> Sebastiaan,
> Have you tried connecting your 250GB drives to the troublesome
> controller? If so, does "stressing" them cause the system to panic?
>
> ~Cian Hughes

Thanks for your reply. I have not tried stress-testing the 250GB drives on the troublesome controller. The problem with those drives is that, even though they are mirrored, the data is very important to me and I do not want it to get corrupted. I do have backups of course, but the problem with data corruption is that it often takes very long to notice...

I was thinking of buying the Promise SATA300 TX4 PCI Controller. I've searched on Google, and I do see some negative posts about it in combination with FreeBSD; however, they all date back at least 2 years... Does anybody have positive/negative experiences using this card?

Regards,
Sebastiaan

> --
> University of Bristol Medical School
>
> On 14 Aug 2008, at 10:37, Sebastiaan van Erk wrote:
> > Thanks Jonathan,
> >
> > I'm starting to suspect it has to be the controller as well. About 20 minutes after I posted this message yesterday (and thus 20 minutes after ad6 got disconnected; atacontrol list showed "no device present" for it) the machine crashed with a kernel panic while writing to the remaining ad4 drive. I attached the logs below.
> >
> > I also ran the long SMART self-test on both drives, and no errors were found on either drive (logs also attached).
> >
> > Unfortunately I could not attach the new disks to my mainboard SATA, because it somehow hangs trying to detect them. So I cannot test whether *not* using the controller solves the problems, though at the moment it seems logical that it has to be the controller, especially since other people have had similar issues.
> >
> > I guess I'll be buying another controller.
> >
> > Regards,
> > Sebastiaan
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Hi everybody,

Thanks for all the help I got trying to figure this one out. I bought the Promise SATA300 TX4 PCI controller and everything is working smoothly now.

This means I now have the other controller left over:

[pciconf -lv output]
[EMAIL PROTECTED]:10:0: class=0x018000 card=0x35121095 chip=0x35121095 rev=0x01 hdr=0x00
    vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device = 'Sil 3512 SATALink/SATARaid Controller'
    class = mass storage

I would like to donate it to FreeBSD developers working on the drivers for these cards, if they want or need it (where do I need to send it?). Otherwise I'll just sell it on our local version of eBay.

Regards and thanks again,
Sebastiaan

Jeremy Chadwick wrote:
> On Thu, Aug 21, 2008 at 09:49:25AM +0200, Sebastiaan van Erk wrote:
> > I was thinking of buying the Promise SATA300 TX4 PCI Controller. I've searched on google, and I do see some negative posts on them in combination with FreeBSD, however they all date back at least 2 years... Does anybody have positive/negative experiences using this card?
>
> I have one of these cards (not currently in use; the less stuff inside my FreeBSD box at home the better), and never ran into any oddities. That was with 4 disks connected, each disk its own UFS2 filesystem. ZFS wasn't available back then.
Stable SATA pci card for FreeBSD 6.x/7.0
Hi,

I'm running FreeBSD 6.3 (I know, I should upgrade), and I just bought an add-on PCI SATA controller for 2 extra SATA disks. However, a lot of disk activity on the drives will often cause the machine to crash and spontaneously reboot.

I checked which chipset was on the card with pciconf -lv and found it was the Sil 3512. Googling showed me that I'm not the only one with problems using this card.

Does anybody have experience with a (preferably not too expensive) 2-port SATA expansion card which does not have any issues running under FreeBSD 6.3/7.0?

[pciconf -lv output]
[EMAIL PROTECTED]:10:0: class=0x018000 card=0x35121095 chip=0x35121095 rev=0x01 hdr=0x00
    vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device = 'Sil 3512 SATALink/SATARaid Controller'
    class = mass storage

[/var/log/messages before the crash]
Aug 5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6
Aug 5 11:16:17 piglet last message repeated 9 times

Regards,
Sebastiaan
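For readers unfamiliar with the pciconf fields quoted above: the chip= value packs two 16-bit PCI IDs into one number, with the device ID in the upper half and the vendor ID in the lower half. The following is a minimal Python sketch of that decoding; the helper name split_pci_id is made up for illustration and is not part of any FreeBSD tool.

```python
def split_pci_id(combined):
    """Split a 32-bit pciconf chip= value into (vendor_id, device_id)."""
    vendor_id = combined & 0xFFFF          # low 16 bits: PCI vendor ID
    device_id = (combined >> 16) & 0xFFFF  # high 16 bits: PCI device ID
    return vendor_id, device_id


# chip=0x35121095 is the value reported for the add-on SATA card.
vendor, device = split_pci_id(0x35121095)
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
```

Here the vendor half, 0x1095, is the ID behind the 'Silicon Image Inc (Was: CMD Technology Inc)' string, and the device half, 0x3512, matches the 'Sil 3512' description.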
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Hi,

Thanks for the reply.

Jeremy Chadwick wrote:
> Yes, most of the Silicon Image ICs I've read about have odd driver problems or general issues (even under Windows). The system rebooting is an odd one; you sure your PSU can handle two disks?

Well, I've got a 450W Asus PSU in there, but I've also got 6 hard disks and 1 DVD-ROM drive (mostly inactive) in there. The hard disks are mostly 250/300GB but the two new ones are 1TB SATA drives. But the 450W should easily be enough, shouldn't it?

> > Does anybody have experience with a (preferably not too expensive) 2-port SATA expansion card which does not have any issues running under FreeBSD 6.3/7.0?
>
> Promise makes some consumer-priced cards which work very well under FreeBSD (sos@ has full documentation on their cards).
>
> Their RAID controllers (the consumer-level ones) **do not** require that you use RAID; they support JBOD, and the disks will show up under FreeBSD as ad(4) devices. (If you choose to use the RAID, you'll still see the ad(4) disks, but you'll also see an ar(4) device too.) This has the added advantage of you being able to monitor SMART stats on the disks themselves directly, etc...

I'll have a look at that if I can't get this one stable. They're reasonably priced, so if they're good with FreeBSD then that looks like a good option to me.

> > [pciconf -lv output]
> > [EMAIL PROTECTED]:10:0: class=0x018000 card=0x35121095 chip=0x35121095 rev=0x01 hdr=0x00
> >     vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
> >     device = 'Sil 3512 SATALink/SATARaid Controller'
> >     class = mass storage
> >
> > [/var/log/messages before the crash]
> > Aug 5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] error = 6
> > Aug 5 11:16:17 piglet last message repeated 9 times
>
> Are you sure this is being caused by the controller? Have you checked SMART statistics on both disks? Assuming error == errno, errno 6 is "Device not configured".

I did look at the SMART stats [pasted them below]. What I will try next is to swap the two 250GB SATA drives on my mainboard with the two 1TB drives on the controller, and see if I still get the problems when I really increase the load on the two 1TB drives.

> There's been recent discussion of such messages being caused by the use of gmirror or gjournal, when the mirror/journal is improperly set up. (In one user's case, he was receiving similar errors, as well as the filesystem failing during fsck. Turns out he incorrectly configured journalling, which nuked the last ~1MB of his UFS filesystem.) I'm not saying this is the reason for the messages you see, but it's something to keep in mind.

I'll try reconfiguring the geom. I used an online tutorial, and I'm not quite sure that I did everything correctly, though fsck worked all right. I did set this one up differently than usual, though. Usually I create a full-disk mirror after I have already initialized one of the disks, converting it to a mirror with:

    sysctl kern.geom.debugflags=16
    gmirror label -v -b round-robin gm0 /dev/ad0
    gmirror insert gm0 /dev/ad2

(This is especially useful when you want the entire FreeBSD install to be mirrored.) I guess I can try this on the extra disks as well.

Regards,
Sebastiaan
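The quoted remark that errno 6 is "Device not configured" can be checked against the system's errno table, where 6 is ENXIO. This is a small Python sketch, not something from the thread; the descriptive text returned by os.strerror is platform-dependent (FreeBSD words it as quoted above, other systems differ), so only the symbolic name is relied on.

```python
import errno
import os

code = 6  # the "error = 6" value from the g_vfs_done() log lines

# Map the numeric code to its symbolic name; falls back to "unknown"
# if the platform has no entry for it.
name = errno.errorcode.get(code, "unknown")

# The human-readable message varies by operating system.
print(f"errno {code} = {name}: {os.strerror(code)}")
```

An ENXIO on a write is consistent with the provider having disappeared underneath the filesystem, which is what the "device detached" messages elsewhere in the thread suggest.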
Re: Stable SATA pci card for FreeBSD 6.x/7.0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[EMAIL PROTECTED](ttyp3:60:0):~#

Jeremy Chadwick wrote:
> On Tue, Aug 05, 2008 at 12:28:40PM +0200, Sebastiaan van Erk wrote:
> > However, a lot of disk activity on the drives will often cause the machine to crash and spontaneously reboot. I checked out which chipset was on the card with pciconf -lv and I found it was the Sil 3512. Googling showed me that I'm not the only one with problems using this card.
>
> Yes, most of the Silicon Image ICs I've read about have odd driver problems or general issues (even under Windows). The system rebooting is an odd one; you sure your PSU can handle two disks?
>
> > Does anybody have experience with a (preferably not too expensive) 2-port SATA expansion card which does not have any issues running under FreeBSD 6.3/7.0?
>
> Promise makes some consumer-priced cards which work very well under FreeBSD (sos@ has full documentation on their cards).
>
> Their RAID controllers (the consumer-level ones) **do not** require that you use RAID; they support JBOD, and the disks will show up under FreeBSD as ad(4) devices. (If you choose to use the RAID, you'll still see the ad(4) disks, but you'll also see an ar(4) device too.) This has the added advantage of you being able to monitor SMART stats on the disks themselves directly, etc...
>
> > [pciconf -lv output]
> > [EMAIL PROTECTED]:10:0: class=0x018000 card=0x35121095 chip=0x35121095 rev=0x01 hdr=0x00
> >     vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
> >     device = 'Sil 3512 SATALink/SATARaid Controller'
> >     class = mass storage
> >
> > [/var/log/messages before the crash]
> > Aug 5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] error = 6
> > Aug 5 11:16:17 piglet last message repeated 9 times
>
> Are you sure this is being caused by the controller? Have you checked SMART statistics on both disks? Assuming error == errno, errno 6 is "Device not configured".
>
> There's been recent discussion of such messages being caused by the use of gmirror or gjournal, when the mirror/journal is improperly set up. (In one user's case, he was receiving similar errors, as well as the filesystem failing during fsck. Turns out he incorrectly configured journalling, which nuked the last ~1MB of his UFS filesystem.) I'm not saying this is the reason for the messages you see, but it's something to keep in mind.
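The SMART attribute rows pasted at the top of this message can be checked mechanically rather than by eye. The sketch below is only an illustration in Python: the sample rows are copied from the excerpt with their column spacing restored, and the WATCH list of attributes worth flagging is an assumption made for the example, not a smartctl feature.

```python
SMART_ROWS = """\
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always  - 0
197 Current_Pending_Sector  0x0012 100 100 000 Old_age Always  - 0
198 Offline_Uncorrectable   0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count    0x003e 100 100 000 Old_age Always  - 0
"""

# Hypothetical watch list: attributes where a non-zero raw value
# usually deserves a closer look.
WATCH = {
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_attributes(text):
    """Return {attribute_name: raw_value} from smartctl-style rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE,
        # UPDATED, WHEN_FAILED, RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


attrs = parse_attributes(SMART_ROWS)
suspicious = {n: raw for n, raw in attrs.items() if n in WATCH and raw != 0}
print(suspicious or "no watched attribute has a non-zero raw value")
```

Real smartctl output can carry raw values that are not plain integers for some attributes, so a more robust parser would need to handle that case.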
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Hi,

Sorry about that, I believe I only messed up on my first reply, and I thought I mailed that to the list as well after I noticed I messed up. Thing is, I'm used to replying to mailing lists using the "Reply" button and unfortunately the reply doesn't go to the mailing list when I do that...

Some people don't like it when you send it to the mailing list and CC it to them personally, but since you apparently do, I'll just use reply-all from now on.

Sorry again about the mistake,

Regards,
Sebastiaan

Jeremy Chadwick wrote:
> On Tue, Aug 05, 2008 at 03:16:41PM +0200, Sebastiaan van Erk wrote:
> > Sorry for forgetting to paste the smart details. Pressed send too quickly.
>
> A note for the list: Sebastiaan and I are discussing the details off-list. I don't know if he forgot to CC the list on his replies, or if he intentionally sent them to me directly. :-)
>
> Just thought I'd make note of that here, in case readers wonder what becomes of this issue.
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Jeremy Chadwick wrote:
> First and foremost, you've forgotten to CC the mailing list on all but one of your replies. I'll assume this is intentional, but it's probably not for the best, as readers may find your post and wonder what the outcome was.

It was not intentional; I hit reply instead of reply-all. Sorry. I will send this reply to the list, so other interested parties can follow the thread and your informative replies.

> On Tue, Aug 05, 2008 at 02:47:45PM +0200, Sebastiaan van Erk wrote:
> > Hi,
> >
> > Thanks for the reply.
> >
> > Jeremy Chadwick wrote:
> > > Yes, most of the Silicon Image ICs I've read about have odd driver problems or general issues (even under Windows). The system rebooting is an odd one; you sure your PSU can handle two disks?
> >
> > Well, I've got a 450W Asus PSU in there, but I've also got 6 hard disks and 1 dvd-rom drive (mostly inactive) in there. The hard disks are mostly 250/300GB but the two new ones are 1TB SATA drives. But the 450W should easily be enough, shouldn't it?
>
> Without getting into semantics, a 450W PSU may be on the light side for 6 disks. I'm fairly amazed you're able to power up that machine without disk errors or other problems during POST. You'll be having 6 disks spin up all simultaneously -- and spin-up is when disks draw the most power, and possibly during normal operation.
>
> If you have a different (or larger) PSU, I would recommend trying that to see if it addresses your problem. A PSU which isn't providing enough power will cause the disks to occasionally disconnect from the bus, or the machine to sporadically lock up, reboot (power-cycle), or do other odd things.

Unfortunately I don't have a larger PSU lying around, but I could buy one; though I'd like to try some other stuff first, because I've had 6 disks in my PC before without any problems.

> > > > [/var/log/messages before the crash]
> > > > Aug 5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] error = 6
> > > > Aug 5 11:16:17 piglet last message repeated 9 times
> > >
> > > Are you sure this is being caused by the controller? Have you checked SMART statistics on both disks? Assuming error == errno, errno 6 is "Device not configured".
> >
> > I did look at the smart stats [pasted them below]. What I will try next is just to switch the two 250GB SATA drives on my main board with the two 1TB drives on the controller and see if I still get the problems if I really increase the load on the two 1TB drives.
>
> More and more information about your system configuration is coming to light. Your original post didn't disclose any of that; now I know you have 6 disks in the system, 2 of which are using on-board SATA (no idea what controller), and 2 which are using a Silicon Image controller. What are the remaining 2 disks connected to?

Sorry that I didn't give you that information immediately. The problem when you do that, though, is that the post is sometimes ignored because it is deemed too long or complicated (at least I've seen that happen). I'll gladly post any relevant data.

My other (on-board) SATA controller is a VIA controller, and I've never had any problems with it (although the hardware RAID messed up once a year or 2 ago, and since then I've been using software RAID without any issues).

[EMAIL PROTECTED]:15:0: class=0x010400 card=0x71421462 chip=0x31491106 rev=0x80 hdr=0x00
    vendor = 'VIA Technologies Inc'
    device = 'VT8237 VT6410 SATA RAID Controller'
    class = mass storage
    subclass = RAID

The remaining disks are PATA disks which are on the on-board IDE controller. It's a legacy computer that's been upgraded a lot, though it's not too obsolete; the CPU is an AMD Sempron(tm) Processor 2600+ (1599.83-MHz 686-class CPU).

> Your recommended method of troubleshooting (swapping the 250G for the 1TB) is a good idea. But hear me loud and clear: just because you switch the disks and the problem disappears for a few hours doesn't mean it's gone. There have been **many** people who have shown up on the mailing lists stating "I did and now it works!", only to find that a week later it *didn't* fix the problem.

Yes, I don't really expect it to solve the problem, but I was thinking that at least I could stress-test the known working disks on the controller, to see whether it's the controller that's the problem or the disks (or something else). I've been able to reproduce the crashes pretty well by just doing a lot of disk IO on the 1TB disks only (so the other disks were pretty idle during the tests).

> > > There's been recent discussion of such messages being caused by the use of gmirror or gjournal, when the mirror/journal is improperly set up. (In one user's case, he was receiving similar errors, as well as the filesystem failing dur
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Hi,

Yes, good thing you pointed this out, I hadn't seen those yet:

Aug 5 09:52:53 piglet ntpd[860]: kernel time sync enabled 2001
Aug 5 11:15:05 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:05 piglet kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455
Aug 5 11:15:05 piglet kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455
Aug 5 11:15:10 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:31 piglet kernel: ad6: FAILURE - device detached
Aug 5 11:15:31 piglet kernel: subdisk6: detached
Aug 5 11:15:31 piglet kernel: ad6: detached
Aug 5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:31 piglet kernel: ad4: FAILURE - device detached
Aug 5 11:15:31 piglet kernel: subdisk4: detached
Aug 5 11:15:31 piglet kernel: ad4: detached
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6 disconnected.
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad4 disconnected.
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider mirror/gm1 destroyed.
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1 destroyed.
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069312512, length=131072)]error = 6
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069443584, length=131072)]error = 6
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069574656, length=131072)]error = 6
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069705728, length=131072)]error = 6
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069836800, length=131072)]error = 6
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069967872, length=131072)]error = 6
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376121856, length=2048)]error = 6
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6
Aug 5 11:15:35 piglet last message repeated 13 times

Regards,
Sebastiaan

Andrey V. Elsukov wrote:
> Sebastiaan van Erk wrote:
> > [/var/log/messages before the crash]
> > Aug 5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6
> > Aug 5 11:16:17 piglet last message repeated 9 times
>
> Can you show which messages were before these?
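When a log contains hundreds of thousands of near-identical g_vfs_done() lines, the interesting disk events are easy to miss, as happened with the first post in this thread. The following Python sketch shows one way to summarize such a log; the sample lines are copied from the excerpt above, while the regular expressions and the summarize helper are illustrative assumptions rather than an existing tool.

```python
import re
from collections import Counter

# A few lines copied from the log excerpt in this message.
LOG = """\
Aug 5 11:15:05 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:05 piglet kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455
Aug 5 11:15:05 piglet kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455
Aug 5 11:15:31 piglet kernel: ad6: FAILURE - device detached
Aug 5 11:15:31 piglet kernel: ad4: FAILURE - device detached
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1 destroyed.
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6
"""

# "adN: TIMEOUT - ..." and "adN: FAILURE - ..." lines, as seen in the excerpt.
DISK_EVENT = re.compile(r"kernel: (ad\d+): (TIMEOUT|FAILURE) - (.*)$")
# Write errors reported by the filesystem layer once the provider is gone.
VFS_ERROR = re.compile(
    r"g_vfs_done\(\):(\S+?)\[(READ|WRITE)\(offset=(\d+), length=(\d+)\)\]"
    r"\s*error = (\d+)"
)


def summarize(log_text):
    """Count disk events per (device, kind) and total g_vfs_done errors."""
    disk_events = Counter()
    vfs_errors = 0
    for line in log_text.splitlines():
        match = DISK_EVENT.search(line)
        if match:
            disk_events[(match.group(1), match.group(2))] += 1
        elif VFS_ERROR.search(line):
            vfs_errors += 1
    return disk_events, vfs_errors


events, vfs_errors = summarize(LOG)
for (disk, kind), count in sorted(events.items()):
    print(f"{disk}: {kind} x{count}")
print(f"g_vfs_done errors: {vfs_errors}")
```

Run against the full /var/log/messages, a summary like this would put the TIMEOUT and FAILURE lines for ad4 and ad6 ahead of the flood of write errors that follows them.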
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Hi, Thanks again for the detailed reply! See the very bottom of my mail. I don't believe the PSU is the problem, after reviewing your SMART statistics. Ok, I'll stick to the one I have then, for now. My other (on-board) SATA controller is a VIA controller; and I've never had any problems with it (although the hardware raid messed up once a year or 2 ago, and since then I've been using software raid without any issues). Okay, so you've got an onboard VIA (VT6410) SATA controller, an onboard VIA IDE controller, and a PCI SATA controller. I'd still like to know which disks are attached to what controller, and if any of the devices are sharing IRQs. Can you provide the output from the following two commands? dmesg | egrep 'atapci|(ad|ata)[0-9]+' vmstat -i I'm just trying to narrow stuff down. Allright, attached is the output to both of these commands. It's interesting that the disks which are giving you trouble are Samsung disks. There's some history here which you should be made aware of: In July, Daniel Eriksson reported data corruption occurring with his nVidia MCP55 chipset when 1TB Samsung disks were attached to it. The same disks on another controller performed fine. The corruption was being detected by ZFS as checksum errors. (UFS/UFS2 won't detect this sort of thing, unless the corruption is occurring somewhere within the filesystem tables.) http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043427.html Soren Schmidt (ata(4) author) replied that there are some nVidia chipset-related fixes for ATA in -CURRENT, and provided a patch. Daniel reported that the patch made absolutely no difference: http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043434.html Daniel also tried using a firmware patch for his Samsung disks, which limit the SATA speed to SATA150, but the speed was still negotiated as SATA300 (indicating the vendors' own f/w patch is broken, or FreeBSD does not play well with it). 
The f/w patch didn't fix his problem either:
http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043432.html

[EMAIL PROTECTED] reported using his MCP55 controller without any problem -- as long as he didn't use Samsung disks. He stated that he believes Samsung disks are PATA disks that use a PATA-to-SATA adapter inside of the drive, leading to problems (and yes, those adapters are known to cause all sorts of mayhem):
http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043485.html

I'm not sure what became of the thread; Daniel never provided a post-mortem. I'm left to believe he probably took [EMAIL PROTECTED]'s advice and switched to another disk vendor.

Gee, that's a whole list. Before today I didn't know that there was that much difference between disk vendors (especially in terms of compatibility). I'll keep that in mind when I buy new disks. Thing is, I've had a bunch of disks (Maxtor, Seagate, Western Digital, Samsung, etc.), but I've had bad experiences with both Seagate and Western Digital. (Basically, I've never had a Seagate last me more than 2 years (laptop drives), and I had a raid5 array of WDs of which 3 crashed within 2 years.) Never had much trouble with Maxtor or Samsung yet, but obviously take this all with a grain of salt, because 10 disks don't make solid statistics.

Thanks for upgrading to 5.38. All the SMART statistics for these disks look okay.

No problem, thanks for looking into this in so much detail!

Can you run some SMART tests on the disks? You can run these tests while the disks are in use (but I/O will make the test take longer to complete):

smartctl -t short /dev/ad4
smartctl -t short /dev/ad6

Then you'll need to look at the SMART self test log, as well as the SMART error log, to see if anything is returned.
Make sure the tests have completed (the Status field should be "Completed without error", unless an error was found of course):

smartctl -a /dev/ad4
smartctl -a /dev/ad6

I attached the output below; the tests passed. I thought I'd reply now so that you know I'm on it. Currently I'm running the offline tests, but they will take another 3 hours at least to complete. Will get you the output of those as soon as they're done.

If nothing is found, try a different test (also safe to run during operation; don't let the word "offline" scare you), and repeat looking at the logs once more. This test may take some time, though:

smartctl -t offline /dev/ad4
smartctl -t offline /dev/ad6

At this point, I'm inclined to believe the issue is specific to those Samsung disks. I do not believe your PSU is a problem; the SMART statistics would be showing a higher number of power-cycles if the disks were losing power.

Worth noting (about Samsung disks) is that smartctl has options to work around 3 different firmware bugs. The bugs are SMART statistics-related, but those kinds of mistakes don't give me "warm fuzzies". Be wary. :-)

Nope, that definitely does not give great confidence. I still hav
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Hi,

Ok, those rl1: watchdog timeouts didn't ring a bell with me because I'd seen them before; however a quick grep in the logs (which date back to May 25) shows no other watchdog timeout matches. To try and avoid being incomplete again, I'll just attach the full dmesg below.

Jeremy Chadwick wrote:
On Wed, Aug 06, 2008 at 11:37:16AM +0200, Sebastiaan van Erk wrote:
Yes, good thing you pointed this out, I hadn't seen those yet:

Aug 5 11:15:05 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:05 piglet kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455
Aug 5 11:15:05 piglet kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455
Aug 5 11:15:10 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:31 piglet kernel: ad6: FAILURE - device detached
Aug 5 11:15:31 piglet kernel: subdisk6: detached
Aug 5 11:15:31 piglet kernel: ad6: detached
Aug 5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug 5 11:15:31 piglet kernel: ad4: FAILURE - device detached
Aug 5 11:15:31 piglet kernel: subdisk4: detached
Aug 5 11:15:31 piglet kernel: ad4: detached
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6 disconnected.
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad4 disconnected.
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider mirror/gm1 destroyed.
Aug 5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1 destroyed.
Aug 5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] error = 6

Kudos to Andrey for asking a simple yet incredibly beneficial question. You have a much greater problem here, and it doesn't look specific to your disks. It looks as if an interrupt is stalled or locked. I'm willing to bet your rl1 Realtek NIC and your ATA controller (associated with disks ad4 and ad6) use the same IRQ. vmstat -i output should help clear that up, or dmesg output.
I'll tell you that there have been some watchdog timeout fixes committed to rl(4) in recent months, depending upon what specific model and revision of Realtek NIC you have. No offence intended, but Realtek is definitely the worst of the bunch. I'm willing to bet it's an on-board NIC too. :-) Actually, I have 3 NICs in my PC (all of them in use). My machine is the server/router in my home network, so it has the onboard vr0 NIC connected to my ADSL modem, the rl0 nic connected to my internal wired lan, and the rl1 nic connected to my wireless router (my internal wired lan is firewalled from the wireless, since I don't really trust wireless security ;-)). I'm CC'ing PYUN Yong-Hyeon here, as he presently maintains/works on the rl(4) driver, and might be able to help determine if the Realtek NIC is what's causing all of this, or if the ATA chipset (is this the VIA? We don't know yet) is causing it first. Finally, what motherboard brand and model is this, and what BIOS revision or version? I attached the output of dmidecode (and dmesg), hopefully that contains all you need to know. BTW: I did a reply all, but I'm not sure if that is the "right" policy here. If I'm bothering anybody with this and they prefer to only see the mail on the list, then please let me know! Regards and thanks for all the help, Sebastiaan Copyright (c) 1992-2008 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. 
FreeBSD 6.3-PRERELEASE #20: Wed Jan 2 19:48:49 CET 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/PIGLET
MPTable:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Sempron(tm) Processor 2600+ (1599.83-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x20fc2 Stepping = 2
Features=0x78bfbff
Features2=0x1
AMD Features=0xe2500800
AMD Features2=0x1
real memory = 1056964608 (1008 MB)
avail memory = 1020919808 (973 MB)
ioapic0: Assuming intbase of 0
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Jan 2 2008 19:48:29)
cpu0 on motherboard
pcib0: pcibus 0 on motherboard
pci0: on pcib0
agp0: mem 0xe800-0xefff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
rl0: port 0xd000-0xd0ff mem 0xf6084000-0xf60840ff irq 16 at device 8.0 on pci0
miibus0: on rl0
rlphy0: on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:50:fc:57:a2:4b
rl1: port 0xd100-0xd1ff mem 0xf608-0xf60800ff irq 17 at device 9.0 on pci0
miibus1: on rl1
rlphy1: on miibus1
rlphy1: 10baseT, 10baseT-FDX
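The IRQ-sharing question raised earlier in this thread can be checked mechanically against dmesg output like the excerpt above. What follows is a rough sketch, not an official tool: `irq_map` is a made-up helper name, and it only assumes that a probe line starts with the device name and contains the words "irq N" somewhere after it, as the rl0 and rl1 lines do.

```shell
#!/bin/sh
# irq_map: read dmesg-style probe lines on stdin and print one line per IRQ,
# listing every device that was attached on it. Helper name is hypothetical.
irq_map() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "irq" && $(i + 1) ~ /^[0-9]+$/) {
                    dev = $1
                    sub(/:$/, "", dev)          # "rl0:" -> "rl0"
                    devs[$(i + 1)] = devs[$(i + 1)] " " dev
                }
            }
        }
        END {
            for (irq in devs) print "irq " irq ":" devs[irq]
        }
    ' | sort -k2,2n
}

# Demo using the two NIC lines from the dmesg excerpt in this message.
irq_map <<'EOF'
rl0: port 0xd000-0xd0ff mem 0xf6084000-0xf60840ff irq 16 at device 8.0 on pci0
rl1: port 0xd100-0xd1ff mem 0xf608-0xf60800ff irq 17 at device 9.0 on pci0
EOF
```

On a live system the input would come from `dmesg | irq_map`. An IRQ line that lists more than one device (for example a NIC next to an atapci controller) is the situation Jeremy suspects; `vmstat -i`, as suggested in the thread, shows the interrupt counts to go with it.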
Re: Stable SATA pci card for FreeBSD 6.x/7.0
Bummer, I forgot the dmidecode output. Sorry about that. :-(

Regards,
Sebastiaan

# dmidecode 2.9
SMBIOS 2.3 present.
33 structures occupying 996 bytes.
Table at 0x000F0800.
Handle 0x, DMI type 0, 20 bytes
BIOS Information
    Vendor: Phoenix Technologies, LTD
    Version: 6.00 PG
    Release Date: 06/27/2006
    Address: 0xE
    Runtime Size: 128 kB
    ROM Size: 512 kB
    Characteristics:
        ISA is supported
        PCI is supported
        PNP is supported
        APM is supported
        BIOS is upgradeable
        BIOS shadowing is allowed
        ESCD support is available
        Boot from CD is supported
        Selectable boot is supported
        BIOS ROM is socketed
        EDD is supported
        5.25"/360 KB floppy services are supported (int 13h)
        5.25"/1.2 MB floppy services are supported (int 13h)
        3.5"/720 KB floppy services are supported (int 13h)
        3.5"/2.88 MB floppy services are supported (int 13h)
        Print screen service is supported (int 5h)
        8042 keyboard services are supported (int 9h)
        Serial services are supported (int 14h)
        Printer services are supported (int 17h)
        CGA/mono video services are supported (int 10h)
        ACPI is supported
Sound skipping problems
Hi,

I have major sound skipping problems on FreeBSD 6.0. I checked the mailing list archives and found a related thread:

http://lists.freebsd.org/pipermail/freebsd-current/2005-June/051103.html

To quote Jeff Roberson:

> I have a patch that should greatly improve the sound skipping problems
> people have under heavy io load. Several people sent me traces that
> showed the buf daemon running for hundreds of milliseconds with Giant
> held, which can hold up the pcm code. The patch is available at:
>
> http://www.chesapeake.net/~jroberson/flushbuf.diff

The problems are definitely correlated with I/O load; however, I can't say that I have HEAVY I/O loads. A simple:

# sync;sync;sync;

will already cause the sound to skip. I have DMA enabled on all drives, and it seems the above patch is already merged into FreeBSD 6.0-STABLE. This leaves me at a loss, and I don't know what else to try...

Does anybody have any ideas of what I could do to solve this problem?

Thanks in advance,
Sebastiaan van Erk

[EMAIL PROTECTED](ttyp9:92:0):~# uname -a
FreeBSD piglet.sebster.com 6.0-STABLE FreeBSD 6.0-STABLE #1: Sat Nov 5 23:42:18 CET 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PIGLET i386

[EMAIL PROTECTED](ttyp9:93:0):~# cat /dev/sndstat
FreeBSD Audio Driver (newpcm)
Installed devices:
pcm0: at io 0xec00 irq 22 kld snd_via8233 (5p/1r/0v channels duplex default)

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Sound skipping problems
Hi,

Thanks for the tip, this seems to do the trick. Tested it with /usr/ports/sysutils/stress:

[EMAIL PROTECTED](ttyp7:42:0):/shared# stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --hdd 4 --timeout 10m
stress: info: [2439] dispatching hogs: 8 cpu, 4 io, 2 vm, 4 hdd

and heard no more skips.

Greetings,
Sebastiaan van Erk

Markus Trippelsdorf wrote:
On Sun, Nov 06, 2005 at 12:39:02PM +0100, Sebastiaan van Erk wrote:
Hi, I have major sound skipping problems on FreeBSD 6.0. I checked the mailing list archives and found a related thread: ... Does anybody have any ideas of what I could do to solve this problem?

You could try to set hint.pcm.0.buffersize="16384" in /boot/loader.conf . It solved the problem for me.
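For anyone applying Markus's suggestion by script rather than by hand, the hint can be appended in a way that is safe to run twice. This is a minimal sketch only: `set_loader_hint` is a hypothetical helper, not a FreeBSD utility, and the value 16384 is simply the one quoted above.

```shell
#!/bin/sh
# set_loader_hint FILE KEY VALUE
# Append KEY="VALUE" to FILE unless a line starting with KEY= already exists.
# Hypothetical helper; it never rewrites or removes an existing setting.
set_loader_hint() {
    file=$1
    key=$2
    value=$3
    # index() does a literal prefix match, so the dots in the key are not
    # treated as regular-expression wildcards.
    if awk -v k="${key}=" 'index($0, k) == 1 { found = 1 } END { exit !found }' \
        "$file" 2>/dev/null; then
        echo "${key} is already set in ${file}; leaving it alone"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Example (run as root; the loader reads this file at boot, so a reboot is
# needed before the new buffer size applies):
#   set_loader_hint /boot/loader.conf hint.pcm.0.buffersize 16384
```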
Re: Sound skipping problems
Hi,

Thank you for your reply! I tried the patch, but unfortunately when I reboot with the patch (which applies and compiles cleanly), audio stops working. The device (pcm0) is still there, the mixer is set ok, and everything looks normal; just no sound comes out of the speakers. I have no idea why the patch doesn't work, but if you want any more information I'll be happy to supply it to you.

The sound skipping at least seems to be fixed by just increasing the buffer size, but I don't know how reliable this workaround is compared to a structural fix.

Furthermore, I don't know if this message is relevant, but it seems the snd_via8233 driver doesn't like my audio codec very much:

pcm0: port 0xec00-0xecff irq 22 at device 17.5 on pci0
pcm0: [GIANT-LOCKED]
pcm0:

Greetings,
Sebastiaan van Erk

Ariff Abdullah wrote:
On Sun, 06 Nov 2005 12:39:02 +0100 Sebastiaan van Erk <[EMAIL PROTECTED]> wrote:

Hi, I have major sound skipping problems on FreeBSD 6.0. I checked the mailing list archives and found a related thread:

http://lists.freebsd.org/pipermail/freebsd-current/2005-June/051103.html

To quote Jeff Roberson:

> I have a patch that should greatly improve the sound skipping
> problems people have under heavy io load. Several people sent me
> traces that showed the buf daemon running for hundreds of
> milliseconds with Giant held, which can hold up the pcm code.
> The patch is available at:
>
> http://www.chesapeake.net/~jroberson/flushbuf.diff

The problems are definitely correlated with I/O load; however, I can't say that I have HEAVY I/O loads. A simple:

# sync;sync;sync;

will already cause the sound to skip. I have DMA enabled on all drives, and it seems the above patch is already merged into FreeBSD 6.0-STABLE. This leaves me at a loss, and I don't know what else to try... Does anybody have any ideas of what I could do to solve this problem?

Recompile your kernel with "options PREEMPTION", and apply this patch:

http://people.freebsd.org/~ariff/snd_RELENG_6_0_20051030_058.diff

--
Ariff Abdullah
MyBSD
http://www.MyBSD.org.my (IPv6/IPv4)
http://staff.MyBSD.org.my (IPv6/IPv4)
http://tomoyo.MyBSD.org.my (IPv6/IPv4)
Re: DHCP client error: domain_not_set.invalid
I had this as well. It means that your DHCP server returns an invalid search domain. The easy way to solve it (if you have access) is to set the search domain to something valid in your DHCP server (Linksys router by any chance?).

I couldn't find a flag on dhclient to tell it to ignore invalid search domains: this would be really handy so that you can connect to badly set up networks when you don't have access to the router.

Greetings,
Sebastiaan

Mark Space wrote:
Hi all, I just set up the latest 6.0 release, and I'm getting errors with the DHCP client. Trying to pull a network address during start up, I get:

Bogus domain search list 15: domain_not_set.invalid

This repeats several times before giving up. Google tells me that this problem was reported by two users on the bsd-current list. No one ever replied to their inquiries (at least on the list), so I thought to try once more to see if there's any interest in addressing this issue. More info was in the original post:

http://lists.freebsd.org/pipermail/freebsd-current/2005-October/057034.html
Bug in netgraph?
Hi,

There seems to be a bug/problem with GRE (netgraph) in FreeBSD in dealing with fragmented packets. When I have the following NAT rules:

List of active MAP/Redirect filters:
map ng0 10.0.0.0/8 -> 80.126.244.3/32 portmap tcp/udp 4:5 mssclamp 60
map ng0 10.0.0.0/8 -> 80.126.244.3/32 mssclamp 60

everything works, but when I don't include the mssclamp option, connections from my internal network to, for example, www.google.com (searching for "test") hang and time out constantly.

I'm using FreeBSD 6.0-STABLE in combination with mpd and ipfilter 4.1.18:

IP Filter: v4.1.8 initialized. Default = block all, Logging = enabled

[EMAIL PROTECTED](ttyp8:16:64):~> mpd --version
Version 3.18 ([EMAIL PROTECTED] 22:28 5-Nov-2005)

[EMAIL PROTECTED](ttyp8:12:0):~> uname -a
FreeBSD piglet.sebster.com 6.0-STABLE FreeBSD 6.0-STABLE #12: Wed Nov 16 13:34:20 CET 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PIGLET i386

Greetings,
Sebastiaan van Erk
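As background to the mssclamp option mentioned above: clamping lowers the TCP maximum segment size that hosts negotiate, so that full-sized segments still fit through a link with a smaller MTU without being fragmented. The arithmetic can be sketched as below, assuming plain IPv4 and TCP headers of 20 bytes each with no options. The function name is made up, the MTU values are illustrative only, and none of this says which value is right for the tunnel in this report.

```shell
#!/bin/sh
# mss_for_mtu MTU
# Largest TCP payload that fits in one IPv4 packet of MTU bytes, assuming a
# 20-byte IPv4 header and a 20-byte TCP header (no IP or TCP options).
# Hypothetical helper for illustration.
mss_for_mtu() {
    echo $(( $1 - 20 - 20 ))
}

mss_for_mtu 1500    # plain Ethernet MTU, prints 1460
```

A tunnel adds its own headers and therefore has a smaller usable MTU than the underlying link; feeding that smaller MTU into the same calculation gives the corresponding clamp value.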
Re: DHCP client error: domain_not_set.invalid
Hi,

I understand the idea that bad values should be rejected, but in reality, I have the same DSL modem that these others have and there is no way to change the domain search list that it sends. No way that I could find at least. This is SBC-Yahoo in California, so there are a lot of people out there with this modem.

Well ring your ISP and complain. Too many people just accept crappy service.

This is just the attitude that's going to get people to use other software. People are going to laugh at you trying to get a network connection and joke "it works fine with Windows". Then you try and explain that it's not your OS's fault and somebody messed up some setting somewhere else. And then they laugh some more watching you struggle.

Furthermore it's really not realistic to expect that ISPs are going to do anything about it either. They have a billion other more important issues than solving that insignificant problem that "that guy who is using an unsupported OS" has. They really don't care.

dhcpd should either
1. accept bogus names (warnings are fine)
2. offer a configuration option or command line switch to allow the bogus domain if we wish
3. offer a configuration option like isc-dhcpd does so that we can ignore or override the setting

I would have to agree here. I think option 2 is great, because it gets people to be aware of the problem, but it allows them to work around it if necessary. I really think it's terrible to have the software just reject a lease because of an invalid search domain, without you being able to fix it without hacking code. That's going a bit overboard IMHO and is just going to cause more problems than it's going to solve.

Greetings,
Sebastiaan
Re: DHCP client error: domain_not_set.invalid
Hi,

Mark Andrews wrote:

This is just the attitude that's going to get people to use other software. People are going to laugh at you trying to get a network connection and joke "it works fine with Windows". Then you try and explain that it's not your OS's fault and somebody messed up some setting somewhere else. And then they laugh some more watching you struggle.

Actually it is reasonable. Windows lets users violate RFC's in many ways.

Yes, it might be reasonable, but it is still going to stop you from being able to connect to a network, whereas Windows users have no problems.

RFC 952 specifies what is legal in a hostname. While one can theoretically search for things other than hosts the only real use of the search strings today is for hostnames and/or mail domains (which are syntactically identical to hostnames). What would be really interesting to know is what they expect the customers to find using this suffix.

Actually the entire search domain thing is pretty useless in most cases for home users (unless they have their own internal network, in which case they have their own DNS and DHCP servers). People navigate the internet using fully qualified domain names and it is almost never necessary to have a search domain; it just slows things down having it search for hostnotfound.domain.com.mysearchdomain.com.

My bet is that this really is just a configuration error on their part.

Could be, or more probably it's just the default setting of the modem. I've had one of these modems, and it took me forever to find the proper setting because in the web interface of the modem they obviously didn't feel like calling it search domain; I can't remember what they called it but they annotated it with the comment "necessary for some ISPs", which just completely wrong-footed me.

I don't want to fall into an endless discussion, so I'm going to wrap it up, but I think it would be really nice if FreeBSD users could solve this problem themselves instead of having to rely on other people who may not be inclined to put much priority on the issue. And by that I mean a solution other than hacking the code, which is quite a lot to ask of a regular user. An option to ignore the setting would be just fine, an option to override it even better. I don't know if you can even disable the search domain (haven't read the RFC) but this would be even better in many cases, avoiding queries that are not necessary.

Greetings,
Seb*
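To make the RFC 952 point concrete: a name such as domain_not_set.invalid fails the classic hostname syntax because of the underscores. The check below is a rough sketch of that syntax rule (letters, digits and hyphens per label, as relaxed by RFC 1123 to allow a leading digit). It is not dhclient's actual validation code, `valid_hostname` is a made-up name, and label and total length limits are deliberately not checked.

```shell
#!/bin/sh
# valid_hostname NAME
# Succeed (exit 0) if every dot-separated label consists only of letters,
# digits and hyphens and neither starts nor ends with a hyphen.
# Hypothetical helper; length limits are not enforced.
valid_hostname() {
    printf '%s\n' "$1" | awk -F. '
        NF == 0 { bad = 1 }     # empty name
        {
            for (i = 1; i <= NF; i++) {
                if ($i !~ /^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$/) bad = 1
            }
        }
        END { exit bad }
    '
}

# Usage:
#   valid_hostname example.org            && echo ok
#   valid_hostname domain_not_set.invalid || echo "rejected (underscore)"
```

A client that only warned on such a name, or let the user override it, would correspond to the options asked for in this thread.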
Re: DHCP client error: domain_not_set.invalid
Hi,

Greg Barniskis wrote:
Mark Andrews wrote:

Yes it is reasonable to expect an ISP to fix things like this. You pay the ISP to operate their part of the network within the operational constraints of the RFCs (Standards track and BCP).

I totally agree. Make sure when calling tech support on things like this that you are *not* asking them to provide FreeBSD support, that you can handle that angle of the connection quite well, thanks. Explain that the evidence shows that their system appears to violate global connectivity standards (if you can name which RFC and exactly how it's violated, great, but don't expect first tier help desk phone operators to understand that as it is probably way, way beyond their troubleshooting script).

I think this would all be reasonable in a perfect world. In the real world you're paying the operator to get internet access and they often list which operating systems they support (and they don't list FreeBSD). They're going to ask you what operating system you are running, then ask you if your connection works; and when you say it works under Windows but violates an ``RFC'' they're just not going to give it much priority.

Then when the help desk staff goes "uhm...", politely ask to be escalated to second tier and clearly and politely state your case there, again making it clear that you are *not* asking for FreeBSD support, but support by them of global connectivity standards that every ISP ought to be respecting. At least you have a chance of getting your trouble ticket marked something like "Unresolved -- Bug" instead of "Resolved -- Unsupported OS". That is to say, the kind of ticket that self-escalates to engineers and managers somewhere away from the help desk proper.

The word chance says it all. And all the time you're hoping for this chance to become reality you cannot use your broadband connection.

Furthermore there are two other problems with this approach:

1) it often costs you a lot of money (even though it can be argued that it is reasonable that ISPs fix real problems free of charge and not charge you an arm and a leg for it, in the real world the situation is often not so perfect).

2) it often costs you a lot of time; it's going to be really hard to even get your request escalated to second tier, and it's definitely going to take days and multiple calls before they start to take you seriously.

In the end, it's the FreeBSD user that suffers.

Greetings,
Sebastiaan