Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-13 Thread Sebastiaan van Erk

Hi,

Just an update on this issue.

Quick summary: I fixed the BIOS issues, the hardware monitor issues, and 
the rl0/rl1 watchdog timeout issues (it seems). However I'm still having 
problems with my SATA drives (or at least one of them). More info below.


BIOS:
I flashed my BIOS to the latest version about a year ago, and never 
noticed that there was any problem, but it turns out there was. I never 
reset the BIOS to default factory settings after the upgrade, and it 
seems the settings were corrupt. After having reset the BIOS to the 
"default optimized factory settings" it stopped crashing when I go into 
the H/W monitor and also when using healthd -d (output below):


Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
 Vcore = 1.44, 3.12; Volt. = 3.34, 5.00,  1.95,  -0.11, -1.54
Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
 Vcore = 1.44, 3.14; Volt. = 3.33, 4.97,  1.95,  -0.11, -1.54
Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
 Vcore = 1.44, 3.12; Volt. = 3.34, 4.97,  1.95,  -0.11, -1.54
Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
 Vcore = 1.44, 3.12; Volt. = 3.34, 5.00,  1.95,  -0.11, -1.54
Temp.= 40.0, 36.0, 66.0; Rot.=0,0,0
 Vcore = 1.44, 3.12; Volt. = 3.34, 5.00,  1.95,  -0.11, -1.54

This also seems to have fixed the rl0 watchdog timeout problems. I no 
longer see those in my logs.


SATA DRIVES:

I'm still having problems with the SATA drives.

I tried connecting the 1TB Samsung drives to my mainboard, but then the 
box hangs during boot at the "Detecting IDE drives" message. The 
regular (PATA) IDE drives are detected first; it then repeats the 
"Detecting IDE drives" message to detect the SATA drives, and hangs. 
When I connect my 250GB SATA drives to my mainboard they are detected 
fine, and the box boots normally.


I did another rsync of my old mirror (the 250GB disks) to the new mirror 
(1TB disks), but again one of the disks got detached. This time there 
are no other messages in the log; the only thing I see is the following:


Aug 13 14:35:27 piglet su: sebster to root on /dev/ttyp5
Aug 13 14:55:38 piglet kernel: ad6: FAILURE - device detached
Aug 13 14:55:38 piglet kernel: subdisk6: detached
Aug 13 14:55:38 piglet kernel: ad6: detached
Aug 13 14:55:38 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6 disconnected.

Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K

(Unfortunately the log file just got rotated, but the new log file 
contains nothing except the one expected line:)


Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K

So, nothing after the disconnect...

The questions I have now are:
1) Could an upgrade to FreeBSD 7-STABLE fix the issue? (It's a LOT of 
work for me, but I'll do it if there are SATA driver issues fixed.)
2) What is the next step? Should I repeat the tests to see if it's 
always the same drive that disconnects?

3) Is there any way to get more info about what is causing the disconnect?

Regards,
Sebastiaan

Jeremy Chadwick wrote:

On Wed, Aug 06, 2008 at 02:57:48AM -0700, Jeremy Chadwick wrote:

vmstat -i output should help clear that up, or dmesg output.


Sebastiaan has included vmstat -i output in another part of this thread,
as well as dmesg output for the ATA disks and controllers:

atapci0:  port 0xd200-0xd207,0xd300-0xd303,0xd400-0xd407,0xd500-0xd503,0xd600-0xd60f mem 0xf6081000-0xf60811ff irq 18 at device 10.0 on pci0
ata2:  on atapci0
ata3:  on atapci0
atapci1:  port 0xd700-0xd707,0xd800-0xd803,0xd900-0xd907,0xda00-0xda03,0xdb00-0xdb0f,0xdc00-0xdcff irq 20 at device 15.0 on pci0
ata4:  on atapci1
ata5:  on atapci1
atapci2:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xdd00-0xdd0f at device 15.1 on pci0
ata0:  on atapci2
ata1:  on atapci2
ad0: 286188MB  at ata0-master UDMA133
ad1: 239372MB  at ata0-slave UDMA133
acd0: DVDR  at ata1-master UDMA33
ad4: 953869MB  at ata2-master SATA150
ad6: 953869MB  at ata3-master SATA150
ad8: 239372MB  at ata4-master SATA150
ad10: 239372MB  at ata5-master SATA150

interrupt                  total   rate
irq6: fdc0                    10      0
irq14: ata0               645057      7
irq15: ata1                   58      0
irq16: rl0               7168276     82
irq17: rl1                914667     10
irq18: atapci0          30072876    347
irq20: atapci1           1126099     12
irq21: uhci0 uhci*           308      0
irq23: vr0               3265771     37
cpu0: timer            173289011   1999
Total                  216482133   2498
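
As a sanity check on the table above, here is a minimal Python sketch that re-parses the vmstat -i rows, confirms that the per-source counts add up to the reported Total, and lists any interrupt line that shows more than one device. The columns were re-separated by hand from the pasted output, and the names VMSTAT_OUTPUT, parse_vmstat and shared_irqs are illustrative only; they are not part of any FreeBSD tool.

```python
import re

# Rows copied from the vmstat -i output above, columns re-separated by hand.
VMSTAT_OUTPUT = """\
interrupt total rate
irq6: fdc0 10 0
irq14: ata0 645057 7
irq15: ata1 58 0
irq16: rl0 7168276 82
irq17: rl1 914667 10
irq18: atapci0 30072876 347
irq20: atapci1 1126099 12
irq21: uhci0 uhci* 308 0
irq23: vr0 3265771 37
cpu0: timer 173289011 1999
Total 216482133 2498
"""

# "<source>: <one or more device names> <total> <rate>"
ROW = re.compile(r"^(?P<source>\S+): (?P<devices>.+?) (?P<total>\d+) (?P<rate>\d+)$")


def parse_vmstat(text):
    """Return ([(source, [devices], total, rate), ...], reported_total)."""
    rows, reported_total = [], None
    for line in text.splitlines():
        if line.startswith("Total"):
            reported_total = int(line.split()[1])
            continue
        match = ROW.match(line)
        if match:  # the header line has no "source:" prefix and is skipped
            rows.append((match["source"], match["devices"].split(),
                         int(match["total"]), int(match["rate"])))
    return rows, reported_total


def shared_irqs(rows):
    """Map each interrupt source to its devices when more than one is listed."""
    return {source: devices for source, devices, _, _ in rows if len(devices) > 1}


rows, reported_total = parse_vmstat(VMSTAT_OUTPUT)
computed_total = sum(total for _, _, total, _ in rows)
shared = shared_irqs(rows)
print("sum of rows:", computed_total, "reported:", reported_total)
print("interrupt lines with more than one device:", shared)
```

In this listing the two SATA controllers (atapci0 on irq18, atapci1 on irq20) each have an interrupt line to themselves; only irq21 lists more than one device.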

Here's a breakdown, so no one gets confused:

ad0  = 300GB Maxtor disk, attached to on-board VIA IDE controller
ad1  = 250GB Maxtor disk, attached to on-board VIA IDE controller
ad4  = 1TB Samsung disk, attached to Silicon Image SATA controller
ad6  = 1TB Samsung disk, attached to Silicon Image SATA controller
ad8  = 250GB Maxtor disk, attached to on-board VI

Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-14 Thread Sebastiaan van Erk

Thanks Jonathan,

I'm starting to suspect it has to be the controller as well. About 20 
minutes after I posted this message yesterday (and thus 20 minutes after 
ad6 got disconnected; atacontrol list showed "no device present" for 
it) the machine crashed while writing to the remaining ad4 drive (kernel 
panic). I attached the logs below. I also ran the long SMART self-test 
on both drives, and no errors were found on either drive (logs also 
attached).


Unfortunately I could not attach the new disks to my mainboard SATA 
because my mainboard SATA somehow hangs trying to detect them. So I 
cannot test whether *not* using the controller is going to solve the 
problems, though at the moment it seems logical that it has to be the 
controller, especially if other people have had similar issues.


I guess I'll be buying another controller.

Regards,
Sebastiaan

Jonathan Groll wrote:

On Wed, Aug 13, 2008 at 03:10:56PM +0200, Sebastiaan van Erk wrote:


The questions I have now are:
1) Could an upgrade to FreeBSD 7-STABLE fix the issue? (It's a LOT of 
work for me, but I'll do it if there are SATA driver issues fixed.)


I suspect the problem may be the SiI driver in FreeBSD. As a reference
point, I've had a similar problem, even on 7-STABLE, but with sparc64
hardware (see earlier post in this thread).

It'll probably be simplest for you to just buy another controller of
another brand. On the other hand, it'll be worth knowing exactly what
is wrong with the SiI driver...

Cheers,
Jonathan
Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K
Aug 13 15:11:26 piglet su: sebster to root on /dev/ttyp4
Aug 13 15:34:55 piglet kernel: mirror/gm1s1e[WRITE(offset=875450693632, length=2048)]error = 6
Aug 13 15:34:55 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450695680, length=2048)]error = 6

[snip 335750 similar lines]

Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450931200, length=2048)]error = 6
Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450933248, length=2048)]error = 6
Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450935296, length=2048)]error = 6
Aug 13 15:36:30 piglet kernel: 
g_vfs_done():mirror/gm1s1e[WRITE(offset=875450937

Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-21 Thread Sebastiaan van Erk

Hi,

Cian Hughes wrote:
> Sebastiaan,
> Have you tried connecting your 250GB drives to the troublesome
> controller? If so, does "stressing" them cause the system to panic?
>
> ~Cian Hughes

Thanks for your reply.

I have not tried stress-testing the 250GB drives on the troublesome 
controller. The problem with those drives is that, even though they are 
mirrored, the data is very important to me and I do not want it to get 
corrupted. I do have backups of course, but the problem with data 
corruption is that it often takes very long to notice...


I was thinking of buying the Promise SATA300 TX4 PCI Controller. I've 
searched on google, and I do see some negative posts on them in 
combination with FreeBSD, however they all date back at least 2 years...


Does anybody have positive/negative experiences using this card?

Regards,
Sebastiaan



--
University of Bristol Medical School


Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-25 Thread Sebastiaan van Erk

Hi everybody,

Thanks for all the help I got trying to figure this one out. I bought 
the Promise SATA300 TX4 PCI controller and everything is working 
smoothly now.


This means I now have the other controller left over:

[pciconf -lv output]
[EMAIL PROTECTED]:10:0:  class=0x018000 card=0x35121095 chip=0x35121095 rev=0x01 hdr=0x00
vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
device = 'Sil 3512 SATALink/SATARaid Controller'
class  = mass storage

I would like to donate it to FreeBSD developers working on the drivers 
for these cards if they want/need it (where do I need to send it)... 
Otherwise I'll just sell it on our local version of ebay.


Regards and thanks again,
Sebastiaan

Jeremy Chadwick wrote:

On Thu, Aug 21, 2008 at 09:49:25AM +0200, Sebastiaan van Erk wrote:
I was thinking of buying the Promise SATA300 TX4 PCI Controller. I've  
searched on google, and I do see some negative posts on them in  
combination with FreeBSD, however they all date back at least 2 years...


Does anybody have positive/negative experiences using this card?


I have one of these cards (not currently in use; less stuff inside my
FreeBSD box at home the better), and never ran into any oddities.  That
was with 4 disks connected, each disk its own UFS2 filesystem.  ZFS
wasn't available back then.





Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-05 Thread Sebastiaan van Erk

Hi,

I'm running FreeBSD 6.3 (I know, I should upgrade), and I just bought an 
add-on pci SATA controller for 2 extra SATA disks.


However, a lot of disk activity on the drives will often cause the 
machine to crash and spontaneously reboot. I checked out which chipset 
was on the card with pciconf -lv and I found it was the Sil 3512. 
Googling showed me that I'm not the only one with problems using this card.


Does anybody have experience with a (preferably not too expensive) 
2-port SATA expansion card which does not have any issues running under 
FreeBSD 6.3/7.0?


[pciconf -lv output]
[EMAIL PROTECTED]:10:0:  class=0x018000 card=0x35121095 chip=0x35121095 rev=0x01 hdr=0x00
vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
device = 'Sil 3512 SATALink/SATARaid Controller'
class  = mass storage

[/var/log/messages before the crash]
Aug  5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6

Aug  5 11:16:17 piglet last message repeated 9 times

Regards,
Sebastiaan




Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-05 Thread Sebastiaan van Erk

Hi,

Thanks for the reply.

Jeremy Chadwick wrote:

Yes, most of the Silicon Image ICs I've read about have odd driver
problems or general issues (even under Windows).  The system rebooting
is an odd one; you sure your PSU can handle two disks?


Well, I've got a 450W Asus PSU in there, but I've also got 6 hard disks
and 1 dvd-rom drive (mostly inactive) in there. The hard disks are
mostly 250/300GB but the two new ones are 1TB SATA drives. But the 450W
should easily be enough, shouldn't it?

Does anybody have experience with a (preferably not too expensive)  
2-port SATA expansion card which does not have any issues running under  
FreeBSD 6.3/7.0?


Promise makes some consumer-priced cards which work very well under
FreeBSD (sos@ has full documentation on their cards).


Their RAID controllers (the consumer-level ones) **do not** require that
you use RAID; they support JBOD, and the disks will show up under
FreeBSD as ad(4) devices.  (If you choose to use the RAID, you'll still
see the ad(4) disks, but you'll also see an ar(4) device.)  This has
the added advantage of you being able to monitor SMART stats on the
disks themselves directly, etc.


I'll have a look at that if I can't get this one stable. They're
reasonably priced, so if they're good with FreeBSD then that looks like
a good option to me.


[pciconf -lv output]
[EMAIL PROTECTED]:10:0:  class=0x018000 card=0x35121095 chip=0x35121095 rev=0x01 hdr=0x00
vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
device = 'Sil 3512 SATALink/SATARaid Controller'
class  = mass storage

[/var/log/messages before the crash]
Aug  5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] error = 6

Aug  5 11:16:17 piglet last message repeated 9 times


Are you sure this is being caused by the controller?  Have you checked
SMART statistics on both disks?  Assuming error == errno, errno 6 is
"Device not configured".

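The errno reading can be checked quickly. This is a small Python sketch, assuming the kernel's "error = 6" really is a plain errno value as suggested above. The text returned by os.strerror comes from the local C library, so the wording differs between systems; "Device not configured" is the FreeBSD wording quoted above.

```python
import errno
import os

code = 6  # the value from "error = 6" in the g_vfs_done() messages

name = errno.errorcode[code]   # symbolic name for the number
message = os.strerror(code)    # description; wording is platform-specific

print(f"errno {code} = {name}: {message}")
```

The symbolic name is ENXIO, which fits a write being issued to a provider that has already been detached.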

I did look at the smart stats [pasted them below]. What I will try next
is just to switch the two 250GB SATA drives on my main board with the
two 1TB drives on the controller and see if I still get the problems if
I really increase the load on the two 1TB drives.


There's been recent discussion of such messages being caused by the use
of gmirror or gjournal, when the mirror/journal is improperly set up.
(In one user's case, he was receiving similar errors, as well as the
filesystem failing during fsck.  Turns out he incorrectly configured
journalling, which nuked the last ~1MB of his UFS filesystem.)

I'm not saying this is the reason for the messages you see, but it's
something to keep in mind.


I'll try to reconfigure the geom. I used an online tutorial, but I'm not
quite sure that I did everything correctly, though fsck worked all right.
I did set this one up differently than usual, though: usually I use a
full-disk mirror after I have already initialized one of the disks, and
then I convert it to a mirror by using:

sysctl kern.geom.debugflags=16
gmirror label -v -b round-robin gm0 /dev/ad0
gmirror insert gm0 /dev/ad2

(Especially useful when you want the entire FreeBSD install to be
mirrored). I guess I can try this on the extra disks as well.

Regards,
Sebastiaan




Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-05 Thread Sebastiaan van Erk
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0
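
To make the "is any of these counters nonzero" check explicit, here is a rough Python sketch that parses the attribute rows above and reports the ones that tend to point at failing media or bad cabling. It assumes the usual ten-column smartctl -A layout and an integer raw value (some attributes on other drives print more complex raw values, which this sketch does not handle). WATCHED, parse_attributes and nonzero_watched are names made up for this example.

```python
# Attribute rows copied from the SMART output above, columns re-separated.
SMART_ROWS = """\
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 253 000 Old_age Always - 0
"""

# Counters that are worth a second look as soon as the raw value is nonzero.
WATCHED = {
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_attributes(text):
    """Return {attribute_name: {"value", "worst", "thresh", "raw"}}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = {
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": int(fields[9]),
            }
    return attrs


def nonzero_watched(attrs):
    """List the watched attributes whose raw counter is above zero."""
    return sorted(name for name, a in attrs.items()
                  if name in WATCHED and a["raw"] > 0)


attrs = parse_attributes(SMART_ROWS)
suspicious = nonzero_watched(attrs)
print("attributes parsed:", len(attrs))
print("watched counters above zero:", suspicious or "none")
```

For the rows shown here nothing is flagged, which matches the "No Errors Logged" line further down.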


SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[EMAIL PROTECTED](ttyp3:60:0):~#


Jeremy Chadwick wrote:

On Tue, Aug 05, 2008 at 12:28:40PM +0200, Sebastiaan van Erk wrote:
However, a lot of disk activity on the drives will often cause the  
machine to crash and spontaneously reboot. I checked out which chipset  
was on the card with pciconf -lv and I found it was the Sil 3512.  
Googling showed me that I'm not the only one with problems using this 
card.


Yes, most of the Silicon Image ICs I've read about have odd driver
problems or general issues (even under Windows).  The system rebooting
is an odd one; you sure your PSU can handle two disks?

Does anybody have experience with a (preferably not too expensive)  
2-port SATA expansion card which does not have any issues running under  
FreeBSD 6.3/7.0?


Promise makes some consumer-priced cards which work very well under
FreeBSD (sos@ has full documentation on their cards).

Their RAID controllers (the consumer-level ones) **do not** require that
you use RAID; they support JBOD, and the disks will show up under
FreeBSD as ad(4) devices.  (If you choose to use the RAID, you'll still
see the ad(4) disks, but you'll also see an ar(4) device.)  This has
the added advantage of you being able to monitor SMART stats on the
disks themselves directly, etc.


[pciconf -lv output]
[EMAIL PROTECTED]:10:0:  class=0x018000 card=0x35121095 chip=0x35121095 rev=0x01 hdr=0x00
vendor = 'Silicon Image Inc (Was: CMD Technology Inc)'
device = 'Sil 3512 SATALink/SATARaid Controller'
class  = mass storage

[/var/log/messages before the crash]
Aug  5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] error = 6

Aug  5 11:16:17 piglet last message repeated 9 times


Are you sure this is being caused by the controller?  Have you checked
SMART statistics on both disks?  Assuming error == errno, errno 6 is
"Device not configured".

There's been recent discussion of such messages being caused by the use
of gmirror or gjournal, when the mirror/journal is improperly set up.
(In one user's case, he was receiving similar errors, as well as the
filesystem failing during fsck.  Turns out he incorrectly configured
journalling, which nuked the last ~1MB of his UFS filesystem.)

I'm not saying this is the reason for the messages you see, but it's
something to keep in mind.





Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-05 Thread Sebastiaan van Erk

Hi,

Sorry about that, I believe I only messed up on my first reply, and I 
thought I mailed that to the list as well after I noticed I messed up.


Thing is, I'm used to replying to mailing lists using the "Reply" button 
and unfortunately the reply doesn't go to the mailing list when I do 
that... Some people don't like it when you send it to the mailing list 
and CC it to them personally, but since you apparently do, I'll just use 
reply-all from now on.


Sorry again about the mistake,

Regards,
Sebastiaan

Jeremy Chadwick wrote:

On Tue, Aug 05, 2008 at 03:16:41PM +0200, Sebastiaan van Erk wrote:

Sorry for forgetting to paste the smart details. Pressed send too quickly.


A note for the list: Sebastiaan and I are discussing the details
off-list.  I don't know if he forgot to CC the list on his replies, or
if he intentionally sent them to me directly.  :-)

Just thought I'd make note of that here, in case readers wonder what
becomes of this issue.





Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-05 Thread Sebastiaan van Erk

Jeremy Chadwick wrote:


First and foremost, you've forgotten to CC the mailing list on all but
one of your replies.  I'll assume this is intentional, but it's probably
not for the best, as readers may find your post and wonder what the
outcome was.


It was not intentional; I hit reply instead of reply-all. Sorry. I will 
send this reply to the list, so other interested parties can follow the 
thread and your informative replies.



On Tue, Aug 05, 2008 at 02:47:45PM +0200, Sebastiaan van Erk wrote:

Hi,

Thanks for the reply.

Jeremy Chadwick wrote:

Yes, most of the Silicon Image ICs I've read about have odd driver
problems or general issues (even under Windows).  The system rebooting
is an odd one; you sure your PSU can handle two disks?
Well, I've got a 450W Asus PSU in there, but I've also got 6 hard disks  
and 1 dvd-rom drive (mostly inactive) in there. The hard disks are  
mostly 250/300GB but the two new ones are 1TB SATA drives. But the 450W  
should easily be enough, shouldn't it?


Without getting into semantics, a 450W PSU may be on the light side for
6 disks.  I'm fairly amazed you're able to power up that machine without
disk errors or other problems during POST.  All 6 disks will be spinning
up simultaneously, and spin-up (and possibly heavy normal operation) is
when disks draw the most power.

If you have a different (or larger) PSU, I would recommend trying that
to see if it addresses your problem.  A PSU which isn't providing enough
power will cause the disks to occasionally disconnect from the bus, or
the machine to sporadically lock up, reboot (power-cycle), or do other
odd things.
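
A back-of-the-envelope version of the spin-up argument, as a minimal Python sketch. For spin-up the wattage on the label matters less than the current available on the 12 V rail, which feeds the spindle motors. Every number below is a placeholder chosen only to show the arithmetic; none of them was measured on this machine or taken from its PSU label or drive datasheets, and spinup_headroom is a hypothetical helper name.

```python
def spinup_headroom(rail_amps, disks, amps_per_disk, other_amps=0.0):
    """Return the spare 12 V current (in A) when all disks spin up at once."""
    return rail_amps - (disks * amps_per_disk + other_amps)


# Placeholder figures, NOT measurements: read the real values off the PSU
# label (12 V rail rating) and the drive datasheets (peak spin-up current).
headroom = spinup_headroom(
    rail_amps=18.0,     # assumed 12 V rail rating
    disks=6,            # number of hard disks in the box
    amps_per_disk=2.0,  # assumed peak 12 V draw per disk at spin-up
    other_amps=5.0,     # assumed 12 V draw of CPU, fans, optical drive
)
print(f"12 V headroom at spin-up: {headroom:.1f} A")
```

A result near or below zero would suggest the kind of marginal supply described above; a comfortable positive margin would point back at the controller or the drives.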


Unfortunately I don't have a larger PSU lying around, but I could buy 
one; though I'd like to try some other stuff first because I've had 6 
disks in my PC before without any problems.



[/var/log/messages before the crash]
Aug  5 11:16:14 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] error = 6

Aug  5 11:16:17 piglet last message repeated 9 times

Are you sure this is being caused by the controller?  Have you checked
SMART statistics on both disks?  Assuming error == errno, errno 6 is
"Device not configured".
I did look at the smart stats [pasted them below]. What I will try next  
is just to switch the two 250GB SATA drives on my main board with the  
two 1TB drives on the controller and see if I still get the problems if  
I really increase the load on the two 1TB drives.


More and more information about your system configuration is coming to
light.  Your original post didn't disclose any of that; now I know you
have 6 disks in the system, 2 of which are using on-board SATA (no idea
what controller), and 2 which are using a Silicon Image controller.
What are the remaining 2 disks connected to?


Sorry that I didn't give you that information immediately. The problem 
when you do that though is that the post is sometimes ignored because it 
is deemed too long or complicated (at least I've seen that happen). I'll 
gladly post any relevant data.


My other (on-board) SATA controller is a VIA controller; and I've never 
had any problems with it (although the hardware raid messed up once a 
year or 2 ago, and since then I've been using software raid without any 
issues).


[EMAIL PROTECTED]:15:0:  class=0x010400 card=0x71421462 chip=0x31491106 rev=0x80 hdr=0x00

vendor = 'VIA Technologies Inc'
device = 'VT8237  VT6410 SATA RAID Controller'
class  = mass storage
subclass   = RAID

The remaining disks are PATA disks which are on the on-board IDE 
controller. It's a legacy computer that's been upgraded a lot, though 
it's not too obsolete; the CPU is an AMD Sempron(tm) Processor 2600+ 
(1599.83-MHz 686-class CPU).



Your recommended method of troubleshooting (swapping the 250G for the
1TB) is a good idea.  But hear me loud and clear: just because you
switch the disks and the problem disappears for a few hours doesn't mean
it's gone.  There have been **many** people who have shown up on the
mailing lists stating "I did  and now it works!", only to find
that a week later it *didn't* fix the problem.


Yes, I don't really expect it to solve the problem, but was thinking 
that at least I could try and stress test the known working disks on the 
controller and try to see if it's the controller that's the problem or 
the disks (or something else). I've been able to reproduce the crashes 
pretty well by just doing a lot of disk IO on the 1TB disks only (so the 
other disks were pretty idle during the tests).



There's been recent discussion of such messages being caused by the use
of gmirror or gjournal, when the mirror/journal is improperly set up.
(In one user's case, he was receiving similar errors, as well as the
filesystem failing dur

Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-06 Thread Sebastiaan van Erk

Hi,

Yes, good thing you pointed this out, I hadn't seen those yet:

Aug  5 09:52:53 piglet ntpd[860]: kernel time sync enabled 2001
Aug  5 11:15:05 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:05 piglet kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455
Aug  5 11:15:05 piglet kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455
Aug  5 11:15:10 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: ad6: FAILURE - device detached
Aug  5 11:15:31 piglet kernel: subdisk6: detached
Aug  5 11:15:31 piglet kernel: ad6: detached
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: ad4: FAILURE - device detached
Aug  5 11:15:31 piglet kernel: subdisk4: detached
Aug  5 11:15:31 piglet kernel: ad4: detached
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6 disconnected.
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad4 disconnected.
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider mirror/gm1 destroyed.
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1 destroyed.
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069312512, length=131072)]error = 6
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069443584, length=131072)]error = 6
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069574656, length=131072)]error = 6
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069705728, length=131072)]error = 6
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069836800, length=131072)]error = 6
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=112069967872, length=131072)]error = 6
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376121856, length=2048)]error = 6
Aug  5 11:15:31 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6

Aug  5 11:15:35 piglet last message repeated 13 times
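
For anyone reading along, the g_vfs_done() entries above can be decoded mechanically. What follows is only a rough sketch, not something from the original mail: it assumes each wrapped log entry has first been re-joined onto a single line, and the helper name parse_g_vfs_done is made up for illustration. Error 6 is ENXIO, which fits the picture of writes arriving after the mirror provider had already been destroyed.

```python
import errno
import re

# Pattern for one re-joined g_vfs_done() kernel message (format as seen in the log above).
G_VFS_DONE = re.compile(
    r"g_vfs_done\(\):(?P<provider>[^\[]+)"
    r"\[(?P<op>\w+)\(offset=(?P<offset>\d+),\s*length=(?P<length>\d+)\)\]\s*"
    r"error = (?P<error>\d+)"
)

def parse_g_vfs_done(line):
    """Return a dict describing one g_vfs_done() message, or None if the line does not match."""
    m = G_VFS_DONE.search(line)
    if m is None:
        return None
    code = int(m.group("error"))
    return {
        "provider": m.group("provider"),
        "op": m.group("op"),
        "offset": int(m.group("offset")),
        "length": int(m.group("length")),
        "errno": code,
        # Symbolic name of the error number on the machine running this script.
        "errno_name": errno.errorcode.get(code, "UNKNOWN"),
    }

sample = ("Aug  5 11:15:31 piglet kernel: "
          "g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)]error = 6")
info = parse_g_vfs_done(sample)
print(info["provider"], info["op"], info["errno_name"])
```

The symbolic name is looked up on whatever machine runs the script, so this assumes the analysis host numbers its errno values the same way as the FreeBSD box.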

Regards,
Sebastiaan

Andrey V. Elsukov wrote:

Sebastiaan van Erk wrote:

[/var/log/messages before the crash]
Aug  5 11:16:14 piglet kernel: 
g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, 
length=16384)]error = 6

Aug  5 11:16:17 piglet last message repeated 9 times


Can you show which messages were before these?





Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-06 Thread Sebastiaan van Erk

Hi,

Thanks again for the detailed reply!


See the very bottom of my mail.  I don't believe the PSU is the problem,
after reviewing your SMART statistics.


Ok, I'll stick to the one I have then, for now.

My other (on-board) SATA controller is a VIA controller; and I've never  
had any problems with it (although the hardware raid messed up once a  
year or 2 ago, and since then I've been using software raid without any  
issues).


Okay, so you've got an onboard VIA (VT6410) SATA controller, an onboard
VIA IDE controller, and a PCI SATA controller.  I'd still like to know
which disks are attached to what controller, and if any of the devices
are sharing IRQs.  Can you provide the output from the following two
commands?

dmesg | egrep 'atapci|(ad|ata)[0-9]+'
vmstat -i

I'm just trying to narrow stuff down.
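
As an aside for readers, the IRQ-sharing check described above can be done by eye, but it is also easy to script. This is a minimal sketch under stated assumptions: it expects each dmesg probe line to be on one line (not wrapped as in a mail), the function name shared_irqs is invented, and the atapci0 sample line is made up purely so the example has something to report. It says nothing about what this particular machine actually shows.

```python
import re
from collections import defaultdict

# Matches "<device>: ... irq <n> ..." as printed in dmesg probe lines.
IRQ_LINE = re.compile(r"^(?P<dev>[a-z]+\d+):.*\birq (?P<irq>\d+)\b")

def shared_irqs(dmesg_lines):
    """Map IRQ number -> device names, keeping only IRQs used by more than one device."""
    by_irq = defaultdict(list)
    for line in dmesg_lines:
        m = IRQ_LINE.match(line)
        if m is None:
            continue
        irq, dev = int(m.group("irq")), m.group("dev")
        if dev not in by_irq[irq]:
            by_irq[irq].append(dev)
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}

sample = [
    "rl0: port 0xd000-0xd0ff mem 0xf6084000-0xf60840ff irq 16 at device 8.0 on pci0",
    "rl1: port 0xd100-0xd1ff irq 17 at device 9.0 on pci0",
    # Hypothetical line: invented for illustration, not taken from this thread.
    "atapci0: port 0xd200-0xd27f irq 17 at device 10.0 on pci0",
]
print(shared_irqs(sample))
```

In practice the input would be the real output of the dmesg command quoted above, one probe line per list element.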


All right, attached is the output of both of these commands.


It's interesting that the disks which are giving you trouble are Samsung
disks.  There's some history here which you should be made aware of:

In July, Daniel Eriksson reported data corruption occurring with his
nVidia MCP55 chipset when 1TB Samsung disks were attached to it.  The
same disks on another controller performed fine.  The corruption was
being detected by ZFS as checksum errors.  (UFS/UFS2 won't detect this
sort of thing, unless the corruption is occurring somewhere within the
filesystem tables.)

http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043427.html

Soren Schmidt (ata(4) author) replied that there are some nVidia
chipset-related fixes for ATA in -CURRENT, and provided a patch.  Daniel
reported that the patch made absolutely no difference:

http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043434.html

Daniel also tried using a firmware patch for his Samsung disks, which
limits the SATA speed to SATA150, but the speed was still negotiated as
SATA300 (indicating the vendor's own f/w patch is broken, or FreeBSD
does not play well with it).  The f/w patch didn't fix his problem
either:

http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043432.html

[EMAIL PROTECTED] reported using his MCP55 controller without any
problem -- as long as he didn't use Samsung disks.  He stated that he
believes Samsung disks are PATA disks that use a PATA-to-SATA adapter
inside of the drive, leading to problems (and yes, those adapters are
known to cause all sorts of mayhem):

http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043485.html

I'm not sure what became of the thread; Daniel never provided a
post-mortem.  I'm left to believe he probably took [EMAIL PROTECTED]'s
advice and switched to another disk vendor.


Gee, that's a whole list. Before today I didn't know that there was 
that much difference between disk vendors (especially in terms of 
compatibility). I'll keep that in mind when I buy new disks. Thing is 
I've had a bunch of disks (Maxtor, Seagate, Western Digitals, Samsung, 
etc), but I've had bad experiences with both Seagate and Western 
Digital. (Basically, I've never had a Seagate last me more than 2 years 
(laptop drives), and I had a raid5 array of WD's of which 3 crashed 
within 2 years). Never had much trouble with Maxtor or Samsung yet, but 
obviously take this all with a grain of salt, because 10 disks don't 
make solid statistics.



Thanks for upgrading to 5.38.  All the SMART statistics for these disks
look okay.


No problem, thanks for looking into this in so much detail!


Can you run some SMART tests on the disks?  You can run these tests
while the disks are in use (but I/O will make the test take longer to
complete):

smartctl -t short /dev/ad4
smartctl -t short /dev/ad6

Then you'll need to look at the SMART self test log, as well as the
SMART error log, to see if anything is returned.  Make sure the tests
have completed (the Status field should be "Completed without error",
unless an error was found of course):

smartctl -a /dev/ad4
smartctl -a /dev/ad6


I attached the output below; the tests passed. I thought I'd reply so 
that you know I'm on it. Currently I'm running the offline tests, but 
they will take another 3 hours at least to complete. Will get you the 
output of those as soon as they're done.



If nothing is found, try a different test (also safe to run during
operation; don't let the word "offline" scare you), and repeat looking
at the logs once more.  This test may take some time, though:

smartctl -t offline /dev/ad4
smartctl -t offline /dev/ad6

At this point, I'm inclined to believe the issue is specific to those
Samsung disks.  I do not believe your PSU is a problem; the SMART
statistics would be showing a higher number of power-cycles if the disks
were losing power.
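
The power-cycle reasoning above can be checked by comparing two smartctl attribute dumps taken some time apart. The following is a hedged sketch, not output from the disks in this thread: it assumes the usual ten-column attribute table that smartctl prints, the helper names are invented, and the two snapshots are hypothetical.

```python
def smart_raw_value(smartctl_output, attribute):
    """Return the raw value (tenth column) of a named SMART attribute, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the attribute name.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute:
            return int(fields[9])
    return None

def power_cycles_gained(before_output, after_output):
    """Difference in Power_Cycle_Count between two attribute dumps."""
    before = smart_raw_value(before_output, "Power_Cycle_Count")
    after = smart_raw_value(after_output, "Power_Cycle_Count")
    if before is None or after is None:
        raise ValueError("Power_Cycle_Count not found in smartctl output")
    return after - before

# Hypothetical snapshots, invented for illustration only.
HEADER = "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n"
ROW = " 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14\n"
before = HEADER + ROW
after = HEADER + ROW
print(power_cycles_gained(before, after))
```

A difference of zero across a period in which the disk detached would support the view that the drive did not lose power; a rising count without a deliberate reboot would point back at the PSU or cabling.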

Worth noting (about Samsung disks) is that smartctl has options to work
around 3 different firmware bugs.  The bugs are SMART statistics-related,
but those kinds of mistakes don't give me "warm fuzzies".  Be wary.  :-)


Nope, that definitely does not give great confidence.

I still hav

Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-06 Thread Sebastiaan van Erk

Hi,

Ok, those rl1: watchdog timeouts didn't ring a bell with me because I'd 
seen them before; however a quick grep in the logs (which date back to 
May 25) shows no other watchdog timeout matches.


To try and avoid being incomplete again, I'll just attach the full dmesg 
below.


Jeremy Chadwick wrote:

On Wed, Aug 06, 2008 at 11:37:16AM +0200, Sebastiaan van Erk wrote:

Yes, good thing you pointed this out, I hadn't seen those yet:

Aug  5 11:15:05 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:05 piglet kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) 
LBA=218885455
Aug  5 11:15:05 piglet kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) 
LBA=218885455
Aug  5 11:15:10 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: ad6: FAILURE - device detached
Aug  5 11:15:31 piglet kernel: subdisk6: detached
Aug  5 11:15:31 piglet kernel: ad6: detached
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: ad4: FAILURE - device detached
Aug  5 11:15:31 piglet kernel: subdisk4: detached
Aug  5 11:15:31 piglet kernel: ad4: detached
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6 
disconnected.
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad4 
disconnected.
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider mirror/gm1 
destroyed.
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1 destroyed.
Aug  5 11:15:31 piglet kernel: 
g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] error = 6


Kudos to Andrey for asking a simple yet incredibly beneficial question.

You have a much greater problem here, and it doesn't look specific to
your disks.  It looks as if an interrupt is stalled or locked.  I'm
willing to bet your rl1 Realtek NIC and your ATA controller (associated
with disks ad4 and ad6) use the same IRQ.  vmstat -i output should help
clear that up, or dmesg output.
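
To make the timing argument above concrete, the kernel messages can be grouped by timestamp to see which devices complained in the same second. This is only an illustrative sketch: the function name devices_by_second is invented, the wrapped log entries are assumed to have been re-joined, and co-occurrence in time does not by itself prove a shared IRQ.

```python
import re
from collections import defaultdict

# Example of a matching line: "Aug  5 11:15:05 piglet kernel: rl1: watchdog timeout"
KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (?P<dev>[a-z]+\d+): (?P<msg>.*)$"
)

def devices_by_second(log_lines):
    """Map each syslog timestamp to the set of devices that logged a kernel message then."""
    by_stamp = defaultdict(set)
    for line in log_lines:
        m = KERNEL_LINE.match(line)
        if m:
            by_stamp[m.group("stamp")].add(m.group("dev"))
    return dict(by_stamp)

# Lines taken from the log quoted above, re-joined onto single lines.
log = [
    "Aug  5 11:15:05 piglet kernel: rl1: watchdog timeout",
    "Aug  5 11:15:05 piglet kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455",
    "Aug  5 11:15:05 piglet kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=218885455",
    "Aug  5 11:15:10 piglet kernel: rl1: watchdog timeout",
]
result = devices_by_second(log)
for stamp, devs in result.items():
    print(stamp, sorted(devs))
```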

I'll tell you that there have been some watchdog timeout fixes committed
to rl(4) in recent months, depending upon what specific model and
revision of Realtek NIC you have.  No offence intended, but Realtek is
definitely the worst of the bunch.  I'm willing to bet it's an on-board
NIC too.  :-)


Actually, I have 3 NICs in my PC (all of them in use). My machine is the 
server/router in my home network, so it has the onboard vr0 NIC 
connected to my ADSL modem, the rl0 nic connected to my internal wired 
lan, and the rl1 nic connected to my wireless router (my internal wired 
lan is firewalled from the wireless, since I don't really trust wireless 
security ;-)).



I'm CC'ing PYUN Yong-Hyeon here, as he presently maintains/works on the
rl(4) driver, and might be able to help determine if the Realtek NIC is
what's causing all of this, or if the ATA chipset (is this the VIA?  We
don't know yet) is causing it first.

Finally, what motherboard brand and model is this, and what BIOS
revision or version?


I attached the output of dmidecode (and dmesg), hopefully that contains 
all you need to know.


BTW: I did a reply all, but I'm not sure if that is the "right" policy 
here. If I'm bothering anybody with this and they prefer to only see the 
mail on the list, then please let me know!


Regards and thanks for all the help,
Sebastiaan
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.3-PRERELEASE #20: Wed Jan  2 19:48:49 CET 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/PIGLET
MPTable: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Sempron(tm) Processor 2600+ (1599.83-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20fc2  Stepping = 2
  
Features=0x78bfbff
  Features2=0x1
  AMD Features=0xe2500800
  AMD Features2=0x1
real memory  = 1056964608 (1008 MB)
avail memory = 1020919808 (973 MB)
ioapic0: Assuming intbase of 0
ioapic0  irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Jan  2 2008 19:48:29)
cpu0 on motherboard
pcib0:  pcibus 0 on motherboard
pci0:  on pcib0
agp0:  mem 0xe800-0xefff at device 0.0 on 
pci0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
pci1:  at device 0.0 (no driver attached)
rl0:  port 0xd000-0xd0ff mem 0xf6084000-0xf60840ff 
irq 16 at device 8.0 on pci0
miibus0:  on rl0
rlphy0:  on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:50:fc:57:a2:4b
rl1:  port 0xd100-0xd1ff mem 0xf608-0xf60800ff 
irq 17 at device 9.0 on pci0
miibus1:  on rl1
rlphy1:  on miibus1
rlphy1:  10baseT, 10baseT-FDX

Re: Stable SATA pci card for FreeBSD 6.x/7.0

2008-08-06 Thread Sebastiaan van Erk

Bummer, I forgot the dmidecode output.

Sorry about that. :-(

Regards,
Sebastiaan

Sebastiaan van Erk wrote:

Hi,

Ok, those rl1: watchdog timeouts didn't ring a bell with me because I'd 
seen them before; however a quick grep in the logs (which date back to 
May 25) shows no other watchdog timeout matches.


To try and avoid being incomplete again, I'll just attach the full dmesg 
below.


Jeremy Chadwick wrote:

On Wed, Aug 06, 2008 at 11:37:16AM +0200, Sebastiaan van Erk wrote:

Yes, good thing you pointed this out, I hadn't seen those yet:

Aug  5 11:15:05 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:05 piglet kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 
retry left) LBA=218885455
Aug  5 11:15:05 piglet kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 
retry left) LBA=218885455

Aug  5 11:15:10 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: ad6: FAILURE - device detached
Aug  5 11:15:31 piglet kernel: subdisk6: detached
Aug  5 11:15:31 piglet kernel: ad6: detached
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: rl1: watchdog timeout
Aug  5 11:15:31 piglet kernel: ad4: FAILURE - device detached
Aug  5 11:15:31 piglet kernel: subdisk4: detached
Aug  5 11:15:31 piglet kernel: ad4: detached
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6 
disconnected.
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider ad4 
disconnected.
Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1: provider 
mirror/gm1 destroyed.

Aug  5 11:15:31 piglet kernel: GEOM_MIRROR: Device gm1 destroyed.
Aug  5 11:15:31 piglet kernel: 
g_vfs_done():mirror/gm1s1e[WRITE(offset=111376236544, length=16384)] 
error = 6


Kudos to Andrey for asking a simple yet incredibly beneficial question.

You have a much greater problem here, and it doesn't look specific to
your disks.  It looks as if an interrupt is stalled or locked.  I'm
willing to bet your rl1 Realtek NIC and your ATA controller (associated
with disks ad4 and ad6) use the same IRQ.  vmstat -i output should help
clear that up, or dmesg output.

I'll tell you that there have been some watchdog timeout fixes committed
to rl(4) in recent months, depending upon what specific model and
revision of Realtek NIC you have.  No offence intended, but Realtek is
definitely the worst of the bunch.  I'm willing to bet it's an on-board
NIC too.  :-)


Actually, I have 3 NICs in my PC (all of them in use). My machine is the 
server/router in my home network, so it has the onboard vr0 NIC 
connected to my ADSL modem, the rl0 nic connected to my internal wired 
lan, and the rl1 nic connected to my wireless router (my internal wired 
lan is firewalled from the wireless, since I don't really trust wireless 
security ;-)).



I'm CC'ing PYUN Yong-Hyeon here, as he presently maintains/works on the
rl(4) driver, and might be able to help determine if the Realtek NIC is
what's causing all of this, or if the ATA chipset (is this the VIA?  We
don't know yet) is causing it first.

Finally, what motherboard brand and model is this, and what BIOS
revision or version?


I attached the output of dmidecode (and dmesg), hopefully that contains 
all you need to know.


BTW: I did a reply all, but I'm not sure if that is the "right" policy 
here. If I'm bothering anybody with this and they prefer to only see the 
mail on the list, then please let me know!


Regards and thanks for all the help,
Sebastiaan

# dmidecode 2.9
SMBIOS 2.3 present.
33 structures occupying 996 bytes.
Table at 0x000F0800.

Handle 0x, DMI type 0, 20 bytes
BIOS Information
Vendor: Phoenix Technologies, LTD
Version: 6.00 PG
Release Date: 06/27/2006
Address: 0xE
Runtime Size: 128 kB
ROM Size: 512 kB
Characteristics:
ISA is supported
PCI is supported
PNP is supported
APM is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/360 KB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 KB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
   

Sound skipping problems

2005-11-06 Thread Sebastiaan van Erk

Hi,

I have major sound skipping problems on FreeBSD 6.0. I checked the 
mailing list archives and found a related thread:


http://lists.freebsd.org/pipermail/freebsd-current/2005-June/051103.html

To quote Jeff Roberson:

> I have a patch that should greatly improve the sound skipping problems
> people have under heavy io load.  Several people sent me traces that
> showed the buf daemon running for hundreds of milliseconds with Giant
> held, which can hold up the pcm code.  The patch is available at:
>
> http://www.chesapeake.net/~jroberson/flushbuf.diff

The problems are definitely correlated with I/O load; however, I can't say 
that I have HEAVY I/O loads. A simple: # sync;sync;sync; will already 
cause the sound to skip.


I have DMA enabled on all drives, and it seems the above patch is 
already merged into FreeBSD 6.0-STABLE. This leaves me at a loss, and I 
don't know what else to try...


Does anybody have any ideas of what I could do to solve this problem?

Thanks in advance,
Sebastiaan van Erk

[EMAIL PROTECTED](ttyp9:92:0):~# uname -a
FreeBSD piglet.sebster.com 6.0-STABLE FreeBSD 6.0-STABLE #1: Sat Nov  5 
23:42:18 CET 2005 
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/PIGLET  i386


[EMAIL PROTECTED](ttyp9:93:0):~# cat /dev/sndstat
FreeBSD Audio Driver (newpcm)
Installed devices:
pcm0:  at io 0xec00 irq 22 kld snd_via8233 (5p/1r/0v 
channels duplex default)


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Sound skipping problems

2005-11-07 Thread Sebastiaan van Erk

Hi,

Thanks for the tip, this seems to do the trick.

Tested it with /usr/ports/sysutils/stress:

[EMAIL PROTECTED](ttyp7:42:0):/shared# stress --cpu 8 --io 4 --vm 2 --vm-bytes 
128M --hdd 4 --timeout 10m

stress: info: [2439] dispatching hogs: 8 cpu, 4 io, 2 vm, 4 hdd

and heard no more skips.

Greetings,
Sebastiaan van Erk

Markus Trippelsdorf wrote:

On Sun, Nov 06, 2005 at 12:39:02PM +0100, Sebastiaan van Erk wrote:


Hi,

I have major sound skipping problems on FreeBSD 6.0. I checked the 
mailing list archives and found a related thread:


...


Does anybody have any ideas of what I could do to solve this problem?



You could try to set hint.pcm.0.buffersize="16384" in /boot/loader.conf .
It solved the problem for me.
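
A rough way to see why a larger buffer helps is to work out how much audio it holds, since that is about how long playback can survive a stall. The sketch below rests on assumptions that are not in the original mails: that the hint is a size in bytes, and that the stream is 44.1 kHz, 16-bit stereo. The 4096-byte figure is just a hypothetical smaller buffer for comparison.

```python
def buffer_duration_ms(buffer_bytes, sample_rate_hz=44100, channels=2, bytes_per_sample=2):
    """Milliseconds of audio that fit in a buffer of the given size."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return 1000.0 * buffer_bytes / bytes_per_second

print(round(buffer_duration_ms(16384), 1))  # the value suggested above
print(round(buffer_duration_ms(4096), 1))   # hypothetical smaller buffer
```

Under these assumptions the suggested setting holds a little over 90 ms of audio, roughly four times what the smaller example buffer would hold.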



Re: Sound skipping problems

2005-11-07 Thread Sebastiaan van Erk

Hi,

Thank you for your reply!

I tried the patch, but unfortunately when I reboot with the patch (which 
cleanly applies and compiles), audio stops working. The device (pcm0) is 
still there, the mixer is set ok, and everything looks normal, just no 
sound comes out of the speakers.


I have no idea why the patch doesn't work, but if you want any more 
information I'll be happy to supply it to you. The sound skipping seems 
at least fixed by just increasing the buffer size, but I don't know how 
reliable this workaround is compared to a structural fix.


Furthermore I don't know if this message is relevant, but it seems the 
snd_via8233 driver doesn't like my audio codec very much:


pcm0:  port 0xec00-0xecff irq 22 at device 17.5 on pci0
pcm0: [GIANT-LOCKED]
pcm0: 

Greetings,
Sebastiaan van Erk


Ariff Abdullah wrote:

On Sun, 06 Nov 2005 12:39:02 +0100
Sebastiaan van Erk <[EMAIL PROTECTED]> wrote:


Hi,

I have major sound skipping problems on FreeBSD 6.0. I checked the 
mailing list archives and found a related thread:


http://lists.freebsd.org/pipermail/freebsd-current/2005-June/051103.html

To quote Jeff Roberson:

> I have a patch that should greatly improve the sound skipping
> problems people have under heavy io load.  Several people sent me
> traces that showed the buf daemon running for hundreds of
> milliseconds with Giant held, which can hold up the pcm code. 
> The patch is available at:

>
> http://www.chesapeake.net/~jroberson/flushbuf.diff

The problems are definitely correlated to io load, however I can't
say  that I have HEAVY io loads. A simple: # sync;sync;sync; will
already  cause the sound to skip.

I have DMA enabled on all drives, and it seems the above patch is 
already merged into FreeBSD 6.0-STABLE. This leaves me at a loss,

and I  don't know what else to try...

Does anybody have any ideas of what I could do to solve this
problem?




Recompile your kernel with "options PREEMPTION", and apply this patch:

http://people.freebsd.org/~ariff/snd_RELENG_6_0_20051030_058.diff


--
Ariff Abdullah
MyBSD

http://www.MyBSD.org.my (IPv6/IPv4)
http://staff.MyBSD.org.my (IPv6/IPv4)
http://tomoyo.MyBSD.org.my (IPv6/IPv4)


Re: DHCP client error: domain_not_set.invalid

2005-11-14 Thread Sebastiaan van Erk

I had this as well. It means that your DHCP server returns an invalid 
search domain.


The easy way to solve it (if you have access) is to set the search 
domain to something valid in your DHCP server (Linksys router by any 
chance?). I couldn't find a flag on dhclient to tell it to ignore 
invalid search domains: this would be really handy so that you can 
connect to badly set up networks when you don't have access to the router.


Greetings,
Sebastiaan

Mark Space wrote:

Hi all,

I just set up the latest 6.0 release, and I'm getting errors with the 
DHCP client.  Trying to pull a network address during start up, I get:


Bogus domain search list 15: domain_not_set.invalid

This repeats several times before giving up.  Google tells me that this 
problem was reported by two users on the bsd-current list.  No one ever 
replied to their inquiries (at least on the list), so I thought to try 
once more to see if there's any interest in addressing this issue.

More info was in the original post:
http://lists.freebsd.org/pipermail/freebsd-current/2005-October/057034.html



Bug in netgraph?

2005-11-16 Thread Sebastiaan van Erk

Hi,

There seems to be a bug/problem with GRE (netgraph) in FreeBSD in 
dealing with fragmented packets. When I have the following nat rules:


List of active MAP/Redirect filters:
map ng0 10.0.0.0/8 -> 80.126.244.3/32 portmap tcp/udp 4:5 
mssclamp 60

map ng0 10.0.0.0/8 -> 80.126.244.3/32 mssclamp 60

everything works, but when I don't include the mssclamp option, 
connections to, for example, www.google.com (searching for test) from my 
internal network constantly hang and time out.
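
For background, clamping the MSS makes both ends send TCP segments small enough to pass through the tunnel without fragmentation. The arithmetic behind it is simple and can be sketched as below. This is an illustration only: the 20-byte header sizes hold for IPv4 and TCP without options, while the 40-byte tunnel overhead is a hypothetical number and not a measurement of this mpd/GRE setup.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options

def tcp_mss(link_mtu, tunnel_overhead=0):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return link_mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER

print(tcp_mss(1500))                      # plain Ethernet MTU
print(tcp_mss(1500, tunnel_overhead=40))  # hypothetical encapsulation overhead
```

If the real overhead of the tunnel is known, plugging it in gives an upper bound for a sensible clamp value.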


I'm using FreeBSD 6.0 stable in combination with mpd and ipfilter 4.1.8:

IP Filter: v4.1.8 initialized.  Default = block all, Logging = enabled

[EMAIL PROTECTED](ttyp8:16:64):~> mpd --version
Version 3.18 ([EMAIL PROTECTED] 22:28  5-Nov-2005)

[EMAIL PROTECTED](ttyp8:12:0):~> uname -a
FreeBSD piglet.sebster.com 6.0-STABLE FreeBSD 6.0-STABLE #12: Wed Nov 16 
13:34:20 CET 2005 
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/PIGLET  i386


Greetings,
Sebastiaan van Erk


Re: DHCP client error: domain_not_set.invalid

2005-11-22 Thread Sebastiaan van Erk

Hi,

I understand the idea that bad values should be rejected, but in 
reality, I have the same DSL modem that these others have and there is 
no way to change the domain search list that it sends. No way that I 
could find at least. This is SBC-Yahoo in California, so there are a lot 
of people out there with this modem.



Well ring your ISP and complain.  Too many people just
accept crappy service.


This is just the attitude that's going to get people to use other 
software. People are going to laugh at you trying to get a network 
connection and joke "it works fine with Windows". Then you try and 
explain that it's not your OS's fault and somebody messed up some 
setting somewhere else. And then they laugh some more watching you struggle.


Furthermore it's really not realistic to expect that ISP's are going to 
do anything about it either. They have a billion other more important 
issues other than solving that insignificant problem that "that guy who 
is using an unsupported OS" has. They really don't care.



dhcpd should either

1. accept bogus names (warnings are fine)
2. offer a configuration option or command line switch to allow the 
bogus domain if we wish
3. offer a configuration option like isc-dhcpd does so that we can 
ignore or override the setting


I would have to agree here. I think option 2 is great, because it gets 
people to be aware of the problem, but it allows them to workaround it 
if necessary.


I really think it's terrible to have the software just reject a lease 
because of an invalid search domain, without you being able to fix it 
without hacking code. That's going a bit overboard IMHO and is just 
going to cause more problems than it's going to solve.


Greetings,
Sebastiaan




Re: DHCP client error: domain_not_set.invalid

2005-11-23 Thread Sebastiaan van Erk

Hi,

Mark Andrews wrote:

This is just the attitude that's going to get people to use other 
software. People are going to laugh at you trying to get a network 
connection and joke "it works fine with Windows". Then you try and 
explain that it's not your OS's fault and somebody messed up some 
setting somewhere else. And then they laugh some more watching you struggle.



Actually it is reasonable.  Windows lets users violate RFC's
in many ways.


Yes, it might be reasonable, but it is still going to stop you from 
being able to connect to a network, whereas Windows users have no problems.



RFC 952 specifies what is legal in a hostname.  While one
can theoretically search for things other than hosts the
only real use of the search strings today is for hostnames
and/or mail domains (which are syntactically identical to
hostnames).
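
To illustrate the hostname rule being discussed, here is a small sketch of a validity check. It is not dhclient's actual code; it assumes the RFC 952 syntax as relaxed by RFC 1123 (labels of letters, digits and hyphens, not starting or ending with a hyphen), and the function name is invented. The underscores are what make domain_not_set.invalid fail under this rule.

```python
import re

# One label: letters, digits and hyphens, 1 to 63 characters,
# not starting or ending with a hyphen.
LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_valid_hostname(name):
    """Check a dotted name against the label rule above."""
    if name.endswith("."):
        name = name[:-1]  # allow one trailing dot (fully qualified form)
    if not name or len(name) > 253:
        return False
    return all(LABEL.match(label) for label in name.split("."))

for candidate in ("example.com", "domain_not_set.invalid"):
    print(candidate, is_valid_hostname(candidate))
```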

What would be really interesting to know is what they expect
the customers to find using this suffix.


Actually the entire search domain thing is pretty useless in most cases 
for home users (unless they have their own internal network, in which 
case they have their own DNS and DHCP servers). People navigate the 
internet using fully qualified domain names and it is almost never 
necessary to have a search domain; it just slows things down having it 
search for hostnotfound.domain.com.mysearchdomain.com.



My bet is that this really is just a configuration error on
their part.


Could be, or more probably it's just the default setting of the modem. 
I've had one of these modems, and it took me forever to find the proper 
setting because in the web interface of the modem they obviously didn't 
feel like calling it search domain; I can't remember what they called it 
but they annotated it with the comment "necessary for some ISPs", which 
just completely wrong-footed me.


I'm not going to fall into an endless discussion, so I'm going to wrap it up, but 
I think it would be really nice if the FreeBSD user could solve this 
problem themselves instead of having to rely on other people that may 
not be inclined to put much priority on the issue. And by that I mean a 
solution other than hacking the code, which is quite a lot to ask of a 
regular user. An option to ignore the setting would be just fine, an 
option to override it even better. I don't know if you can even disable 
the search domain (haven't read the RFC) but this would be even better 
in many cases,  avoiding queries that are not necessary.


Greetings,
Seb*



Re: DHCP client error: domain_not_set.invalid

2005-11-23 Thread Sebastiaan van Erk

Hi,

Greg Barniskis wrote:

Mark Andrews wrote:


Yes it is reasonable to expect ISP to fix things like this.
You pay the ISP to operate their part of the network within
the operational constraints of the RFCs (Standards track and
BCP).



I totally agree. Make sure when calling tech support on things like this 
that you are *not* asking them to provide FreeBSD support, that you can 
handle that angle of the connection quite well, thanks. Explain that the 
evidence shows that their system appears to violate global connectivity 
standards (if you can name which RFC and exactly how it's violated, 
great, but don't expect first tier help desk phone operators to 
understand that as it is probably way, way beyond their troubleshooting 
script).


I think this would all be reasonable in a perfect world. In the real 
world you're paying the operator to get internet access and they often 
list which operating systems they support (and they don't list FreeBSD). 
They're going to ask you what operating system are you running, then ask 
you if your connection works; and when you say it works under windows 
but violates an "RFC" they're just not going to give it much priority.


Then when the help desk staff goes "uhm...", politely ask to be 
escalated to second tier and clearly and politely state your case there, 
again making it clear that you are *not* asking for FreeBSD support, but 
support by them of global connectivity standards that every ISP ought to 
be respecting.


At least you have a chance of getting your trouble ticket marked 
something like "Unresolved -- Bug" instead of "Resolved -- Unsupported 
OS". That is to say, the kind of ticket that self-escalates to engineers 
and managers somewhere away from the help desk proper.


The word chance says it all. And all the time you're hoping for this 
chance to become reality you cannot use your broadband connection. 
Furthermore there are two other problems with this approach:
1) it often costs you a lot of money (even though it can be argued that 
it is reasonable that ISPs fix real problems free of charge and not 
charge you an arm and a leg for it, in the real world the situation is 
often not so perfect).
2) it often costs you a lot of time; it's going to be really hard to 
even get your request escalated to second tier, and it's definitely 
going to take days and multiple calls before they start to take you 
seriously.


In the end, it's the FreeBSD user that suffers.

Greetings,
Sebastiaan