Has anyone else seen this? I am experiencing random SATA errors when I turn on SMP on a dual-core machine.
After a several-year hiatus, I just got some new hardware to build a Plan 9 network at home. My file server is a 1U rackmount machine with the following hardware:

1. SuperMicro PDSML-LN2+ motherboard
   - built-in ICH7R SATA controller
   - built-in Intel 82573L Gigabit Ethernet adapter
2. 1.8GHz dual-core Intel Core2 Duo processor
3. 2GB RAM
4. 2 x 750GB SATA drives
5. 1 x 2GB Compact Flash removable disk

Note that this machine has neither a CD nor a DVD drive. This is because I misread the vendor's quote: they could not fit a slimline CD or DVD drive into the 1U chassis along with two hard drives, but I didn't realize that until I pulled the machine out of the box. I got around this by installing Plan 9 onto the compact flash card on another machine that did have a CD drive, then bringing it up on this machine.

The first problem I had was using the SATA drives; the SATA drivers in the distributed kernel had problems, so I updated them to the latest from Erik's directory on sources. Specifically:

	% 9fs sources
	% cd /n/sources/contrib/quanstro/root/sys/src/9/pc
	% cp sdata.c sdiahci.c ahci.h /sys/src/9/pc
	% cd ../port
	% cp devsd.c sd.h sdloop.c /sys/src/9/port
	% cd ../../libfis
	% mkdir /sys/src/libfis
	% cp fis.h mkfile /sys/src/libfis
	% cd /sys/src/libfis
	% mk install

I then edited the appropriate mkfile to refer to /386/lib/libfis.a, built the 'pcf' kernel, copied it to 9fat (on the CF card) and rebooted. I'm not sure that I didn't miss any steps, but I was able to fdisk, prep and flfmt the SATA drives and load the operating system by running the (slightly edited) installation scripts from /sys/lib/dist/pc/inst, choosing a fossil+venti configuration.

Up to this point, I'd only been using one core, as '*nomp=1' was set in plan9.ini. Everything is still running as a terminal.
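
For reference, the single-core setting mentioned above is a single line in plan9.ini. A minimal sketch showing only that line; the rest of the file is omitted, and the comment is added here for illustration rather than copied from the actual file:

```
# disable multiprocessor startup so that only one core is used
*nomp=1
```

Removing that line (or commenting it out) is what is meant below by booting with both cores enabled.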
Now the problem: if I boot the machine with both cores enabled, the SATA drives work for a short while, then I get a (seemingly) random i/o error, and after that all further access to the drives fails. I am still booting from the CF disk, but using the fossil on the SATA drives as the root.

I was also having problems with rio. On further investigation, I see that there are known issues with VESA and MP, but even if I don't load the VGA registers and stay in CGA mode, things still behave strangely (for instance, my venti got corrupted and all of /sys/include disappeared). However, if I set '*nomp=1' in plan9.ini, everything works fine.

Has anyone seen this before? Is this a known issue? Even better, is there a fix?

Btw: my long-term intention is to use the fs driver to mirror fossil and venti across both of the SATA drives, keep a small fossil on the CF card for emergencies, and keep a partition there for secstore data. But I haven't gotten to that stage yet.

- Dan C.