I'm leaving for a vacation in the morning so this is in lieu of
anything more systematic like filing bug reports.

I have just spent a couple of days trying to get an AMD Athlon5600X2
on an Elitegroup AMD690GM-M2 working. Getting the MP kernel to boot
took a lot of debugging.

a)
The arch/i386/machdep.c cpuid decode in identifycpu() tries to be all things
to all cpus. It doesn't follow the spec in AMD docs 25481 and 33610,
particularly in decoding the extended family and extended model fields,
which become significant when the family field == 0xF.
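
A minimal sketch of the decode rule, assuming the usual CPUID function 1
EAX layout (stepping in bits 3:0, base model in 7:4, base family in 11:8,
extended model in 19:16, extended family in 27:20). The struct and
function names are made up for illustration and are not the ones used
in machdep.c.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature. */
struct cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/*
 * Decode the CPUID function 1 EAX value.  The extended fields are
 * only folded in when the base family field is 0xF; below that they
 * are treated as reserved and ignored.
 */
static struct cpu_sig
decode_cpuid_sig(uint32_t eax)
{
	struct cpu_sig s;
	unsigned base_family = (eax >> 8) & 0xF;
	unsigned base_model = (eax >> 4) & 0xF;
	unsigned ext_family = (eax >> 20) & 0xFF;
	unsigned ext_model = (eax >> 16) & 0xF;

	s.stepping = eax & 0xF;
	s.family = base_family;
	s.model = base_model;
	if (base_family == 0xF) {
		s.family = base_family + ext_family;
		s.model = (ext_model << 4) | base_model;
	}
	return s;
}
```

With this rule a signature of 0x00040F33 decodes to family 0xF,
model 0x43, stepping 3, where a decode that looks only at the base
fields would report model 3.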

First, does this matter? Probably not right now, as long as
the amd_family6_setup is adequate for all newer CPUs.  The algorithms
for assigning ICUs to CPUs might suffer, but I really don't know.

The documents noted above list the cpuid signatures of all recent chips,
so a simple enumeration would probably suffice with defaults for the
major families.

b) x86 BIOSes are quirky at best. The BIOS on the board mentioned
 1) puts the RSDP pointer in type 2 BIOS space (reserved), not in type
    3 BIOS space (ACPI), so the search algorithm in acpi_machdep.c
    doesn't find it, causing a crash later.  Luckily acpidump(8) does
    a brute force search and does find it. It appears that the behavior
    of the BIOS is legal.
 2) the RSDP block may advertise itself as revision 2 but have a NULL
    XSDT physical pointer (64 bit) while having a correct 32 bit RSDT
    pointer. The code assumes that a revision 2 RSDP must have a valid
    64 bit pointer, which causes a crash.
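
A rough sketch of both points, working on a plain byte buffer rather
than mapped physical memory. It assumes the root pointer layout from
the ACPI specification (where the block is called the RSDP): 8 byte
signature "RSD PTR ", checksum at offset 8, revision at offset 15,
32 bit RSDT address at offset 16, 64 bit XSDT address at offset 24,
with the first 20 bytes summing to zero. The function names are
hypothetical and this is not the acpi_machdep.c code; the revision 2
extended checksum is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG	"RSD PTR "
#define RSDP_V1_LEN	20	/* bytes covered by the ACPI 1.0 checksum */

/* Sum of len bytes modulo 256; a valid structure sums to zero. */
static uint8_t
byte_sum(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;

	while (len-- > 0)
		sum += *p++;
	return sum;
}

/* Read an nbytes-wide little-endian value without alignment tricks. */
static uint64_t
rd_le(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	while (nbytes-- > 0)
		v = (v << 8) | p[nbytes];
	return v;
}

/*
 * Brute force scan on 16 byte boundaries for a checksummed root
 * pointer.  Returns the offset of the first match, or -1 if none.
 */
static long
rsdp_scan(const uint8_t *buf, size_t len)
{
	size_t off;

	for (off = 0; off + RSDP_V1_LEN <= len; off += 16) {
		if (memcmp(buf + off, RSDP_SIG, 8) == 0 &&
		    byte_sum(buf + off, RSDP_V1_LEN) == 0)
			return (long)off;
	}
	return -1;
}

/*
 * Pick the root table address.  A revision 2 block with a zero
 * 64 bit pointer falls back to the 32 bit one.  Returns 0 when
 * neither is usable, which the caller must check before mapping.
 * For revision >= 2 the caller must supply at least 36 bytes.
 */
static uint64_t
rsdp_root(const uint8_t *p, int *is_xsdt)
{
	uint64_t xsdt = (p[15] >= 2) ? rd_le(p + 24, 8) : 0;
	uint64_t rsdt = rd_le(p + 16, 4);

	*is_xsdt = (xsdt != 0);
	return (xsdt != 0) ? xsdt : rsdt;
}
```

The point of returning 0 instead of whatever the revision field
implies is that the caller gets one obvious value to test before
touching the table.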

c) acpi.c assumes that there must be an RSDP block, so it uses whatever
   NULL happens to result from the acpi_machdep scan.

d) i386/bios.c smbios_find_table's cookie code for finding cached entries
   can't work right - 0xfff mask instead of 0xff.  The encoding (+1 or +2
   base?) is unclear. It's likely that the caching feature is never used.

e) There are a number of almost-duplicate "temporarily map a block of physical
space into kernel space" routines scattered about multiple architectures
and multiple places in some architectures which probably should be
(garbage) collected.

If these are known issues there's no point in me going any further.
If any of these are new issues I'll file a bug & fix after I get back.
I'll be glad to supply diffs now. They are very full of
#ifdef BIOS_DEBUG printfs.  Amazing what you'll find out when you
ask the machine to tell you what's happening.

------ don't read any further if you are easily annoyed ------
I realize that a lot of the code I was fighting with is inherited
from other projects and was not written by OpenBSD people, so
this is not a criticism of the people or the project.

This experience has validated once more some rules developed
over my 40+ years of programming:

All "should never happen" failures must have printfs at minimum to
clue the helpless about where the "can't happen" happened.
No unusual or unexpected failure may be silent.
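
As a small illustration of that rule, here is a sketch of a lookup
that reports an impossible input with its location instead of
dereferencing it. CANT_HAPPEN and table_length are hypothetical names,
and user-space fprintf stands in for the kernel printf.

```c
#include <stdio.h>

/* Hypothetical helper: say where the "can't happen" happened. */
#define CANT_HAPPEN(msg) \
	fprintf(stderr, "%s:%d: %s: can't happen: %s\n", \
	    __FILE__, __LINE__, __func__, (msg))

/*
 * Return the length byte stored at the start of a table, or -1
 * (after complaining) if the caller handed us a NULL pointer.
 */
static int
table_length(const unsigned char *tbl)
{
	if (tbl == NULL) {
		CANT_HAPPEN("NULL table pointer");
		return -1;
	}
	return tbl[0];
}
```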

A lack of checks for null pointers before use has a high debugging cost.
As the certified geniuses who invented the ARPAnet said:
   be extremely precise sending out
   be extremely accepting receiving BUT validate, validate, validate

Combining caching, 'find first', and 'find next' code in the same routine
is, in my very rigid and uncompromising mind, a fatal design error of
the genus "conflation of similar but incompatible goals".

geoff steckel
Omnivore Technology
