Re: [9fans] NUMA

Bakul Shah Sun, 17 Jul 2011 10:18:07 -0700

On Jul 17, 2011, at 8:24 AM, erik quanstrom <quans...@quanstro.net> wrote:

> On Sun Jul 17 04:45:18 EDT 2011, ba...@bitblocks.com wrote:
> 
>> Also note that the ISA implementations these days are quite
>> complex (perhaps even more than your typical program).  We
>> don't see this complexity because it is all hidden behind a
>> relatively simple ISA.  But remember the F00F bug? Usually the
>> vendor has a long errata list (typically only available on a
>> need to know basis and only under NDA!). And usually they
>> don't formally prove the implementation right; they just run
>> zillions of test vectors! I bet you would be scandalized if
>> you knew what they do :-)
> 
> i have the errata.  i've read them.  and i find them reassuring.
> you might find that surprising, but the longer and more detailed
> the errata, the longer and more intricate the testing was.  also
> long errata sheets, especially ones listing really arcane bugs,
> indicate the vendor isn't sweeping embarrassing ones under the rug.  i've
> seen parts with 2-3 errata that were just buggy.  they hadn't even
> tested some large bits of functionality once!  on the other hand
> some processors i work with have very long errata, but none of
> them matter.  intel kindly makes the errata available to the public
> for their gbe controllers.  e.g.
> 
> http://download.intel.com/design/network/specupdt/322444.pdf
> page 15, erratum #10 is typical.  the spec was violated, but it is
> difficult to imagine working hardware for which this would matter.
> 
> i can't speak for vendors on why errata is sometimes nda,
> but i would imagine that the main fear is that the errata can
> reveal too much about the implementation.  on the other hand,
> many vendors have open errata.  i've yet to see need-to-know
> errata.

I am sure (or sure hope) things have changed, but in at least two cases in the 
past the vendor reps told me that yes, the bug was known *after* I told them I had 
logic analyzer traces that showed the bug. One was a very well-known CPU vendor, 
the other a SCSI chip manufacturer.

I suspect incidents like the F00F bug changed attitudes quite a bit, at least 
for vendors like intel.

> by the way, proving an implementation correct seems simply
> impossible.  many errata (perhaps like the one i mentioned)
> come down to variations in the process that might not have
> met the models.  and how would you prove that one of the
> many physical steps in producing a chip correct anyway?

You can perhaps prove logical properties for simpler subsystems (an ALU, for 
instance). Or you can generate logic from a description in a high-level language 
such as Scheme, which might be easier to prove, but of course then you have to 
worry about the translator! But not the physical processes.

I do think formal proof methods might get used more as more and more parallelism 
gets exploited. The combinatorial explosion of testing might lead us there!

Anyway, my point was just that there are no certainties, just degrees of 
uncertainty! You should *almost always* opt for speed (and simplicity) by 
figuring out how much uncertainty will be tolerated by your customers:-) A 
99.9% solution available today has more value than a 100% solution that is 10 
times slower and a year late. 99.9% of the time! But I guess that is my 
engineer's bias!

> 
> - erik
>
