Thanks, Paul and Noel, for the detailed responses per usual!

> On Jan 20, 2019, at 6:55 AM, Noel Chiappa <j...@mercury.lcs.mit.edu> wrote:
> 
> What is [MAINDEC ZQMC] complaining about?

Looks like a few more flaky bits in a couple of additional banks.  For those 
reading along who may be unfamiliar with the MS11-L, it is laid out as 8 
physical banks, each containing 18 16K x 1 DRAMs (16 data + 2 parity bits per 
word).  So a flaky bit in a physical bank implicates one particular chip.
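
For anyone who wants to turn a diagnostic's failure report into a bank and bit 
position, here is a rough Python sketch.  It assumes the 8 banks are simply 
consecutive 16K-word slices starting at address 0 (no interleaving, array 
based at 0); that is an assumption, not something taken from the MS11-L 
prints.  The function name is made up, and mapping (bank, bit) to an actual 
chip designator still needs the board layout drawing.

```python
WORDS_PER_BANK = 16 * 1024   # each 4116 is 16K x 1
NUM_BANKS = 8                # physical banks on the board

def locate_fault(byte_addr, expected, actual):
    """Return (bank, bad_data_bits) for one failing 16-bit word.

    ASSUMPTION: banks are consecutive 16K-word regions starting at 0.
    Parity bits are not visible in the data word, so only the 16 data
    bits can be identified this way.
    """
    word_index = byte_addr >> 1              # PDP-11 addresses are byte addresses
    bank = word_index // WORDS_PER_BANK
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("address outside a 128K-word array")
    diff = (expected ^ actual) & 0xFFFF      # bits that read back wrong
    bad_bits = [b for b in range(16) if diff & (1 << b)]
    return bank, bad_bits

# Example: wrote 177777 (octal) at byte address 100000 (octal), read back 177767
print(locate_fault(0o100000, 0o177777, 0o177767))
```

Under the stated assumption, each (bank, bit) pair corresponds to exactly one 
of the 18 chips in that bank.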

> Would it be possible to put [ZQMC] on a disk and boot it from there?

I have thought about that...  The most efficient way, I think, would be to work 
up a simple LDA loader that would fit in a boot sector, and load a diagnostic 
from contiguous sectors on disk, starting at the second sector.  It would then 
be easy to blast down just the boot sector and a single desired diagnostic 
without imaging an entire pack.
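
As an illustration of what such a loader has to walk through, here is a 
host-side Python sketch that parses an LDA (absolute loader) image.  It is 
not the boot-sector code itself, and it reflects my understanding of the 
format: each block is 001, 000, a 16-bit byte count (which includes the six 
header bytes), a 16-bit load address, the data, and a checksum byte that makes 
the whole block sum to zero modulo 256; a block with a count of 6 carries the 
start address.  The function name is hypothetical.

```python
def parse_lda(image):
    """Split an absolute-loader image into blocks.

    Returns (blocks, start_address), where blocks is a list of
    (load_address, data_bytes) tuples.  Format details are as described
    above and should be checked against the loader documentation.
    """
    blocks = []
    pos = 0
    while pos < len(image):
        if image[pos] != 1:              # skip leader/padding before a block
            pos += 1
            continue
        if pos + 6 > len(image):
            raise ValueError("truncated block header")
        count = image[pos + 2] | (image[pos + 3] << 8)
        addr = image[pos + 4] | (image[pos + 5] << 8)
        if count < 6:
            raise ValueError("bad byte count")
        end = pos + count                # index of the checksum byte
        if end >= len(image):
            raise ValueError("truncated block")
        if sum(image[pos:end + 1]) & 0xFF != 0:
            raise ValueError("checksum error in block at offset %d" % pos)
        if count == 6:                   # start block: no data, addr = entry point
            return blocks, addr
        blocks.append((addr, bytes(image[pos + 6:end])))
        pos = end + 1
    raise ValueError("no start block found")
```

A boot-sector version would do the same walk, but read bytes from successive 
disk sectors and store each data byte at its load address instead of 
collecting blocks in a list.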

> One of the first things to add [to custom diagnostic] is to store each 
> location's address in it during a set-up pass, and check to see that it's 
> still there during the checking pass.

I did this last night, actually.  I also added a "random" bits test that uses 
the program image itself as a source sequence for words to write/compare.

The good news is that my enhanced diagnostics now detect failures in the same 
physical banks and with the same bits as those flagged by the MAINDEC 
diagnostic.  This was a good lesson learned: all ones / all zeros is definitely 
not good enough when checking this sort of thing!
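
To make the point concrete, here is a toy Python model (not the PDP-11 
diagnostic itself) of a memory with an address-decode fault, where one address 
bit is ignored so that pairs of locations share a cell.  The class and 
function names are invented for the example, and the "image" words are 
arbitrary placeholders standing in for the program image.

```python
class AliasedMemory:
    """Toy 16-bit word memory in which one address bit is ignored."""
    def __init__(self, size, dead_addr_bit):
        self.cells = [0] * size
        self.mask = ~(1 << dead_addr_bit)   # clears the ignored address bit

    def write(self, addr, value):
        self.cells[addr & self.mask] = value & 0xFFFF

    def read(self, addr):
        return self.cells[addr & self.mask]

def fill_test(mem, size, value):
    """Write the same value everywhere, then check it (all ones / all zeros)."""
    for a in range(size):
        mem.write(a, value)
    return [a for a in range(size) if mem.read(a) != value]

def address_test(mem, size):
    """Set-up pass stores each address in itself; checking pass verifies it."""
    for a in range(size):
        mem.write(a, a & 0xFFFF)
    return [a for a in range(size) if mem.read(a) != (a & 0xFFFF)]

def image_test(mem, size, image_words):
    """Use a fixed word sequence (e.g. the program image) as the pattern."""
    for a in range(size):
        mem.write(a, image_words[a % len(image_words)])
    return [a for a in range(size)
            if mem.read(a) != image_words[a % len(image_words)]]

mem = AliasedMemory(16, dead_addr_bit=2)
print("ones:   ", fill_test(mem, 16, 0xFFFF))
print("zeros:  ", fill_test(mem, 16, 0x0000))
print("address:", address_test(mem, 16))
print("image:  ", image_test(mem, 16, [0o012700, 0o177777, 0o005000]))
```

In this model the all-ones and all-zeros passes report nothing, because every 
aliased location holds the same value anyway, while the address and image 
passes both flag the aliased locations.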

Another thing I found interesting, though, is that the "random" test *also* 
found a malfunctioning bit that the address test had missed.  So ones/zeros plus 
address isn't really good enough, either.

I'm technically curious, now, about the failure modes of these sorts of DRAMs.  
I guess in addition to stuck bits, there are also potential decode fails (show 
up on address test, but not ones/zeros) and some errors that have 
history-dependence, perhaps internal latches (show up on random data test, but 
not address or ones/zeros).  I'd guess also there might be potential for 
crosstalk, noise, and "fading bit" type issues as well?  Will have to see after 
I make the next round of repairs if there are still additional problems that 
the MAINDEC flags that my simplistic diag isn't shaking out.
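
Purely as a thought experiment, here is a toy Python model of one way a fault 
could slip past both ones/zeros and the address test: a bit that misreads only 
when the same bit in the next word holds the opposite value.  This is a 
made-up coupling model for illustration, not a claim about how a real 4116 
fails, and all names in it are hypothetical.

```python
class CoupledMemory:
    """Toy memory: one bit of one word misreads when its neighbour's bit differs."""
    def __init__(self, size, victim, bit):
        self.cells = [0] * size
        self.victim = victim          # word address of the sensitive cell
        self.mask = 1 << bit          # data bit that is affected

    def write(self, addr, value):
        self.cells[addr] = value & 0xFFFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.victim:
            neighbour = self.cells[addr + 1]
            if (value ^ neighbour) & self.mask:   # bit differs from neighbour
                value ^= self.mask                # ...so it reads back flipped
        return value

def run_pattern(mem, words):
    """Write one word per location, then return the addresses that misread."""
    for addr, w in enumerate(words):
        mem.write(addr, w)
    return [addr for addr, w in enumerate(words) if mem.read(addr) != w]

SIZE = 16
mem = CoupledMemory(SIZE, victim=5, bit=9)
print("ones:   ", run_pattern(mem, [0xFFFF] * SIZE))
print("zeros:  ", run_pattern(mem, [0x0000] * SIZE))
print("address:", run_pattern(mem, list(range(SIZE))))
# An irregular pattern in which bit 9 differs between words 5 and 6
print("mixed:  ", run_pattern(mem, [0o1000 if a == 5 else 0 for a in range(SIZE)]))
```

With uniform fills, and with small addresses whose bit 9 is always zero, the 
neighbouring bits always match and the fault stays hidden; only a pattern that 
happens to make them differ exposes word 5.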

I've also been somewhat surprised by the level of repair needed on this memory 
board.  So far, I've seen 6 failed 4116s out of an array of 144 total, so about 
a 4% failure rate.  Is this typical for vintage 4116s, or did somebody leave my 
poor MS11 out in a lightning storm? :-)

> Starting the CPU (i.e. 'START' switch) or an INIT instruction will clear
> the 'trap enable' bit in the MS11-L CSR.

D'oh!  Yes, thanks; I may very well have mucked that up.  I'll give it another 
try with a little more care later today.

> Which memory has this [parity halt vs trap] feature?

Hmm, I saw this at least once when researching the variety of CSR formats 
yesterday morning; I'll have to see if I can dig it up again today.  Might be 
just a fastbus thing?  It's also hinted in paragraph 7.7.7 of the older KB11-A 
maintenance manual (NOT the later edition that covers both KB11-A and KB11-D):

"The semiconductor memory control EHA and EHB (enable halt) flip-flops may be 
set under program control to assert SMCB PE HALT L if a parity error is 
detected.  This input also asserts UBCB PARITY ERR SET L, which set the console 
flag and halts the CPU."

This particular text is removed from the later KB11-A,D maintenance manual, and 
the description there seems to imply all reported parity conditions trap 
directly to 114.  But there aren't any details in this section concerning 
processor revision/version etc. 

The logic design around all this is a bit complicated, and the fact that there 
are apparent discrepancies between the texts, available prints, and the actual 
M8106 boards I have on hand is not heartening!

> The M8106 board layout drawing (a couple of pages back from UBCB) does show 
> W1 - upper left corner of the board, next to E84.

Yup.  And, surprisingly, neither one of my M8106 has either a jumper or the 
indicated pull-up at that location!  I'll try to send a pic later.  The fact 
that W1 exists on the M8119 is interesting; maybe the situation is that the 
prints are for later revisions, and my actual M8106 are earlier?  My /45 is a 
very early one -- serial 154!

    cheers,
      --FritzM.
