I'm behind in reading this epic thread, but I sent a question related to
it to Bill Collier, an instructor I had at IBM in 1968 while in full-time
training for the OS/360 System Design Department. He runs
http://www.mpdiag.com and his biography is at
http://www.bestweb.net/~collier -- so he is a valuable resource on
all things multiprocessor:
Bill -- I shared this interesting document with some friends...
https://www.ibm.com/developerworks/community/forums/html/topic?id=5cf34211-c8e6-4747-a8c2-f8ff7379150b
...and for some reason we've gotten bogged down on this bit of
minutia, with one person asserting:
Interesting too was mention of "TEST AND SET", which in the 1960's,
was NOT an atomic instruction, and it caused multiprocessing
serialization problems in MVT/MP65 and from thence after, to be
replaced by "Compare and Swap", which WAS atomic.
But the PDF implied to me that now TS "Test and Set" WAS atomic.
Implied, anyhow.
A rewrite of history? A fix? Did I read it wrong?
---
My recollection is that TS was indeed atomic but wasn't adequate for
evolving requirements so was replaced by CS/CDS. There's plenty more
(too much!) in our discussion thread on TS but I think that snippet
suffices. I'll be grateful for any comments you have.
His interesting reply -- posted with permission, attribution requested;
he's at [email protected] --:
No chance of getting a good night's sleep now. Not with a
question to answer which brings back so many memories.
The short answer. Yes, Test and Set was atomic. The trouble
was that as a tie breaker, it was not effective. You could get
into Alphonse-Gaston situations where neither processor advanced.
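For concreteness: S/360 TS atomically fetched a byte, set it to all
ones, and set the condition code from the leftmost bit of the old
value. A toy single-machine Python model of that semantics (the class
and function names are mine, not IBM's; a threading.Lock stands in for
the hardware storage interlock that made the fetch-and-set indivisible):

```python
import threading

class Memory:
    """Toy model of one shared byte; the lock makes TS indivisible,
    standing in for the S/360 storage interlock."""
    def __init__(self):
        self.byte = 0x00
        self._bus = threading.Lock()

    def test_and_set(self):
        """TS: atomically fetch the byte, set it to all ones (0xFF),
        and return the condition code (leftmost bit of the old value)."""
        with self._bus:
            old = self.byte
            self.byte = 0xFF
        return (old >> 7) & 1  # CC 0 -> byte was zero, 1 -> leftmost bit was one

def acquire(mem):
    # Spin until TS reports the byte was zero (the lock was free).
    while mem.test_and_set() != 0:
        pass

def release(mem):
    mem.byte = 0x00  # a plain store releases the lock
```

The atomicity is exactly the point: only one of two racing processors
can ever see CC 0. The trouble Bill describes is not here but in the
spinning itself, where "polite" back-off protocols built on top of TS
could leave both processors deferring to each other.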
In order to provide a definitive answer, I tried to look up TS.
I have a 1964 copy of the System/360 Principles of Operation
manual. The TS instruction is not mentioned. Apparently it came
later.
CS fixed the AG problem. A processor might be repeatedly edged
out in the competition and so delayed indefinitely, but at least
other processors were competing successfully and therefore
advancing.
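That property -- a loser simply retries with the fresh value, so some
processor always wins -- is the heart of the CS loop. A toy Python
model (names mine; real CS also reloads the comparand register on
failure, modeled here by returning the value actually seen):

```python
import threading

class Word:
    """Toy model of a shared fullword with an atomic Compare and Swap."""
    def __init__(self, value=0):
        self.value = value
        self._bus = threading.Lock()  # stands in for hardware interlock

    def compare_and_swap(self, old, new):
        """CS: if the word still equals `old`, store `new` and succeed;
        otherwise fail and report the value actually found."""
        with self._bus:
            seen = self.value
            if seen == old:
                self.value = new
                return True, seen
            return False, seen

def atomic_add(word, n):
    # The classic CS loop: on failure, retry against the fresh value.
    # Any individual processor may be edged out repeatedly, but every
    # failure means some other processor's CS succeeded.
    while True:
        old = word.value
        ok, _seen = word.compare_and_swap(old, old + n)
        if ok:
            return
```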
Were there hardware problems I am not aware of? Possibly. If
you want to pursue the question, I could give you the names of
some other folks who were around at the dawn of time, and you
could ask them.
Here is the long, interesting answer to your question which
dredges up so much fun stuff from the past.
Was TS necessary? Is CS necessary? You can argue that they
weren't. The most fascinating program I ever saw was the one in
Dijkstra's 1965 paper titled "Solution of a problem in concurrent programming
control." Using just load, store, test, and branch instructions
(that is, no atomic instructions), the program performed a LOCK
function (that is, it allowed at most one program at a time into
a critical section of code) without the possibility of deadlock
(although livelock was still possible).
In those days it was thought that multiprocessor machines should
be (what Leslie Lamport called) "sequentially consistent". On an
SC machine the only answers a program could compute would be the
answers it could also compute on a uniprocessor machine.
If you think about it for a decade or so, you realize that for a
machine to be SC, it must obey at least the following three rules.
Rule 1. Instructions are performed in the order defined by the
program. No out of order execution can be visible.
Rule 2. No scoreboarding. If processor 1 writes into operand A
at the same time that processor 2 reads from A, the value seen by
processor 2 must be either the old value of A or the new value of
A. It is forbidden for the value to be some combination of bits
or bytes from the old value and the new value. Similarly if both
processors write into A at the same time.
This rule has been called "write atomicity".
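Rule 2 forbids exactly the "torn" values a reader could observe if a
fullword store were performed piecemeal. A small Python enumeration of
what a byte-at-a-time (non-atomic) store would expose to a concurrent
reader (a hypothetical helper of my own, big-endian byte order assumed):

```python
def byte_by_byte_states(old, new, width=4):
    """All values a reader could observe while a `width`-byte store of
    `new` over `old` proceeds one byte at a time (big-endian order),
    i.e. on a machine that violates single copy write atomicity."""
    states = []
    cur = bytearray(old.to_bytes(width, "big"))
    new_bytes = new.to_bytes(width, "big")
    for i in range(width):
        cur[i] = new_bytes[i]          # one byte lands at a time
        states.append(int.from_bytes(cur, "big"))
    return states

# Processor 1 changes A from 0x00000000 to 0xFFFFFFFF; what could
# processor 2 read mid-store?
torn = byte_by_byte_states(0x00000000, 0xFFFFFFFF)
```

Rule 2 says the reader may see only the old value or the new value;
the three intermediate values in `torn` are the forbidden combinations
of old and new bytes.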
Rule 3. If there are several copies of operand A scattered
throughout the caches of a system, then when processor 1 changes
the value of A, it must appear as if all copies of A change value
at the same instant.
This rule has also been called "write atomicity".
To distinguish the two, let's refer to them as "single copy
write atomicity" (SCWA) and "multiple copy write atomicity"
(MCWA).
Back in those days we thought that it was obvious that
multiprocessors should be SC. And that therefore they should
successfully execute programs such as Dijkstra's.
Then reality intruded. Engineers found that obeying the rules
greatly reduced performance and so came up with the following
argument: programmers should not create programs which read and
write the same operand at the same time. If two processes need to
access a shared operand, they should perform a lock operation
which will allow only one of them to proceed and to access the
operand while forcing the other process to wait for the first
process to finish.
Today machines routinely violate the Rules. Therefore, today's
machines are not SC and so would not correctly execute Dijkstra's
program.
In more detail:
a. I do not know of any machine which scoreboards, that is,
which violates Rule 2.
b. I think every machine today violates Rule 1, that is, they
perform read operations before logically preceding write
operations.
c. In the early days Sun and IBM machines obeyed Rule 3, and
then, for the sake of compatibility with the past, were forced to
continue to obey Rule 3. Other machines clearly violate Rule 3.
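The violation in (b) is the classic "store buffering" shape. Under
Rule 1 with an interleaving semantics, the outcome where both reads
return the old values is impossible, and a short enumeration shows it.
The two-thread program below is the standard litmus test, not something
from the z13 document; the Python is a model of an SC machine, not of
real hardware:

```python
from itertools import permutations

def sc_outcomes():
    """Enumerate every sequentially consistent interleaving of
         P1: X = 1; r1 = Y        P2: Y = 1; r2 = X
       with X = Y = 0 initially, and collect the (r1, r2) results."""
    p1 = [("store", "X"), ("load", "Y", "r1")]
    p2 = [("store", "Y"), ("load", "X", "r2")]
    outcomes = set()
    # Every interleaving that preserves each program's own order (Rule 1).
    for order in set(permutations([0, 0, 1, 1])):
        mem = {"X": 0, "Y": 0}
        regs = {}
        idx = [0, 0]
        for who in order:
            op = (p1, p2)[who][idx[who]]
            idx[who] += 1
            if op[0] == "store":
                mem[op[1]] = 1
            else:
                regs[op[2]] = mem[op[1]]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes
```

No SC interleaving yields (r1, r2) = (0, 0); a machine that performs
the reads before the logically preceding writes can and does produce
it, which is how such litmus tests detect the Rule 1 violation.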
So now to comments on the z13 primer.
The central design goal of System/360 was to build machines in a
range of sizes where each machine obeyed exactly the same
instruction set. Programmers were expected to know nothing about
the behavior of the machine underneath the instruction set. How
could they? It could be completely different depending on which
machine in the range they were running on.
Contrast this with the z13 machine where the user is invited down
into the very depths of the machine.
I wonder: have users changed? Back in my day users were naive,
and it was our job to make things simple for them. Are today's
users all super sophisticated and writing huge applications where
they are trying to get every last lick of performance improvement
out of the machine?
On page 40 the document says:
Operand fetches and stores must appear to occur in proper order.
So is it saying that the machine obeys Rule 1? I will bet
dollars to donuts that that is not the case, that the machine
performs read operations before logically preceding write
operations, and that a program can be run to demonstrate that
that is the case.
Nothing is said about obeying Rule 3. The document does talk
about coherency, but the standard definition of cache coherence
is a weaker rule than MCWA.
There are programs to test whether or not a machine obeys Rules
1 and 3. I am not talking about looking to detect some arcane
combination of once-in-a-lifetime occurrences. I am talking about
programs consisting of nothing but loads and stores of full words,
operating on shared operands. It would be so interesting to see
how the z13 behaves under such tests.
I have two overall reactions to the document. First, I am
flabbergasted at the amount of function which has been moved into
the hardware. Transactional Memory alone deserves a Nobel Prize.
Second, there is a disappointment. We worked so hard teasing
out the mysteries of multiprocessing. It seems that that
achievement is not appreciated in this sense: after reading the
z13 document, I still do not know if the machine allows
instructions to execute out of order or if the machine requires
write operations to be MCWA.
--
Gabriel Goldberg, Computers and Publishing, Inc. [email protected]
3401 Silver Maple Place, Falls Church, VA 22042 (703) 204-0433
LinkedIn: http://www.linkedin.com/in/gabegold Twitter: GabeG0
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN