On 6/1/2011 9:27 AM, Gerhard Postpischil wrote:
On 6/1/2011 9:19 AM, Charles Mills wrote:
It seems to me like the ideal way to do this would be to have not two
stages (MVC for 256 and EX'ed MVC) but rather three cases: A loop with a
"hard-coded" or "unrolled" string of 16 MVC's that moved 4K blocks and
incremented registers by 4K on each iteration; followed by a loop of
256-byte MVCs; followed by an EX'ed MVC for 1 to 255 bytes. (Obviously
each step would be optional depending on the exact count.)
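As a rough illustration of the three cases, here is the arithmetic
sketched in C rather than assembler. This only shows how a byte count
would split across the stages; it is not the move code itself, and
plan_move and struct move_plan are made-up names, not anything from
this thread.

```c
#include <stddef.h>

/* Hypothetical helper: how many iterations each stage would run. */
struct move_plan {
    size_t blocks_4k;   /* iterations of the unrolled 16-MVC loop */
    size_t blocks_256;  /* iterations of the 256-byte MVC loop    */
    size_t tail;        /* 0..255 bytes left for the EX'ed MVC    */
};

static struct move_plan plan_move(size_t count)
{
    struct move_plan p;

    p.blocks_4k  = count / 4096;   /* whole 4K blocks first       */
    count       %= 4096;           /* what the 4K loop leaves     */
    p.blocks_256 = count / 256;    /* then whole 256-byte blocks  */
    p.tail       = count % 256;    /* then the short remainder    */
    return p;
}
```

Any field that comes out zero corresponds to a stage that would be
skipped for that count.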
If you go through this exercise, I'd also suggest one (minor?) variation
- variable MVCs to bump the starting address up to a 4K multiple (if
needed), then the 4K byte moves, then some more short ones, as needed.
It would be instructive to see whether that's faster than a set of
unaligned moves.
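A minimal sketch of the extra step in that variation, again in C just
for the arithmetic. It assumes a single address is being brought up to
a 4K boundary (which of the source or target address to align is left
open here), and head_to_4k is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to move with short MVCs
   before addr reaches a 4K boundary. Returns 0 if addr is already
   aligned, and never more than count. */
static size_t head_to_4k(uintptr_t addr, size_t count)
{
    size_t head = (size_t)((4096 - (addr % 4096)) % 4096);

    return head < count ? head : count;
}
```

The bytes left after this head portion would then go through the 4K
moves, with any remainder handled by short moves as before.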
I seem to recall that there was some optimization in MVCL for 4K-aligned
moves. Maybe it would be better to use MVCs to get to a 4K boundary,
and then switch to MVCLs. However, all this checking may overwhelm
the savings.
Note, I have not verified my recollection of this.
--
Richard
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html