It's what Dr. Chung-Lung Shum told me. (He was the architect of the Z chips 
through 2022.) 

The context I asked him about was strings of roughly 100 bytes to 32K bytes, in a
case where having the target data in cache after the move was not a waste of time.

I think I benchmarked it, but I don't recall for sure. I know I used MVCs (in a
situation where speed was of the essence -- thousands of executions per second).

I just found the note. He wrote:

Since you indicated that for the "to queue" - the data would have
been recently touched; and no control of alignment, it is best you
do loops of MVCs.  With the out-of-order pipeline and the internal
hardware prefetcher, the hardware should be able to parallelize any
cache miss if it is there.  But since the data is recently touched,
it is most likely in some cache somewhere (not near memory), and
the MVCL near-memory mover won't help in that case.

The suggestion is also based on the fact that the size of the
records moved tends to be small (and <32K as you mentioned). The
near-memory mover will likely not provide any benefit until 40-50K
pages.

Charles
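
To make the suggestion concrete, here is a minimal sketch of the kind of MVC
loop he is describing, in HLASM. The register assignments and labels are mine
and purely illustrative, the usual R0-R15 equates are assumed, and the length
is assumed to be at least 1; it copies a 1- to 32767-byte field in 256-byte MVC
chunks, finishing the odd remainder with an executed MVC:

* Assumed on entry: R2 = source address, R3 = target address,
*                   R4 = length in bytes (1 to 32767)
         LR    R5,R4               work copy of the remaining length
COPYLOOP CHI   R5,256              more than one full 256-byte chunk left?
         BNH   LASTMVC             no: go move the final short piece
         MVC   0(256,R3),0(R2)     move a full 256-byte chunk
         LA    R2,256(,R2)         bump source address
         LA    R3,256(,R3)         bump target address
         AHI   R5,-256             reduce remaining length
         B     COPYLOOP            and keep looping
LASTMVC  BCTR  R5,0                machine length code = length - 1
         EX    R5,FINALMVC         move the last 1-256 bytes
         B     COPYDONE
FINALMVC MVC   0(0,R3),0(R2)       length supplied by the EX above
COPYDONE DS    0H

Each MVC moves at most 256 bytes, which is why a loop (or a stacked run of
MVCs) is needed for the sizes being discussed here.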

On Thu, 31 Oct 2024 17:18:21 -0400, Steve Thompson <ste...@wkyr.net> wrote:

>Uh, isn't that true up to about 1024 bytes (MVCs stacked)? And
>then after the MVCL seems to be faster.
