Re: watching JIT generated machine instructions on z/OS

René Jansen Wed, 22 Mar 2023 11:45:52 -0700

When we compare the DFP instructions to the 'old' packed decimal instructions, 
we can see that those were storage-to-storage instructions, while the new 
set is register-to-register, which can make an enormous difference with fast 
instruction caches and pipelines. This is why my C code is faster than my 
assembler code: the compiler chooses the register-to-register instructions over 
the storage-to-storage ones. The DAT box might have done its work, but the data is 
still in memory and not in the cache. On a heavily loaded transaction system 
you are of course doing millions of these a second, on a lot of CPs and 
zIIPs, especially in financial environments, and this is where BigDecimal is 
more or less mandatory as a currency data type.


So I am setting up a few benchmarks, and I will report back. But the first 
question is: how does it work? Java does not have a conventional compiler backend 
where we can tell it to compile for a certain ISA (or default to the one it is 
running on, although non-Java workloads are seldom compiled on the same LPAR). Nor 
can we have a driver that decides which fork of the code to take depending on the 
hardware. The instructions are actually coughed up by the JIT whenever it 
decides that compiling a method would pay off, that is, that native code would be 
faster than interpreting. It could be, of course, that the runtime of the Java 
libraries is already precompiled to use the right instructions. That is probably 
where I am going to look first. But if most of the work is done in library calls, 
the JIT cannot inline the instructions, which would cancel out much of the 
performance gain. I hope this is documented somewhere apart from the source code, 
which might not even be open.
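For the benchmarks themselves, JIT warm-up matters: a method is typically interpreted until the JIT decides it is hot enough to compile, so timing only the first calls mostly measures the interpreter. Below is a minimal, standard-library-only sketch of a warm-up-then-measure loop. The workload, class name and iteration counts are hypothetical, and a dedicated harness such as JMH would be the more rigorous choice. Running it once normally and once with -Xint (interpreter only, assuming the JVM in question accepts that option) gives a rough idea of what the JIT contributes, but it does not show which instructions were generated.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DecimalBench {

    // Hypothetical workload: sum n amounts, each multiplied by a rate,
    // using a decimal64-sized context for every operation.
    static BigDecimal workload(int n) {
        BigDecimal rate = new BigDecimal("1.0375");
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) {
            BigDecimal amount = BigDecimal.valueOf(i, 2); // i cents, e.g. 123 -> 1.23
            total = total.add(amount.multiply(rate, MathContext.DECIMAL64),
                              MathContext.DECIMAL64);
        }
        return total;
    }

    // Times a single run of the workload, in nanoseconds.
    static long timeOnce(int n) {
        long start = System.nanoTime();
        BigDecimal result = workload(n);
        long elapsed = System.nanoTime() - start;
        // Consume the result so the JIT has less room to discard the work as dead code.
        if (result.signum() < 0) {
            throw new IllegalStateException("unexpected negative total");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int warmups = 10;
        int runs = 5;
        // Warm-up: give the JIT a chance to compile workload() before measuring.
        for (int i = 0; i < warmups; i++) {
            timeOnce(n);
        }
        // Report the best of several measured runs to reduce noise.
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            best = Math.min(best, timeOnce(n));
        }
        System.out.printf("best of %d runs: %.2f ms for %d operations%n",
                runs, best / 1e6, n);
    }
}
```

Taking the best of several runs after warm-up is a common simplification; it hides JIT recompilation and GC effects rather than measuring them, which is acceptable for a first comparison but not for a final verdict.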

best regards,

René.

> On 22 Mar 2023, at 18:18, Colin Paice <[email protected]> wrote:
> 
> The difference may be down at the noise level, unless you are doing
> millions of these a second.
> For example, if you had a *load register,address* in a tight loop, the
> second time may be much (100 times?) faster because the conversion of virtual
> address to real page address will already be done, and the data will be in
> the processor cache and so does not need to be read from RAM, or a
> different book, etc.
> I found using a stack rather than a malloc in each function gave more
> benefit than trying to polish the instructions.
> Colin
> 
> On Wed, 22 Mar 2023 at 17:03, René Jansen <[email protected]>
> wrote:
> 
>> That's another interesting take, but I have to be sure that it is actually used
>> before I declare a winner.
>> 
>>> On 22 Mar 2023, at 18:00, David Crayford <[email protected]> wrote:
>>> 
>>> I can't answer your original question, but I doubt if DFP is really that
>> much faster. I would imagine it's implemented in millicode and not silicon,
>> so it is a software implementation at heart. I would be surprised if it beats
>> BigDecimal on a PC, but I could be wrong.
>>> 
>>> On 23/3/23 00:42, René Jansen wrote:
>>>> Without reading any documentation (sorry!), the issue at hand is this.
>> I want to show the performance gains of using DFP (Decimal Floating Point)
>> for the typical financial application, after I noticed at another client
>> that their purchased packages were seldom compiled with the right compiler
>> options (some could have run in 1966 or so; well, I exaggerate, but a z9
>> would not have been a problem).
>>>> 
>>>> This is for Java applications, and it is proving slightly harder than I
>> thought. I don’t have object modules to disassemble, and while I can use
>> -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly on my Mac or any Linux
>> box, IBM’s J9 seems to ignore these altogether. For Linux and the Mac there
>> is a .so (or .dylib) called hsdis that even disassembles what you would
>> otherwise be shown in hex (https://chriswhocodes.com/hsdis/), but I would
>> not upload it to other people’s machines lightly without building it
>> myself.
>>>> 
>>>> Does anybody know how to ask J9 (Java 8) on z/OS to show me
>> what it generates when the JIT decides native code would be best?
>>>> 
>>>> many thanks in advance,
>>>> 
>>>> best regards,
>>>> 
>>>> René Jansen.
>>>> 
>>>> 
>>>> 
>>>> ----------------------------------------------------------------------
>>>> For IBM-MAIN subscribe / signoff / archive access instructions,
>>>> send email to [email protected] with the message: INFO IBM-MAIN
