Thank you for the report.

According to Agner Fog's tables, complex LEA instructions should have a 3-cycle latency on that architecture (Haswell). Optimisations with this instruction are proving interesting because there's so much variation between processor architectures. Some are fine with three components but slow right down if a scale factor is used.
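
To make "complex" concrete, here is a minimal sketch. It assumes the rule given in Intel's optimisation manual for Sandy Bridge-family cores (which include Haswell): a LEA is slow if base, index and displacement are all present, if EBP/RBP/R13 is the base alongside an index, or if it is RIP-relative. The helper name lea_is_complex and its parameters are hypothetical and not part of FPC. The scale factor is deliberately ignored because that rule does not mention it; other architectures may well differ.

```python
# Hypothetical helper (not part of FPC): classify an x86 LEA by its address components.
# Assumption: the "slow LEA" rule for Sandy Bridge-family cores such as Haswell.

# Base registers that force an encoded displacement when combined with an index.
SLOW_BASES = {"ebp", "rbp", "r13"}


def lea_is_complex(base=None, index=None, scale=1, disp=0):
    """Return True if the LEA would be a slow (3-cycle) form under the assumed rule.

    `scale` is accepted for completeness but ignored: the assumed rule does not
    depend on it, although other microarchitectures may penalise a scaled index.
    """
    if base is not None and base.lower() == "rip":
        return True  # RIP-relative addressing
    if base is not None and index is not None:
        if disp != 0:
            return True  # all three components: base + index + displacement
        if base.lower() in SLOW_BASES:
            return True  # base + index where the base is EBP/RBP/R13
    return False


if __name__ == "__main__":
    print(lea_is_complex("rax", "rcx", 1, 8))  # lea rdx, [rax+rcx+8]
    print(lea_is_complex("rax", "rcx"))        # lea rdx, [rax+rcx]
```

Under this assumed rule, a peephole optimiser targeting Haswell could use such a check to decide whether splitting a LEA into a simpler LEA plus an ADD is worth considering.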

Kit

On 09/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
Hi Gareth

model name : Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz

Regards

Nataraj S Narayan
Synergy Info Systems
Software & Technology Consultants
Ettumanoor, INDIA
Ph:+91 9443211326


On Sun, Oct 8, 2023 at 6:40 PM J. Gareth Moreton via fpc-devel <fpc-devel@lists.freepascal.org> wrote:

    Hi Nataraj

    Which processor is that run on? (although too close to call, it
    implies LEA has a latency of 2 in that case)

    Kit

    On 08/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
    Hi

    [nataraj@dflyHP ~]$ fpc ttt.pas
    Free Pascal Compiler version 3.2.2 [2023/07/04] for x86_64
    Copyright (c) 1993-2021 by Florian Klaempfl and others
    Target OS: DragonFly for x86-64
    Compiling ttt.pas
    Linking ttt
    /usr/local/bin/ld.bfd: warning:
    /usr/local/lib/fpc/3.2.2/units/x86_64-dragonfly/rtl/prt0.o:
    missing .note.GNU-stack section implies executable stack
    /usr/local/bin/ld.bfd: NOTE: This behaviour is deprecated and
    will be removed in a future version of the linker
    121 lines compiled, 14.9 sec
    [nataraj@dflyHP ~]$ ./ttt
       Pascal control case: 6.7 ns/call
     Using LEA instruction: 4.2 ns/call
    Using ADD instructions: 4.0 ns/call


    Nataraj S Narayan
    Synergy Info Systems
    Software & Technology Consultants
    Ettumanoor, INDIA
    Ph:+91 9443211326


    On Sat, Oct 7, 2023 at 9:39 PM J. Gareth Moreton via fpc-devel
    <fpc-devel@lists.freepascal.org> wrote:

        That's interesting; I am interested to see the assembly output
        for the Pascal control cases.  As for the 64-bit version, that
        was my fault since the assembly language is for Microsoft's ABI
        rather than the System V ABI, so it was checking a register with
        an undefined value.  Find attached the fixed test.

        Kit

        P.S. Results on my Intel(R) Core(TM) i7-10750H

            Pascal control case: 2.0 ns/call
          Using LEA instruction: 1.7 ns/call
         Using ADD instructions: 1.3 ns/call

        On 07/10/2023 16:51, Tomas Hajny via fpc-devel wrote:
        > On 2023-10-07 03:57, J. Gareth Moreton via fpc-devel wrote:
        >
        >
        > Hi Kit,
        >
        >> Do you think this should suffice? Originally it ran for 1,000,000
        >> repetitions but I fear that will take way too long on a 486, so I
        >> reduced it to 10,000.
        >
        > OK, I tried it now. First of all, after turning on the old machine, I
        > realized that it wasn't an Intel but an AMD 486 DX4 - sorry for my bad
        > memory. :-( I compiled and ran the test under OS/2 there (I was too
        > lazy to boot it to DOS ;-) ), but I assume that it shouldn't make any
        > substantial difference. The ADD and LEA results were basically the
        > same there, both around 100 ns / call. The Pascal result was around
        > twice as long. Interestingly, the Pascal result for FPC 3.2.2 was
        > around 10% longer than the same source compiled with FPC 2.0.3 (the
        > assembler versions were obviously the same for both FPC versions; I
        > tried compiling it also with FPC 1.0.10 and the assembler versions
        > were more than three times slower due to missing support for the
        > nostackframe directive).
        >
        > I tested it on the AMD Athlon 1 GHz machine as well and again, the
        > results for LEA and ADD are basically equal (both 3.1 ns/call) and the
        > result for Pascal is slightly more than twice that (7.3 ns/call).
        > However, rather surprisingly for me, the overall test run was _much_
        > longer there?! Finally, I tried compiling the test on a 64-bit machine
        > (AMD A9-9425) with Linux (compiled for 64 bits with FPC 3.2.3 compiled
        > from a fresh 3.2 branch). The Pascal version shows about 4 ns/call, but
        > the assembler version runs forever - well, certainly much longer than
        > my patience lasts. I haven't tried to analyze the reasons, but that's
        > what I get.
        >
        > Tomas
        >
        >
        >
        >>
        >> On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
        >>> On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel"
        >>> <fpc-devel@lists.freepascal.org> wrote:
        >>>
        >>>
        >>> Hi Kit,
        >>>
        >>>> This is mainly to Florian, but also to anyone else who can answer
        >>>> the question - at which point did a complex LEA instruction (using
        >>>> all three input operands and some other specific circumstances) get
        >>>> slow? Preliminary research suggests the 486 was when it gained
        >>>> extra latency, and then Sandy Bridge when it got particularly bad.
        >>>> Ice Lake seems to be the architecture where faster LEA instructions
        >>>> are reintroduced, but I'm not sure about AMD processors.
        >>> I cannot answer your question, but if you prepare a test program, I
        >>> can run it on an Intel 486 DX2 100 MHz and AMD Athlon 1 GHz machines
        >>> if it helps you in any way (at least I hope the 486 DX2 machine
        >>> should be still able to start ;-) ).
        >>>
        >>> Tomas
        >>>
        >>> _______________________________________________
        >>> fpc-devel maillist  - fpc-devel@lists.freepascal.org
        >>>
        https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
        >>>
