(Accidentally sent to Florian privately instead of to the mailing list)

> > Obviously I'll submit the changes as patches so they can be properly
> > reviewed and tested, but does this sound like a good idea?
> 
> 
> 
> Yes, you might have seen, that I started already with this some time ago.

Well, that's good - means I can get straight to work and save you some
time, hopefully!  AARCH64 doesn't have that much that needs changing in
that regard, although ARM needs a lot of clean-up that I'm getting
underway with.  There's potential to merge some ARM/AARCH64
optimisations too, which differ only in the register names (e.g. x29
instead of r13).

I've done a little bit of refactoring of the individual optimisations to
allow for small speed-ups.  One of the main ones is an optimisation that
converts 4 instructions into one.  In the trunk, it calls
GetNextInstruction three times, along with SkipEntryExitMarker a couple
of times, and only then checks whether the individual operators and
their operands permit the optimisation.  I've changed this around a
little bit so that the first instruction is evaluated first, and only
if it passes is GetNextInstruction called so the next instruction can be
checked.  GetNextInstruction is a relatively expensive call, and it's
more likely than not that the criteria for the optimisation won't be met
(e.g. it comes across a different instruction), so the sooner you can
detect this and drop out, the faster the Peephole Optimizer will run.
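
As a rough illustration of that reordering, here is a minimal sketch in
Python rather than the actual Pascal from the compiler.  Everything in it
is an assumption made for the example: get_next_instruction is only a
hypothetical stand-in for GetNextInstruction (it skips marker entries and
counts how often it is called), and Instr, match_eager and match_lazy are
invented names.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: str
    is_marker: bool = False  # hypothetical stand-in for entry/exit markers


def get_next_instruction(code, index, counter):
    """Hypothetical stand-in for GetNextInstruction.

    Returns the index of the next non-marker instruction after `index`,
    or None if there is none.  Each call is counted so the two matching
    strategies below can be compared.
    """
    counter["calls"] += 1
    i = index + 1
    while i < len(code) and code[i].is_marker:
        i += 1
    return i if i < len(code) else None


def match_eager(code, start, pattern, counter):
    """Fetch every following instruction first, then check the opcodes."""
    indices = [start]
    for _ in range(len(pattern) - 1):
        nxt = get_next_instruction(code, indices[-1], counter)
        if nxt is None:
            return False
        indices.append(nxt)
    return all(code[i].opcode == op for i, op in zip(indices, pattern))


def match_lazy(code, start, pattern, counter):
    """Check each instruction before paying for the next fetch."""
    index = start
    for pos, op in enumerate(pattern):
        if code[index].opcode != op:
            return False  # drop out as soon as the pattern cannot match
        if pos + 1 < len(pattern):
            index = get_next_instruction(code, index, counter)
            if index is None:
                return False
    return True


pattern = ["ldr", "add", "sub", "str"]
mismatch = [Instr("mov"), Instr("add"), Instr("sub"), Instr("str")]

eager_counter = {"calls": 0}
lazy_counter = {"calls": 0}
match_eager(mismatch, 0, pattern, eager_counter)
match_lazy(mismatch, 0, pattern, lazy_counter)
print(eager_counter["calls"], lazy_counter["calls"])
```

When the very first instruction doesn't fit the pattern, the lazy version
never calls get_next_instruction at all, whereas the eager version has
already paid for all three look-ups.  When the pattern does match, both
make the same number of calls, so nothing is lost in the successful case.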


> 
> > P.S. While I haven't been asked to improve aarch64-linux
> > specifically, if I'm understanding things correctly, there should be
> > very few differences with the actual target platform in regards to
> > calling conventions, for example.
> 
> 
> 
> You mean with regard to different aarch64 platforms?
> 


Yes, apologies.  I mean in regards to different aarch64 platforms.  I've
got the basics of the calling convention down, like passing the first
integral parameter through r0/x0 and incrementing the register number
for each subsequent one.
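
To make that concrete, here is a deliberately simplified sketch in
Python.  It assumes only the basic rule from the ARM procedure call
standards, namely that the first four integral arguments go in r0-r3 on
32-bit ARM and the first eight in x0-x7 on AArch64, with the rest going
on the stack.  It ignores floating-point arguments, 64-bit values on
32-bit ARM, records and platform-specific variations, and
assign_integer_args is a made-up name for the example.

```python
# Simplified assumption: (register prefix, number of integer argument registers)
INTEGER_ARG_REGISTERS = {
    "arm": ("r", 4),      # r0-r3
    "aarch64": ("x", 8),  # x0-x7
}


def assign_integer_args(arg_names, arch="aarch64"):
    """Map integral arguments to registers in order; overflow goes on the stack."""
    prefix, count = INTEGER_ARG_REGISTERS[arch]
    in_registers = {}
    on_stack = []
    for position, name in enumerate(arg_names):
        if position < count:
            in_registers[name] = f"{prefix}{position}"
        else:
            on_stack.append(name)
    return in_registers, on_stack


regs, stack = assign_integer_args(["a", "b", "c", "d", "e"], arch="arm")
print(regs, stack)
```

With five integral arguments on 32-bit ARM, the first four land in r0 to
r3 and the fifth is left for the stack; on AArch64 all five would fit in
x0 to x4.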

In a funny way, my x86-64 machine breaking is proving to be a blessing
in disguise, since I'm now exploring a completely different architecture
and learning its assembly language.  Some potential speed-ups, like
utilising NEON and other SIMD instruction sets, fall under the more
general philosophy of vectorization and would be much easier to perform
once all the nuances with intrinsics are resolved, because vectorization
is much easier to do with nodes than with, say, trying to take a bunch
of assembler instructions and attempting to vectorize those.

Gareth aka. Kit
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
