>
> Hi Segher,
>
>> Not really.  Do you know how many 16/32-bit words you can add before a
>> 64-bit register can overflow? :-)
>
> That's a very good point. I thought about using 32-bit adds when writing
> the copy and checksum routine, but came to the conclusion that it wouldn't
> go any faster than one using addes.

Well, you now have one 64-bit word in two cycles, using one load and
an adde.

You can do 64 bits with two loads and two integer insns instead, or
one load and three integer insns.  Which is best depends on your pipeline
structure; I don't remember exactly what POWER6/7 have, but I bet
you do :-)

If you don't have to deal with the carry, you don't have to care about
the latency of your insns either, since you can just software pipeline it.
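
To put the carry-free idea in concrete terms, here is a minimal C sketch
(not the kernel routine; csum_words32 and fold64 are hypothetical names
made up for illustration).  Each 32-bit word is below 2^32, so a 64-bit
accumulator can take 2^32 of them (or 2^48 16-bit words) before it can
overflow.  The two accumulators are independent, so no add waits on a
carry from the previous one.  Whether a compiler turns this into the
instruction mix discussed above is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sum nwords 32-bit words into 64-bit accumulators.
 * Assumes nwords <= 2^32 so the 64-bit sums cannot overflow. */
static uint64_t csum_words32(const void *buf, size_t nwords)
{
    const unsigned char *p = buf;
    uint64_t acc0 = 0, acc1 = 0;   /* independent, no carry chain */
    size_t i;

    for (i = 0; i + 1 < nwords; i += 2) {
        uint32_t w0, w1;
        memcpy(&w0, p + 4 * i, 4);       /* alignment-safe loads */
        memcpy(&w1, p + 4 * i + 4, 4);
        acc0 += w0;                      /* zero-extended 32-bit add */
        acc1 += w1;
    }
    if (i < nwords) {                    /* odd word left over */
        uint32_t w;
        memcpy(&w, p + 4 * i, 4);
        acc0 += w;
    }
    return acc0 + acc1;
}

/* Hypothetical helper: fold the 64-bit sum down to 16 bits with
 * end-around carry, done once after the loop. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);   /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);   /* at most 32 bits */
    sum = (sum & 0xffffu) + (sum >> 16);       /* at most 17 bits */
    sum = (sum & 0xffffu) + (sum >> 16);       /* at most 16 bits */
    return (uint16_t)sum;
}
```

All the carry handling is in fold64, outside the loop, which is what
lets the loop body be scheduled freely.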

> The checksum only routine was the same loop
> without the stores.

The stores are just to copy, right?  So two loads/two stores/two integer
(per 64-bit), which probably works out to two cycles; or one load/one
store/three integer, which is one or one and a half cycles.
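
For the copy variant, a rough C sketch of the one-load/one-store shape
(csum_copy64 is a hypothetical name, and this is not the routine under
discussion).  It splits each 64-bit word into its two 32-bit halves and
adds them to separate accumulators.  Written this way it is a mask, a
shift and two adds at the source level; how many integer insns that
becomes depends on the compiler and the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy ndwords 64-bit words from src to dst and
 * return the sum of all their 32-bit halves.
 * Assumes ndwords <= 2^31 so the 64-bit sums cannot overflow. */
static uint64_t csum_copy64(void *dst, const void *src, size_t ndwords)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint64_t acc_lo = 0, acc_hi = 0;
    size_t i;

    for (i = 0; i < ndwords; i++) {
        uint64_t w;
        memcpy(&w, s + 8 * i, 8);        /* one 64-bit load */
        memcpy(d + 8 * i, &w, 8);        /* one 64-bit store */
        acc_lo += w & 0xffffffffu;       /* low 32-bit half */
        acc_hi += w >> 32;               /* high 32-bit half */
    }
    return acc_lo + acc_hi;
}
```

The result is the same sum of 32-bit words on either endianness, since
the two halves are just the two words in one order or the other.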


Segher

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev