> Hi Segher,
>
>> Not really.  Do you know how many 16/32-bit words you can add before a
>> 64-bit register can overflow?  :-)
>
> That's a very good point. I thought about using 32-bit adds when writing
> the copy and checksum routine, but came to the conclusion that it wouldn't
> go any faster than one using addes.

Well, you now have one 64-bit word in two cycles, using one load and
an adde.  You can do 64 bits with two loads and two integer insns
instead, or one load and three integer insns.  What is best depends on
your pipeline structure; I don't remember what POWER6/7 have exactly,
but I bet you do :-)

If you don't have to deal with the carry, you don't have to care about
the latency of your insns either, since you can just software pipeline
it.

> The checksum only routine was the same loop without the stores.

The stores are just to copy, right?  So two loads/two stores/two integer
(per 64 bits), which probably works out to two cycles; or one load/one
store/three integer, which is one or one and a half cycles.


Segher
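
For illustration, the "two loads and two integer insns" variant discussed above could look roughly like the C sketch below. It is not the routine from this thread: the names csum_add32 and csum_fold16 are made up for the example, and the code assumes a buffer whose length is a multiple of 8 bytes. Each 32-bit word is zero-extended and added into its own 64-bit accumulator, so no carry chain links the adds; since every addend is below 2^32, an accumulator cannot overflow before about 2^32 words have been added.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: sum a buffer as 32-bit words into 64-bit
 * accumulators, two loads and two plain adds per 64 bits of data.
 * n32 is the number of 32-bit words and is assumed to be even.
 */
static uint64_t csum_add32(const unsigned char *buf, size_t n32)
{
    uint64_t acc0 = 0, acc1 = 0;   /* independent accumulators, no carry */

    for (size_t i = 0; i + 1 < n32; i += 2) {
        uint32_t w0, w1;

        /* memcpy keeps the loads legal for any alignment */
        memcpy(&w0, buf + 4 * i, sizeof w0);
        memcpy(&w1, buf + 4 * i + 4, sizeof w1);

        acc0 += w0;                /* zero-extended 32-bit add */
        acc1 += w1;
    }
    return acc0 + acc1;            /* at most one wrap, see note below */
}

/*
 * Hypothetical helper: fold the 64-bit sum down to 16 bits with
 * end-around carry, as a ones' complement checksum requires.
 */
static uint16_t csum_fold16(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}
```

The final acc0 + acc1 only stays exact while the combined sum fits in 64 bits, which holds for any realistic buffer size. How many cycles per 64 bits this takes depends on the compiler and on the pipeline, so it would need measuring on the actual hardware.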