> On 28.05.09 18:13, Joakim Tjernlund wrote:
> > hmm, these do look a bit unoptimal anyway. Any reason not to write
> > them something like below (written by me for uClibc long time ago).
> > You will have to add eieio()/sync
>
> No (and I wasn't aware of the PPC pre-inc vs. post-inc stuff) - I just

I think this is true for most RISC-based CPUs. It is a pity, as post ops are
a lot more common. The do {} while (--chunks) is also better. Basically the
"while (--chunks)" is free (but only if you don't use chunks inside the loop).

> stumbled over this while fixing mtd accesses to the MPC5200's Local Bus
> in 16-bit mode which doesn't allow byte accesses. And I didn't want to
> go too deep into this as the real fix for me is actually somewhat
> different...

OK.

>
> > /* PPC can do pre increment and load/store, but not post increment
> > and load/store.
> > Therefore use *++ptr instead of *ptr++. */
> [snip]
> > copy_chunks:
> > do {
> >     /* make gcc load all data, then store it */
> >     tmp1 = *(unsigned long *)(tmp_from+4);
> >     tmp_from += 8;
> >     tmp2 = *(unsigned long *)tmp_from;
> >     *(unsigned long *)(tmp_to+4) = tmp1;
> >     tmp_to += 8;
> >     *(unsigned long *)tmp_to = tmp2;
> > } while (--chunks);
>
> Is this the same for all PPC cores, i.e. do they all benefit from
> loading/storing 8 instead of 4 bytes?

As I recall there is an extra cycle between a load and a store, so you will
benefit from doing all your loads first and then the stores. The kernel
memcpy loads 16 bytes before storing. I selected 8 as uClibc should also be
small.

Since there has to be an eieio between ops, I am not sure it will matter
here. Perhaps it is better to do 4 bytes in the main loop, making the whole
function smaller.

There are memset and memmove functions in uClibc too.

 Jocke
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
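
For reference, the two ideas discussed above (issue all loads before the stores, and count the loop down with do {} while (--chunks)) can be sketched in portable C roughly as follows. This is only an illustration under assumptions, not the uClibc routine: the name copy_word_pairs is made up, it uses uint32_t and array indexing instead of the pointers biased by -4 in the quoted code, it requires chunks > 0, and it contains no eieio()/sync, so it is meant for ordinary memory only and not for the Local Bus case.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not the uClibc memcpy).
 * Copies `chunks` pairs of 32-bit words, i.e. 8 bytes per iteration.
 * The caller must pass chunks > 0, because a do/while loop always
 * executes its body at least once.
 */
static void copy_word_pairs(uint32_t *to, const uint32_t *from, size_t chunks)
{
	uint32_t tmp1, tmp2;

	do {
		/* issue both loads before either store */
		tmp1 = from[0];
		tmp2 = from[1];
		from += 2;
		to[0] = tmp1;
		to[1] = tmp2;
		to += 2;
	} while (--chunks);	/* the counter is not used inside the body */
}
```

Whether the compiler actually turns this into update-form loads and stores depends on the gcc version and flags, so it is worth checking the generated assembly before relying on it.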