On Wednesday 30 March 2005 05:27, Gerold Jury wrote: > > >> On Tue, Mar 29, 2005 at 05:37:06PM +0300, Denis Vlasenko wrote: > >> > /* > >> > * This looks horribly ugly, but the compiler can optimize it totally, > >> > * as the count is constant. > >> > */ > >> > static inline void * __constant_memcpy(void * to, const void * from, > >> > size_t n) { > >> > if (n <= 128) > >> > return __builtin_memcpy(to, from, n); > >> > >> The problem is that in GCC < 4.0 there is no constant propagation > >> pass before expanding builtin functions, so the __builtin_memcpy > >> call above sees a variable rather than a constant. > > > >or change "size_t n" to "const size_t n" will also fix the issue. > >As we do some (well very little and with inlining and const values) > >const progation before 4.0.0 on the trees before expanding the builtin. > > > >-- Pinski > >- > I used the following "const size_t n" change on x86_64 > and it reduced the memcpy count from 1088 to 609 with my setup and gcc 3.4.3. > (kernel 2.6.12-rc1, running now)
What do you mean, 'reduced'? (/me is checking....) Oh shit... It still emits half of memcpys, to be exact - for struct copies: arch/i386/kernel/process.c: int copy_thread(int nr, unsigned long clone_flags, unsigned long esp, unsigned long unused, struct task_struct * p, struct pt_regs * regs) { struct pt_regs * childregs; struct task_struct *tsk; int err; childregs = ((struct pt_regs *) (THREAD_SIZE + (unsigned long) p->thread_info)) - 1; *childregs = *regs; ^^^^^^^^^^^^^^^^^^^ childregs->eax = 0; childregs->esp = esp; # make arch/i386/kernel/process.s copy_thread: pushl %ebp movl %esp, %ebp pushl %edi pushl %esi pushl %ebx subl $20, %esp movl 24(%ebp), %eax movl 4(%eax), %esi pushl $60 leal 8132(%esi), %ebx pushl 28(%ebp) pushl %ebx call memcpy <================= movl $0, 24(%ebx) movl 16(%ebp), %eax movl %eax, 52(%ebx) movl 24(%ebp), %edx addl $8192, %esi movl %ebx, 516(%edx) movl %esi, -32(%ebp) movl %esi, 504(%edx) movl $ret_from_fork, 512(%edx) Jakub, is there a way to instruct gcc to inine this copy, or better yet, to use user-supplied inline version of memcpy? -- vda