Re: [patch v2] epoll use a single inode ...

Eric Dumazet Tue, 06 Mar 2007 10:13:08 -0800

On Tuesday 06 March 2007 18:28, Eric Dumazet wrote:
> On Tuesday 06 March 2007 18:19, Linus Torvalds wrote:
>
> > > Using reciprocal divides permits to change each divide by two
> > > multiplies, less expensive on current CPUS.
> >
> > Are you sure?
>
> I am going to test this, but at least on Opterons, the reciprocal divide I
> added into mm/slab.c gave me a nice speedup.
>


With attached test program (one million calls to pipe()), I got about 0.1 s of 
speedup on my 1.6 GHz Pentium(M) dell D610 machine
(3.350 s instead of 3.450 s, on many runs)

Thats about 100 ns saved per number() call

But then I realized that on ia32, gcc compilers is not very smart :

static inline u32 reciprocal_divide(u32 A, u32 R)
{
        return (u32)(((u64)A * R) >> 32); 
}

It generates two multiplies... arg...

//begin of reciprocal_divide()
     4b0:       8b 4c 24 28             mov    0x28(%esp),%ecx
     4b4:       89 f0                   mov    %esi,%eax
     4b6:       f7 64 24 24             mull   0x24(%esp)
     4ba:       0f af ce                imul   %esi,%ecx
     4bd:       8d 14 11                lea    (%ecx,%edx,1),%edx
// end of reciprocal_divide()
     4c0:       8b 8c 24 8c 00 00 00    mov    0x8c(%esp),%ecx
     4c7:       89 d0                   mov    %edx,%eax
     4c9:       0f af c8                imul   %eax,%ecx
     4cc:       29 ce                   sub    %ecx,%esi
     4ce:       8b 4c 24 1c             mov    0x1c(%esp),%ecx
     4d2:       0f b6 34 31             movzbl (%ecx,%esi,1),%esi
     4d6:       89 f1                   mov    %esi,%ecx
     4d8:       89 c6                   mov    %eax,%esi
     4da:       88 0f                   mov    %cl,(%edi)
     4dc:       47                      inc    %edi
     4dd:       ff 44 24 20             incl   0x20(%esp)
     4e1:       85 c0                   test   %eax,%eax
     4e3:       75 cb                   jne    4b0 <number+0x160>

So even with a total of 3 multiplies per digit, we win...

Maybe some bit of x86 asm is needed to make gcc be smarter (using only one 
multiply for  reciprocal_divide())

/*
 * micro benchmark to time calls to pipe()/close()
 */
main()
{
        int fd[100*2];
        unsigned int l, i;

        for (l = 0 ; l < 10000 ; l++) {
                for (i = 0 ; i < 100*2 ; i+=2)
                        pipe(fd + i);
                for (i = 0 ; i < 100*2 ; i++)
                        close(fd[i]);
        }
}

Re: [patch v2] epoll use a single inode ...

Reply via email to