guy keren wrote:

>
>On Sun, 17 Mar 2002, Malcolm Kavalsky wrote:
>
>>I forgot to mention that I am running Redhat 7.2 with kernel 2.4.14.
>>My PIII has 384MB DRAM and runs at 667MHz. The results are completely
>>reproducible, without running anything in the background.
>>
>>
>
>ok. obviously, we're doing something wrong with these measurements. let's
>try to check a few more things:
>
>1. how did you compile the test program? give the command line you used. i
>   did 'gcc -O2 prog.c'. gcc is 2.91.66 - the same gcc was used to compile
>   the kernel, glibc and the application. which compiler(s) did you use to
>   compile each of these?
>
I have compiled using both kgcc and gcc, with and without -O2 optimization,
and see no difference.

kgcc version: gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
gcc version:  gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)

I didn't recompile glibc.
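
For the record, the invocations were of this form (prog.c standing for the
test program; the output name is my addition):

  kgcc -O2 prog.c -o prog      (and the same without -O2)
  gcc  -O2 prog.c -o prog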

>
>2. CPUs - it is possible to take advantage of different assembly copy
>   routines in order to increase the copying speed. your CPUs are PIII.
>   i'm not aware of assembly instructions on them which are faster than on
>   an AMD K6-2, which the optimizer might have used. or perhaps there are,
>   and they were used when compiling the kernel (read() and write() use
>   the code compiled into the kernel, while memcpy uses the code compiled
>   into the application). thus, in order to make a fair comparison, both
>   application and kernel must be compiled using the same flags.
>
OK, so I'll try compiling the application with the kernel's flags.
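
If I read the 2.4 arch/i386/Makefile correctly, that would mean something
like the following (the exact flag set depends on the kernel config, so
treat this as an approximation):

  gcc -Wall -O2 -fomit-frame-pointer -fno-strict-aliasing -march=i686 prog.c -o prog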

>
>the kernel code (2.4.18) looks the same as the 2.4.4 code, so it's not a
>kernel change. what i see is that the copying routines go down to
>mmx_copy_user and mmx_copy_user_zeroing for chunks which are larger than
>512 bytes (which are probably the chunk sizes copied in our case). i
>couldn't find these functions in the sources, though. but this happens
>only if 'CONFIG_X86_USE_3DNOW_AND_WORKS' is defined. otherwise, it uses
>__copy_user and __copy_user_zeroing. so the question now is - when you
>compiled your kernel, what CPU did you specify in the kernel's config?
>
PIII Coppermine
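
For what it's worth, the option can be checked directly against the config
file (assuming the source tree sits in the usual Red Hat location):

  grep 3DNOW /usr/src/linux/.config

On a PIII config it should come out unset, which would match your reading
that __copy_user is the path taken here.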

>
>>I realised afterwards that I should have measured the time in the server
>>as well. I attach the fixed program, which still gives the same results.
>>
>>I guess there have been some major improvements to the kernel code;
>>running the same program on my laptop (PIII 500MHz, 128MB DRAM, kernel
>>2.4.18) results in:
>>
>>Memcpy'ed 2000 blocks of size 1048576 in 11 seconds => 181 Mbytes/second
>>Sent 2000 blocks of size 1048576 in 5 seconds over unix socket => 400 Mbytes/second
>>Received 2097152000 bytes in 5 seconds over unix socket => 400 Mbytes/second
>>
>>Even though zero-copy is not being done, isn't it surprising how much
>>faster it is to send data over a socket than just to copy it from one
>>buffer to another ;)
>>
>
>don't you find it a bit odd that it's faster when doing double-copies
>than when doing a single copy? or do you think virtual memory somehow
>plays tricks on us? if so - why would it be different?
>
I asked one of the top Unix hackers that I know, and he said:

"I would guess that if you do large af_unix transfers that are page
aligned then the system doesn't have to actually copy the data rather it
can share the page and do a copy on write. This preserves the socket
semantics and can be faster than memcpy. This was done many years ago in
Solaris."

I wonder if digging deep enough into the kernel sources will reveal this ...
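
Before digging, here is a rough way one could test that guess - this is
just a sketch of mine, not the attached program; valloc() for page-aligned
buffers is an assumption, and the 1MB/2000-block sizes are carried over
from the original test:

/* align_test.c - a rough sketch, not the attached test program.
 * Times memcpy() against writes over an AF_UNIX socketpair, using
 * page-aligned buffers (via valloc()) so that any copy-on-write
 * page-sharing trick would have a chance to kick in.
 * Build: gcc -O2 align_test.c -o align_test
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

#define BLOCK (1024 * 1024)     /* 1MB blocks, as in the original test */
#define COUNT 2000

int main(void)
{
    char *src = valloc(BLOCK);  /* valloc() returns page-aligned memory */
    char *dst = valloc(BLOCK);
    int sv[2], i;
    time_t t0;

    memset(src, 1, BLOCK);

    /* plain user-space copy */
    t0 = time(NULL);
    for (i = 0; i < COUNT; i++)
        memcpy(dst, src, BLOCK);
    printf("memcpy: %ld seconds\n", (long)(time(NULL) - t0));

    /* same amount of data pushed through an AF_UNIX stream socket */
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    if (fork() == 0) {          /* child: just drain the socket */
        close(sv[1]);
        while (read(sv[0], dst, BLOCK) > 0)
            ;
        _exit(0);
    }
    close(sv[0]);
    t0 = time(NULL);
    for (i = 0; i < COUNT; i++)
        write(sv[1], src, BLOCK); /* blocking write sends the whole block */
    close(sv[1]);
    wait(NULL);                   /* include the receiver's drain time */
    printf("socket: %ld seconds\n", (long)(time(NULL) - t0));
    return 0;
}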



