guy keren wrote:

> On Sun, 17 Mar 2002, Malcolm Kavalsky wrote:
>
>> I forgot to mention that I am running Red Hat 7.2 with kernel 2.4.14.
>> My PIII has 384MB of DRAM, and runs at 667MHz. The results are
>> completely reproducible, without running anything in the background.
>
> ok. obviously, we're doing something wrong with these measurements.
> let's try to check a few more things:
>
> 1. how did you compile the test program? give the command line you
>    used. i did 'gcc -O2 prog.c'. gcc is 2.91.66 - the same gcc was
>    used to compile the kernel, glibc and the application. which
>    compiler(s) did you use to compile each of these?

I have compiled using kgcc and gcc, with optimization -O2 and without,
and see no difference.

kgcc version: gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
gcc version:  gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)

I didn't recompile glibc.
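(For anyone following the thread without the attachment, here is a
minimal sketch of the kind of test we are comparing. The 2000 x 1MB
block pattern matches the numbers quoted below, but the
socketpair()/fork() structure is my guess at how prog.c is put
together - the actual attached program may differ.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

#define BLOCKS     2000
#define BLOCK_SIZE (1024 * 1024)

int main(void)
{
    char *src = malloc(BLOCK_SIZE);
    char *dst = malloc(BLOCK_SIZE);
    time_t start, elapsed;
    int fd[2];
    int i;

    if (src == NULL || dst == NULL)
        return 1;
    memset(src, 'x', BLOCK_SIZE);

    /* test 1: plain user-space copy */
    start = time(NULL);
    for (i = 0; i < BLOCKS; i++)
        memcpy(dst, src, BLOCK_SIZE);
    elapsed = time(NULL) - start;
    printf("Memcpy'ed %d blocks of size %d in %ld seconds => %ld Mbytes/second\n",
           BLOCKS, BLOCK_SIZE, (long)elapsed,
           elapsed ? (long)BLOCKS / (long)elapsed : 0L);

    /* test 2: push the same volume through an AF_UNIX socket to a child */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) < 0) {
        perror("socketpair");
        return 1;
    }
    if (fork() == 0) {          /* child: drain the socket until EOF */
        ssize_t n;
        close(fd[0]);
        while ((n = read(fd[1], dst, BLOCK_SIZE)) > 0)
            ;
        _exit(0);
    }
    close(fd[1]);
    start = time(NULL);
    for (i = 0; i < BLOCKS; i++) {
        ssize_t off = 0, n;
        while (off < BLOCK_SIZE) {   /* a write() on a socket may be short */
            n = write(fd[0], src + off, BLOCK_SIZE - off);
            if (n < 0) {
                perror("write");
                return 1;
            }
            off += n;
        }
    }
    close(fd[0]);               /* EOF for the child */
    wait(NULL);                 /* folds the receive side into the timing */
    elapsed = time(NULL) - start;
    printf("Sent %d blocks of size %d in %ld seconds over unix socket => %ld Mbytes/second\n",
           BLOCKS, BLOCK_SIZE, (long)elapsed,
           elapsed ? (long)BLOCKS / (long)elapsed : 0L);
    return 0;
}

Note that waiting for the child before stopping the clock addresses the
"should have measured the time also in the server" fix discussed below,
and the write() loop matters - a single write() on a socket may return
short, which would silently undercount the bytes sent if ignored.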
> 2. CPUs - it is possible to take advantage of different assembly copy
>    routines in order to increase the copying speed. your CPUs are
>    PIIIs. i'm not aware of assembly instructions on them which are
>    faster than on an AMD K6-2, which the optimizer might have used. or
>    are there, and were they used when compiling the kernel? (read()
>    and write() use the code compiled into the kernel, while memcpy
>    uses the code compiled into the application.) thus, in order to
>    make a fair comparison, both application and kernel must be
>    compiled using the same flags.

Ok. So, I'll try using the kernel's flags on the application.

> the kernel code (2.4.18) looks the same as the 2.4.4 code, so it's not
> a kernel change. what i see is that the copying routines go down to
> mmx_copy_user and mmx_copy_user_zeroing for chunks which are larger
> than 512 bytes (which are probably the chunks copied in our case). i
> couldn't find these functions in the sources, though. but this happens
> only if CONFIG_X86_USE_3DNOW_AND_WORKS is defined; otherwise, it uses
> __copy_user and __copy_user_zeroing. so the question now is - when you
> compiled your kernel, what CPU did you specify in the kernel's config?

PIII Coppermine

>> I realised afterwards that I should have measured the time in the
>> server as well, and attach the fixed program, which still gives the
>> same results.
>>
>> I guess there have been some major improvements in the kernel code;
>> running the same program on my laptop (PIII 500MHz, 128MB DRAM,
>> kernel 2.4.18) results in:
>>
>> Memcpy'ed 2000 blocks of size 1048576 in 11 seconds => 181 Mbytes/second
>> Sent 2000 blocks of size 1048576 in 5 seconds over unix socket => 400 Mbytes/second
>> Received 2097152000 bytes in 5 seconds over unix socket => 400 Mbytes/second
>>
>> Even though zero-copy is not being done, isn't it surprising how much
>> faster it is to send data over a socket than just to copy it from one
>> buffer to another ;)
>
> don't you find it a bit odd that it'd be faster doing double copies
> than doing a single copy? or do you think virtual memory somehow plays
> tricks on us? if so - why would it be different?

I asked one of the top Unix hackers that I know, and he said:

"I would guess that if you do large af_unix transfers that are page
aligned, then the system doesn't have to actually copy the data;
rather, it can share the page and do a copy-on-write. This preserves
the socket semantics and can be faster than memcpy. This was done many
years ago in Solaris."

I wonder if digging deep enough in the kernel sources will reveal
this ...
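One cheap experiment to probe that guess before diving into the
sources: run the socket test once with a page-aligned send buffer and
once with the same buffer deliberately knocked off alignment, and
compare the throughput. A minimal sketch of the setup (my own, not part
of the attached prog.c - the 16-byte offset is arbitrary):

#define _GNU_SOURCE             /* for MAP_ANONYMOUS */
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t len = 1024 * 1024;
    char *aligned, *misaligned;

    /* anonymous mmap()ed memory always starts on a page boundary,
       unlike malloc()ed memory */
    aligned = mmap(NULL, len + pagesize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (aligned == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    misaligned = aligned + 16;  /* same memory, off the page boundary */

    printf("page size %ld, aligned buffer at %p, misaligned at %p\n",
           pagesize, (void *)aligned, (void *)misaligned);

    /* feed each of the two buffers through the socket send loop of the
       benchmark and compare the reported throughput */
    return 0;
}

If copy-on-write page sharing is really what is happening, the
misaligned run should fall back to ordinary copying and look much more
like the memcpy numbers.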