On Sun, 17 Mar 2002, Malcolm Kavalsky wrote:
> I forgot to mention that I am running Red Hat 7.2 with kernel 2.4.14.
> My PIII has 384MB DRAM and runs at 667MHz. The results are completely
> reproducible, without running anything in the background.
ok. obviously, we're doing something wrong with these measurements. let's
try to check a few more things:
1. how did you compile the test program? give the command line you used. i
did 'gcc -O2 prog.c'. gcc is 2.91.66 - the same gcc was used to compile
the kernel, glibc, and the application. which compiler(s) did you use to
compile each of these?
2. CPUs - it is possible to take advantage of different assembly copy
routines in order to increase the copying speed. your CPUs are PIII.
i'm not aware of assembly instructions on them which are faster than on
an AMD K6-2, which the optimizer might have used. or are there, and
they were used when compiling the kernel? (read() and write() use the
code compiled into the kernel, while memcpy uses the code compiled into
the application.) thus, in order to make a fair comparison, both
application and kernel must be compiled using the same flags.
the kernel code (2.4.18) looks the same as the 2.4.4 code, so it's not a
kernel change. what i see is that the copying routines go down to
mmx_copy_user and mmx_copy_user_zeroing for chunks which are larger than
512 bytes (which are the chunks probably copied in our case). i couldn't
find these functions in the sources, though. but this happens only if
'CONFIG_X86_USE_3DNOW_AND_WORKS' is defined; otherwise, it uses
__copy_user and __copy_user_zeroing (see the sketch below). so the
question now is - when you compiled your kernel, what CPU did you specify
in the kernel's config?
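
to make that concrete, here's a sketch of the dispatch - not the actual
kernel source (which, as said, i couldn't find), just the shape of it,
with user-space stand-in bodies so it compiles; the names and the
512-byte cutoff are the ones mentioned above:

/* sketch only - NOT the 2.4 kernel code. it mimics the dispatch
 * described above: with CONFIG_X86_USE_3DNOW_AND_WORKS defined, large
 * copies go through the MMX/3DNow! routine, small ones through the
 * plain copy loop. bodies here are user-space stand-ins. */
#include <stddef.h>
#include <string.h>

#define MMX_CUTOFF 512 /* per the 512-byte figure above */

void *__copy_user(void *to, const void *from, size_t n)
{
    return memcpy(to, from, n); /* stand-in for the plain copy loop */
}

void *mmx_copy_user(void *to, const void *from, size_t n)
{
    return memcpy(to, from, n); /* stand-in for the MMX streaming copy */
}

void *copy_to_user_sketch(void *to, const void *from, size_t n)
{
#ifdef CONFIG_X86_USE_3DNOW_AND_WORKS
    if (n > MMX_CUTOFF)
        return mmx_copy_user(to, from, n);
#endif
    return __copy_user(to, from, n);
}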
> I realised afterwards that I should have measured the time also in the
> server, and I attach the fixed program, which still gives the same
> results.
>
> I guess there have been some major improvements in the kernel code;
> running the same program on my laptop (PIII 500MHz, 128MB DRAM, kernel
> 2.4.18) results in:
>
> Memcpy'ed 2000 blocks of size 1048576 in 11 seconds => 181 Mbytes/second
> Sent 2000 blocks of size 1048576 in 5 seconds over unix socket => 400 Mbytes/second
> Received 2097152000 bytes in 5 seconds over unix socket => 400 Mbytes/second
>
> Even though zero-copy is not being done, isn't it surprising how much
> faster it is to send data over a socket than just to copy it from one
> buffer to another ;)
don't you find it a bit odd that it's faster when doing double copies
than when doing a single copy? or do you think virtual memory somehow
plays tricks on us? if so - why would it be different? (one way to check
is sketched below.)
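
one cheap way to check the virtual-memory theory on the memcpy side:
touch both buffers before timing, so malloc's lazily-mapped pages get
real frames before the copy loop (untouched anonymous pages all map to
the kernel's shared zero page). a drop-in variant of memcpy_benchmark()
from the attached program - my sketch, untested; it reuses the
BUFSIZE/NBLOCKS macros from that file:

/* sketch: like memcpy_benchmark() below, but both buffers are written
 * once before the timer starts, so copy-on-write zero pages can't skew
 * the memcpy numbers either way. assumes BUFSIZE/NBLOCKS from the
 * attached file. */
void memcpy_benchmark_touched()
{
    char *src, *dst;
    time_t start_time, elapsed_time;
    int i;

    src = malloc(BUFSIZE);
    dst = malloc(BUFSIZE);
    memset(src, 1, BUFSIZE); /* force real frames behind src */
    memset(dst, 1, BUFSIZE); /* and behind dst */
    start_time = time(0);
    for (i = 0; i < NBLOCKS; i++)
        memcpy(dst, src, BUFSIZE);
    elapsed_time = time(0) - start_time;
    if (elapsed_time == 0)
        elapsed_time = 1; /* avoid division by zero on a fast machine */
    printf("Memcpy'ed (touched) %d blocks of size %d in %d seconds => %d Mbytes/second\n",
           NBLOCKS, BUFSIZE, (int)elapsed_time,
           (int)(((long long)NBLOCKS * BUFSIZE) / (0x100000L * elapsed_time)));
}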
--
guy
"For world domination - press 1,
or dial 0, and please hold, for the creator." -- nob o. dy
-- Attached file included as plaintext by Listar --
-- File: socket_vs_memcpy.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUFSIZE 0x100000 /* 1 Megabyte */
#define NBLOCKS 2000
#define PORT_NAME "/tmp/foo"
void server()
{
    struct sockaddr_un sin, from;
    socklen_t len;
    int s, g, n;
    char *buf;
    double nbytes;
    time_t start_time, elapsed_time;

    buf = malloc(BUFSIZE);
    /* Create an unbound socket */
    if ((s = socket(PF_UNIX, SOCK_STREAM, 0)) < 0) {
        printf("Bad socket\n");
        return;
    }
    memset(&sin, 0, sizeof(sin));
    sin.sun_family = AF_UNIX;
    strcpy(sin.sun_path, PORT_NAME);
    unlink(PORT_NAME); /* remove a stale socket left by an earlier run */
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        printf("Bad bind\n");
        return;
    }
    listen(s, 5);
    len = sizeof(from);
    g = accept(s, (struct sockaddr *)&from, &len);
    /* the first block is read before the timer starts, so connection
       setup is not charged to the throughput figure */
    nbytes = read(g, buf, BUFSIZE);
    start_time = time(0);
    while ((n = read(g, buf, BUFSIZE)) > 0)
        nbytes += n;
    elapsed_time = time(0) - start_time;
    close(g);
    close(s);
    unlink(PORT_NAME);
    if (elapsed_time == 0)
        elapsed_time = 1; /* avoid division by zero on a fast machine */
    printf("Received %10.0f bytes in %d seconds over unix socket =>",
           nbytes, (int)elapsed_time);
    printf(" %4.0f Mbytes/second\n", nbytes / (0x100000 * elapsed_time));
}
void client()
{
    struct sockaddr_un sin;
    int s;
    char *buf;
    time_t start_time, elapsed_time;
    int i;

    buf = malloc(BUFSIZE); /* contents don't matter for the benchmark */
    if ((s = socket(PF_UNIX, SOCK_STREAM, 0)) < 0) {
        printf("Bad socket\n");
        return;
    }
    memset(&sin, 0, sizeof(sin));
    sin.sun_family = AF_UNIX;
    strcpy(sin.sun_path, PORT_NAME);
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        printf("Bad connect\n");
        close(s);
        return;
    }
    start_time = time(0);
    for (i = 0; i < NBLOCKS && write(s, buf, BUFSIZE) == BUFSIZE; i++)
        ;
    elapsed_time = time(0) - start_time;
    close(s);
    if (elapsed_time == 0)
        elapsed_time = 1; /* avoid division by zero on a fast machine */
    printf("Sent %d blocks of size %d in %d seconds over unix socket =>",
           i, BUFSIZE, (int)elapsed_time);
    /* use i, not NBLOCKS: the loop may exit early on a short write */
    printf(" %d Mbytes/second\n",
           (int)(((long long)i * BUFSIZE) / (0x100000L * elapsed_time)));
}
void memcpy_benchmark()
{
    char *src, *dst;
    time_t start_time, elapsed_time;
    int i;

    src = malloc(BUFSIZE);
    dst = malloc(BUFSIZE);
    start_time = time(0);
    for (i = 0; i < NBLOCKS; i++)
        memcpy(dst, src, BUFSIZE);
    elapsed_time = time(0) - start_time;
    if (elapsed_time == 0)
        elapsed_time = 1; /* avoid division by zero on a fast machine */
    printf("Memcpy'ed %d blocks of size %d in %d seconds =>",
           NBLOCKS, BUFSIZE, (int)elapsed_time);
    printf(" %d Mbytes/second\n",
           (int)(((long long)NBLOCKS * BUFSIZE) / (0x100000L * elapsed_time)));
}
void socket_benchmark()
{
    int status;

    if (fork() == 0) {
        server();
        _exit(0); /* the child must not fall through into main() */
    }
    sleep(1); /* Dirty, but ensures client runs after server is ready */
    client();
    wait(&status);
}
int main()
{
    memcpy_benchmark();
    socket_benchmark();
    return 0;
}