Re: Memory allocation performance

2008-02-03 Thread Dag-Erling Smørgrav
Julian Elischer <[EMAIL PROTECTED]> writes:
> Robert Watson <[EMAIL PROTECTED]> writes:
> > be a good time to try to revalidate that.  Basically, the goal would
> > be to make the pcpu cache FIFO as much as possible as that maximizes
> > the chances that the newly allocated object already has lines in the
> > cache.  It's a fairly trivial tweak to the UMA allocation code.
> you mean FILO or LIFO right?

Uh, no.  You want to reuse the last-freed object, as it is most likely
to still be in cache.

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


sort(1) memory usage

2008-02-03 Thread Dag-Erling Smørgrav
I've been trying to figure out why some periodic scripts consume so much
memory.  I've narrowed it down to sort(1).

At first, I thought the scripts were using it inefficiently, feeding it
more data than was really needed.  Then I discovered this:

[EMAIL PROTECTED] ~% (sleep 10 | sort) & (sleep 5 ; top -o res | grep sort)
[1] 66024
66024 des  1  -85 54796K 52680K piperd 1   0:00  0.88% sort

That's right - sort(1) consumes 50+ MB of memory doing *nothing*.

(roughly half that on a 32-bit box)

Something is rotten in the state of GNU...

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: sort(1) memory usage

2008-02-03 Thread Ed Schouten
* Dag-Erling Smørgrav <[EMAIL PROTECTED]> wrote:
> I've been trying to figure out why some periodic scripts consume so much
> memory.  I've narrowed it down to sort(1).
> 
> At first, I thought the scripts were using it inefficiently, feeding it
> more data than was really needed.  Then I discovered this:
> 
> [EMAIL PROTECTED] ~% (sleep 10 | sort) & (sleep 5 ; top -o res | grep sort)
> [1] 66024
> 66024 des  1  -85 54796K 52680K piperd 1   0:00  0.88% sort
> 
> That's right - sort(1) consumes 50+ MB of memory doing *nothing*.
> 
> (roughly half that on a 32-bit box)
> 
> Something is rotten in the state of GNU...

On my i386 box it spends 27M, but when I replace `sort' with `sed',
without any arguments, it's only 1.4 MB. I tried this on RELENG_6. I can
also reproduce this on Linux.

-- 
 Ed Schouten <[EMAIL PROTECTED]>
 WWW: http://g-rave.nl/




getaddrinfo() spec doesn't match behaviour

2008-02-03 Thread Heiko Wundram (Beenic)
Hey all!

Before I go post this as a PR (or go about fixing the libc code), I just 
wanted to ask whether this is a known issue (and I simply haven't been able 
to find it), or if it's simply my stupidity that makes this fail.

Basically, I have the following code:

addrinfo hints;
addrinfo* res;

// Fill in hints structure.
memset(&hints,0,sizeof(hints));
hints.ai_flags = AI_PASSIVE | AI_V4MAPPED;
hints.ai_family = SERV.m_ipv6 ? AF_INET6 : AF_INET;
hints.ai_socktype = SOCK_STREAM;
hints.ai_protocol = IPPROTO_TCP;

// Query address info.
if( ( rv = getaddrinfo(bindaddr,port,&hints,&res) ) ) {
    if( rv == EAI_SYSTEM )
        throw OSError("Failed resolving service and/or bind address");
    throw ValueError(gai_strerror(rv));
}

Now, according to the man-page of getaddrinfo() and my previous knowledge, 
this should work fine (for some bindaddr and port coming into this function 
as const char*), but, alas, it doesn't, and fails with EAI_BADFLAGS.

Tracing through the libc code for getaddrinfo(), the cause of this failing is 
pretty obvious:

hints.ai_flags is ANDed with the complement of AI_MASK (ai_flags & ~AI_MASK) 
at the beginning of the function, and AI_MASK (at least in my local netdb.h 
header) does not contain the flag AI_V4MAPPED. If the result is non-zero 
(which it is, because I specify AI_V4MAPPED), the function returns EAI_BADFLAGS.
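
The check described above boils down to rejecting any bit that falls outside the mask of supported flags. A minimal sketch in C, using made-up flag values: the X_ constants, check_flags() and check_flags_demo() are hypothetical stand-ins, not the definitions from netdb.h or the libc source.

```c
#include <assert.h>

/* Hypothetical flag bits, standing in for the AI_* constants in netdb.h. */
#define X_PASSIVE   0x01
#define X_CANONNAME 0x02
#define X_V4MAPPED  0x08

/* Hypothetical mask of accepted flags; X_V4MAPPED is deliberately
 * left out, mirroring the situation described in this message. */
#define X_MASK      (X_PASSIVE | X_CANONNAME)

#define X_OK        0
#define X_BADFLAGS  3   /* stand-in for EAI_BADFLAGS */

/* Return X_BADFLAGS if any requested bit lies outside X_MASK. */
static int check_flags(int flags)
{
    if (flags & ~X_MASK)
        return X_BADFLAGS;
    return X_OK;
}

/* A flag missing from the mask makes the whole request fail, even
 * though the other flags are perfectly valid. */
static void check_flags_demo(void)
{
    assert(check_flags(X_PASSIVE) == X_OK);
    assert(check_flags(X_PASSIVE | X_V4MAPPED) == X_BADFLAGS);
}
```

With a mask like this, any later code that tests for the excluded flag can never see it set, which matches the "superfluous checks" observation.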

After that, getaddrinfo() does some checks on AI_V4MAPPED and AI_ALL (masking 
the respective flags in case the ai_family isn't AF_INET6), which are 
basically superfluous (as they can never be set anyway due to AI_MASK), but I 
didn't find any other reference to the flag in the whole of 
lib/libc/net/getaddrinfo.c, so I actually don't know whether the cause of 
this failing is that it simply isn't implemented (completely), or there's any 
other reason for AI_MASK not to contain these flags that I haven't grasped so 
far.

If anyone out there can shed some light on this, I'd be grateful, even if it's 
just the fact that my netdb.h installation is broken.

Thanks!

-- 
Heiko Wundram
Product & Application Development


Re: sort(1) memory usage

2008-02-03 Thread Dag-Erling Smørgrav
Erik Trulsson <[EMAIL PROTECTED]> writes:
> Yep, it seems that GNU sort allocates a quite large buffer by default when
> the size of the input is unknown (such as when it reads input from stdin.)
> A quick check in the source code indicates that it tries to size this buffer
> according to how much memory the system has (and according to any limits set
> on how much memory the process is allowed to use.)

Uh, OK.  This scaling doesn't seem to work correctly.  It seems to
allocate 27 MB on 32-bit machines and 54 MB on 64-bit machines,
regardless of memory size.

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: sort(1) memory usage

2008-02-03 Thread Erik Trulsson
On Sun, Feb 03, 2008 at 02:13:22PM +0100, Ed Schouten wrote:
> * Dag-Erling Smørgrav <[EMAIL PROTECTED]> wrote:
> > I've been trying to figure out why some periodic scripts consume so much
> > memory.  I've narrowed it down to sort(1).
> > 
> > At first, I thought the scripts were using it inefficiently, feeding it
> > more data than was really needed.  Then I discovered this:
> > 
> > [EMAIL PROTECTED] ~% (sleep 10 | sort) & (sleep 5 ; top -o res | grep sort)
> > [1] 66024
> > 66024 des  1  -85 54796K 52680K piperd 1   0:00  0.88% sort
> > 
> > That's right - sort(1) consumes 50+ MB of memory doing *nothing*.
> > 
> > (roughly half that on a 32-bit box)
> > 
> > Something is rotten in the state of GNU...
> 
> On my i386 box it spends 27M, but when I replace `sort' with `sed',
> without any arguments, it's only 1.4 MB. I tried this on RELENG_6. I can
> also reproduce this on Linux.
> 

Yep, it seems that GNU sort allocates a quite large buffer by default when
the size of the input is unknown (such as when it reads input from stdin.)
A quick check in the source code indicates that it tries to size this buffer
according to how much memory the system has (and according to any limits set
on how much memory the process is allowed to use.)
The size of this buffer can be controlled with the --buffer-size option to
sort(1).



-- 

Erik Trulsson
[EMAIL PROTECTED]


Re: sort(1) memory usage

2008-02-03 Thread Dag-Erling Smørgrav
Dag-Erling Smørgrav <[EMAIL PROTECTED]> writes:
> Erik Trulsson <[EMAIL PROTECTED]> writes:
> > Yep, it seems that GNU sort allocates a quite large buffer by default when
> > the size of the input is unknown (such as when it reads input from stdin.)
> > A quick check in the source code indicates that it tries to size this buffer
> > according to how much memory the system has (and according to any limits set
> > on how much memory the process is allowed to use.)
> Uh, OK.  This scaling doesn't seem to work correctly.  It seems to
> allocate 27 MB on 32-bit machines and 54 MB on 64-bit machines,
> regardless of memory size.

Looking at the code, it seems to go to extreme lengths to get it
absolutely wrong.  For instance, if hw.physmem / 8 > hw.usermem, it will
pick the former, which means it's pretty much guaranteed to either fail
or hose your system (or both).
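
To see why that rule is dangerous, here is a simplified model of it in C. This is not the GNU sort source: pick_buffer_size() and buffer_overcommits() are hypothetical names, and physmem and usermem simply stand for the values of the hw.physmem and hw.usermem sysctls in bytes. Only the comparison described above is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the rule described above: take the larger of
 * usermem and one eighth of physical memory. */
static uint64_t pick_buffer_size(uint64_t physmem, uint64_t usermem)
{
    uint64_t eighth = physmem / 8;

    assert(usermem <= physmem);
    return (eighth > usermem) ? eighth : usermem;
}

/* Return 1 if the chosen buffer is larger than usermem, i.e. the
 * case that is likely to fail or to push the system into swapping. */
static int buffer_overcommits(uint64_t physmem, uint64_t usermem)
{
    return pick_buffer_size(physmem, usermem) > usermem;
}
```

For example, with 8 GB of physical memory and only 512 MB of usermem, this model picks a 1 GB buffer, twice the usermem figure.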

In the immortal words of Blazing Star: YOU FAIL IT

Count this as a vote for ditching GNU sort in favor of a BSD-licensed
implementation (from {Net,Open}BSD for instance).

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: sort(1) memory usage

2008-02-03 Thread Wayne Sierke
On Sun, 2008-02-03 at 14:13 +0100, Ed Schouten wrote:
> * Dag-Erling Smørgrav <[EMAIL PROTECTED]> wrote:
> > I've been trying to figure out why some periodic scripts consume so much
> > memory.  I've narrowed it down to sort(1).
> > 
> > At first, I thought the scripts were using it inefficiently, feeding it
> > more data than was really needed.  Then I discovered this:
> > 
> > [EMAIL PROTECTED] ~% (sleep 10 | sort) & (sleep 5 ; top -o res | grep sort)
> > [1] 66024
> > 66024 des  1  -85 54796K 52680K piperd 1   0:00  0.88% sort
> > 
> > That's right - sort(1) consumes 50+ MB of memory doing *nothing*.
> > 
> > (roughly half that on a 32-bit box)
> > 
> > Something is rotten in the state of GNU...
> 
> On my i386 box it spends 27M, but when I replace `sort' with `sed',
> without any arguments, it's only 1.4 MB. I tried this on RELENG_6. I can
> also reproduce this on Linux.
> 

%uname -vm
FreeBSD 7.0-PRERELEASE #1: Fri Jan 25 01:08:47 CST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC-KTR-0x2000  i386
%( sleep 10 | sort ) & ( sleep 5 ; top -n 150 | grep sort )
[2] 38158
38158 ws   1  -80 29760K   736K piperd   0:00  0.00% sort
%su -
# ( sleep 10 | sort ) & ( sleep 5 ; top -n 150 | grep sort )
[2] 38165
38165 root  1  -80 29760K   732K piperd   0:00  0.00% sort


$ uname -vm
FreeBSD 6.3-PRERELEASE #1: Fri Dec 28 17:49:43 CST 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/LILLITH-IV  i386
$ (sleep 10 | sort) & (sleep 5; top -n 150 | grep sort)
68953 ws   1  -80 26988K   660K piperd   0:00  0.00% sort
$ su -
#  (sleep 10 | sort) & (sleep 5; top -n 150 | grep sort)
[1] 68981
68981 root 1  -80 26988K   660K piperd   0:00  0.00% sort


Next one is Ubuntu 7.04

$ uname -a
Linux developer 2.6.20-16-generic #2 SMP Thu Jun 7 20:19:32 UTC 2007 i686 GNU/Linux
$ (sleep 10 | sort) & (sleep 5; ps aux | grep -E "^USER|sort$")
[9] 10523
USER   PID %CPU %MEM   VSZ   RSS TTY      STAT START   TIME COMMAND
ws   10526  0.0  0.0 28436   676 pts/4    S    01:29   0:00 sort


(I had to change the 'top' incantation to pick up the 'sort' process on
these busy boxes.)


Wayne



Re: sort(1) memory usage

2008-02-03 Thread Erik Trulsson
On Sun, Feb 03, 2008 at 04:31:34PM +0100, Dag-Erling Smørgrav wrote:
> Dag-Erling Smørgrav <[EMAIL PROTECTED]> writes:
> > Erik Trulsson <[EMAIL PROTECTED]> writes:
> > > Yep, it seems that GNU sort allocates a quite large buffer by default when
> > > the size of the input is unknown (such as when it reads input from stdin.)
> > > A quick check in the source code indicates that it tries to size this buffer
> > > according to how much memory the system has (and according to any limits set
> > > on how much memory the process is allowed to use.)
> >
> > Uh, OK.  This scaling doesn't seem to work correctly.  It seems to
> > allocate 27 MB on 32-bit machines and 54 MB on 64-bit machines,
> > regardless of memory size.

I said it *tries* to size the buffer according to the amount of memory
available.  I didn't say it succeeded in doing so, or that it even made a
good attempt at it.

Those 27 MB/54 MB figures are probably there because it hits some kind of limit.
On a machine with only 64 MB RAM, sort(1) "only" allocated 21 MB of address space.

I suspect the scaling algorithm was designed for older machines which
rarely, if ever, had more than maybe 64MB RAM (and usually less than that),
and that little thought was given to multi-gigabyte machines like those
common today.


> 
> Looking at the code, it seems to go to extreme lengths to get it
> absolutely wrong.  For instance, if hw.physmem / 8 > hw.usermem, it will
> pick the former, which means it's pretty much guaranteed to either fail
> or hose your system (or both).
> 
> In the immortal words of Blazing Star: YOU FAIL IT
> 
> Count this as a vote for ditching GNU sort in favor of a BSD-licensed
> implementation (from {Net,Open}BSD for instance).
> 

If any such implementation were a true drop-in replacement for GNU sort
(supporting all the same options etc.) and did not have noticeably worse
performance, then I certainly would not raise any objections to that.



-- 

Erik Trulsson
[EMAIL PROTECTED]


Re: sort(1) memory usage

2008-02-03 Thread Ed Schouten
* Dag-Erling Smørgrav <[EMAIL PROTECTED]> wrote:
> Count this as a vote for ditching GNU sort in favor of a BSD-licensed
> implementation (from {Net,Open}BSD for instance).

I just looked at the OpenBSD implementation and I can see it already
misses one option that some people will miss, namely numeric sorting
(-g). I don't know anything about NetBSD's implementation or how hard it
is to add.

-- 
 Ed Schouten <[EMAIL PROTECTED]>
 WWW: http://g-rave.nl/




Re: getaddrinfo() spec doesn't match behaviour

2008-02-03 Thread Hajimu UMEMOTO
Hi,

> On Sun, 3 Feb 2008 14:50:18 +0100
> "Heiko Wundram (Beenic)" <[EMAIL PROTECTED]> said:

wundram> hints.ai_flags is logically anded with AI_MASK at the beginning of the
wundram> function, and AI_MASK (at least in my local netdb.h header) does not contain
wundram> the flag AI_V4MAPPED. In case the result of that is non-zero (which it is due
wundram> to me specifying AI_V4MAPPED), the function returns EAI_BADFLAGS.

wundram> After that, getaddrinfo() does some checks on AI_V4MAPPED and AI_ALL (masking
wundram> the respective flags in case the ai_family isn't AF_INET6), which are
wundram> basically superfluous (as they can never be set anyway due to AI_MASK), but I
wundram> didn't find any other reference to the flag in the whole of
wundram> lib/libc/net/getaddrinfo.c, so I actually don't know whether the cause of
wundram> this failing is that it simply isn't implemented (completely), or there's any
wundram> other reason for AI_MASK not to contain these flags that I haven't grasped so
wundram> far.

Since that part is only incomplete support for AI_ALL and AI_V4MAPPED, I've
just nuked it.

http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/net/getaddrinfo.c#rev1.87

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
[EMAIL PROTECTED]  [EMAIL PROTECTED],jp.}FreeBSD.org
http://www.imasy.org/~ume/


Re: gettimeofday() in hping

2008-02-03 Thread Kris Kennaway

Stefan Lambrev wrote:

> I run from host A : hping --flood -p 22 -S 10.3.3.2
> and systat -ifstat on host B to see the traffic that is generated
> (I do not want to run this monitoring on the flooder host as it will 
> affect its performance)


OK, I finally got time to look at this.  Firstly, this is quite an 
inefficient program.  It performs 5 syscalls for each packet that it sends:


  2391 initial thread CALL  sendto(0x3,0x61b050,0x28,0,0x5232c0,0x10)
  2391 initial thread GIO   fd 3 wrote 40 bytes
   0x 4500 2800 7491  4006  0a00 0004 0a00 0001 3a96 0016 1865 a781 39d8 12aa 5002 0200 52c9 |E.([EMAIL PROTECTED]:e..9...P...R.|
   0x0026 |..|
  2391 initial thread RET   sendto 40/0x28
  2391 initial thread CALL  sigaction(SIGALRM,0x7fffe6b0,0x7fffe690)
  2391 initial thread RET   sigaction 0
  2391 initial thread CALL  setitimer(0,0x7fffe6c0,0x7fffe6a0)
  2391 initial thread RET   setitimer 0
  2391 initial thread CALL  gettimeofday(0x7fffe680,0)
  2391 initial thread RET   gettimeofday 0
  2391 initial thread CALL  gettimeofday(0x7fffe680,0)
  2391 initial thread RET   gettimeofday 0

Here is a further litany of some of the ways in which this software is 
terrible:


* It does not attempt to increase the socket buffer size (as we have 
already discussed), but


* It also doesn't cope with the possibility that the packet may not be 
sent because the send buffer is full.


* With every packet sent in flood mode it sets a timer for 1 second in 
the future even though we have told it not to send packets once a second 
but as fast as possible


* We also set the signal handler with each packet sent, instead of 
setting it once and leaving it.


* We call gettimeofday twice for each packet, once to retrieve the 
second timestamp and once to retrieve the microseconds.  This is only 
for the purpose of computing the RTT.  However, we can only keep track 
of 400 in-flight packets, which means this is also useless in flood mode.


* The suspend handler does not work

* This does not strike me as quality software :)
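
On the gettimeofday point in the list above: a single call already returns both the seconds and the microseconds, so the second call is redundant. A minimal sketch in C, where struct send_stamp and stamp_now() are hypothetical names and not hping identifiers:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical record of when a packet was sent. */
struct send_stamp {
    long sec;   /* seconds since the epoch */
    long usec;  /* microseconds within that second */
};

/* Fill both fields from one gettimeofday() call.
 * Returns 0 on success, -1 on failure. */
static int stamp_now(struct send_stamp *st)
{
    struct timeval tv;

    assert(st != NULL);
    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    st->sec = (long)tv.tv_sec;
    st->usec = (long)tv.tv_usec;
    return 0;
}
```

Taking both values from the same call also avoids the two halves of the timestamp coming from slightly different instants.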

Fixing all of the above I can send at about 13MB/sec (timecounter is not 
relevant any more).  The CPU is spending about 75% of the time in the 
kernel, so


Kris



Re: gettimeofday() in hping

2008-02-03 Thread Kris Kennaway

Kris Kennaway wrote:

> Fixing all of the above I can send at about 13MB/sec (timecounter is not 
> relevant any more).  The CPU is spending about 75% of the time in the 
> kernel, so

 that is the next place to look. [hit send too soon]

Kris


Re: gettimeofday() in hping

2008-02-03 Thread Kris Kennaway

Kris Kennaway wrote:

> Fixing all of the above I can send at about 13MB/sec (timecounter is 
> not relevant any more).  The CPU is spending about 75% of the time in 
> the kernel, so
>
> that is the next place to look. [hit send too soon]


Actually 15MB/sec once I disable all kernel debugging.  This is 
identical to Linux 2.6.24 on the same hardware.  The patch I use to fix 
hping brain-damage is attached.


Kris



diff -ru work/hping3-20051105/opensockraw.c work~/hping3-20051105/opensockraw.c
--- opensockraw.c.orig  2003-09-01 00:22:06.0 +
+++ opensockraw.c   2008-02-03 19:45:28.0 +
@@ -17,7 +17,7 @@
 
 int open_sockraw()
 {
-   int s;
+   int s, t, val;
 
    s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (s == -1) {
@@ -25,5 +25,12 @@
        return -1;
    }
 
+   val = 262144;
+   t = setsockopt(s, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val));
+   if (t == -1) {
+       perror("[open_sockraw] setsockopt()");
+       return -1;
+   }
+
    return s;
 }
diff -ru work/hping3-20051105/sendip.c work~/hping3-20051105/sendip.c
--- sendip.c.orig   2004-04-09 23:38:56.0 +
+++ sendip.c    2008-02-03 19:50:35.0 +
@@ -110,7 +110,7 @@
    result = sendto(sockraw, packet, packetsize, 0,
        (struct sockaddr*)&remote, sizeof(remote));
 
-   if (result == -1 && errno != EINTR && !opt_rand_dest && !opt_rand_source) {
+   if (result == -1 && errno != ENOBUFS && errno != EINTR && !opt_rand_dest && !opt_rand_source) {
        perror("[send_ip] sendto");
        if (close(sockraw) == -1)
            perror("[ipsender] close(sockraw)");
diff -ru work/hping3-20051105/sendtcp.c work~/hping3-20051105/sendtcp.c
--- sendtcp.c.orig  2003-09-01 00:22:06.0 +
+++ sendtcp.c   2008-02-03 20:30:51.0 +
@@ -85,8 +85,10 @@
            packet_size);
 #endif
 
+#if 0
    /* adds this pkt in delaytable */
    delaytable_add(sequence, src_port, time(NULL), get_usec(), S_SENT);
+#endif
 
    /* send packet */
    send_ip_handler(packet+PSEUDOHDR_SIZE, packet_size);
--- send.c.orig 2003-08-31 17:23:53.0 +
+++ send.c  2008-02-03 21:58:59.0 +
@@ -63,6 +63,8 @@
    }
 }
 
+static int sigalarm_handler = 0;
+
 /* The signal handler for SIGALRM will send the packets */
 void send_packet (int signal_id)
 {
@@ -79,12 +81,15 @@
    else    send_tcp();
 
    sent_pkt++;
-   Signal(SIGALRM, send_packet);
+   if (!opt_flood && !sigalarm_handler) {
+       Signal(SIGALRM, send_packet);
+       sigalarm_handler = 1;
+   }
 
    if (count != -1 && count == sent_pkt) { /* count reached? */
        Signal(SIGALRM, print_statistics);
        alarm(COUNTREACHED_TIMEOUT);
-   } else if (!opt_listenmode) {
+   } else if (!opt_listenmode && !opt_flood) {
        if (opt_waitinusec == FALSE)
            alarm(sending_wait);
        else

Netgraph, usleep, and moving to Kernel

2008-02-03 Thread Len Gross
I've had some good success with implementing a custom MAC protocol using
Netgraph.  The current implementation runs in userland and connects to the
kernel iface and kernel ethernet nodes.  It uses a polling loop with
usleep.  All very cool.  This is just background, as the question really has
to do with usleep in userland or kernel mode.

In testing with nothing else running on the box, it seems that if
I ask for 100 usec of "sleep time," I get about 1000 usec (1 msec).  I have
set hz to 10,000 in kern.clockrate and Ticks is at 100.

If I set hz to 20,000 there is no improvement and Ticks seems to "reset
itself" to 50.  The box is an old Pentium with about a 500 MHz clock.  I also
get this behaviour on "test programs" that just call usleep.

1) Is there a way to get finer timing with usleep or an equivalent call?

2) When I move the userland into a "true" Netgraph Kernel node will I see
different behaviour?
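
A small test routine helps to quantify the overshoot before and after changing hz. The sketch below is only an assumption about how one might measure it, not something from the setup described above: it uses nanosleep() and clock_gettime(CLOCK_MONOTONIC) rather than usleep(), and measure_sleep_us() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Sleep for request_us microseconds and return how long the call
 * actually took, in microseconds, or -1 on error or interruption. */
static long measure_sleep_us(long request_us)
{
    struct timespec req, start, end;
    long long elapsed_ns;

    assert(request_us >= 0 && request_us < 1000000);
    req.tv_sec = 0;
    req.tv_nsec = request_us * 1000L;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    if (nanosleep(&req, NULL) != 0)
        return -1;  /* interrupted by a signal, or invalid request */
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    /* Compute the difference in nanoseconds first, then convert. */
    elapsed_ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL +
                 (long long)(end.tv_nsec - start.tv_nsec);
    return (long)(elapsed_ns / 1000LL);
}
```

Calling measure_sleep_us(100) in a loop and looking at the minimum and the average gives a direct reading of the effective timer granularity.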

Thanks in advance.

-- Len


Re: Memory allocation performance

2008-02-03 Thread Julian Elischer

Dag-Erling Smørgrav wrote:
> Julian Elischer <[EMAIL PROTECTED]> writes:
> > Robert Watson <[EMAIL PROTECTED]> writes:
> > > be a good time to try to revalidate that.  Basically, the goal would
> > > be to make the pcpu cache FIFO as much as possible as that maximizes
> > > the chances that the newly allocated object already has lines in the
> > > cache.  It's a fairly trivial tweak to the UMA allocation code.
> > you mean FILO or LIFO right?
>
> Uh, no.  You want to reuse the last-freed object, as it is most likely
> to still be in cache.
>
> DES

exactly.. FILO or LIFO (last in First out.)
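
To make the LIFO point concrete, here is a minimal sketch of a stack-style free cache in C. It is not UMA code: struct obj_cache, cache_free() and cache_alloc() are hypothetical names, and a real per-CPU cache would also need bucket refill and protection against preemption. The only point is that allocation hands back the most recently freed object, which is the one most likely to still have lines in the CPU cache.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 8   /* hypothetical bucket size */

/* Hypothetical per-CPU cache: a fixed-size stack of free objects. */
struct obj_cache {
    void  *slot[CACHE_SLOTS];
    size_t count;       /* number of cached free objects */
};

/* Push a freed object.  Returns 0 if the cache is full, in which
 * case the caller would return the object to the backing zone. */
static int cache_free(struct obj_cache *c, void *obj)
{
    assert(c != NULL && obj != NULL);
    if (c->count == CACHE_SLOTS)
        return 0;
    c->slot[c->count++] = obj;
    return 1;
}

/* Pop the most recently freed object (LIFO); NULL if empty. */
static void *cache_alloc(struct obj_cache *c)
{
    assert(c != NULL);
    if (c->count == 0)
        return NULL;
    return c->slot[--c->count];
}
```

A FIFO queue would instead return the object that has been idle the longest, which is the least likely to still be cached.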



Re: Memory allocation performance

2008-02-03 Thread Alexander Motin

Kris Kennaway wrote:
> You can look at the raw output from pmcstat, which is a collection of 
> instruction pointers that you can feed to e.g. addr2line to find out 
> exactly where in those functions the events are occurring.  This will 
> often help to track down the precise causes.

Thanks for the hint, it was interesting hunting, but it showed nothing. The 
hits land on very simple lines like:

bucket = cache->uc_freebucket;
cache->uc_allocs++;
if (zone->uz_ctor != NULL) {
cache->uc_frees++;

and so on.
There are no loops, no inlines and no macros. Nothing! The only hint is a 
huge number of "p4-resource-stall" events on those lines. I have no idea 
what exactly that means, why it happens mostly here, or how to fight it.


I would probably agree that it might be some profiler fluctuation, but the 
performance benefits I got from my self-made caching of UMA calls look 
very real. :(


Robert Watson wrote:
> There was, FYI, a report a few years ago that there was a measurable
> improvement from allocating off the free bucket rather than maintaining
> separate alloc and free buckets.  It sounded good at the time but I was
> never able to reproduce the benefits in my test environment.  Now might
> be a good time to try to revalidate that.  Basically, the goal would be
> to make the pcpu cache FIFO as much as possible as that maximizes the
> chances that the newly allocated object already has lines in the cache.
> It's a fairly trivial tweak to the UMA allocation code.

I have tried this, but have not found a difference. Maybe it gives some 
benefit, but not in this situation. Here the profiling shows delays in the 
allocator itself, and since the allocator does not touch the data objects 
themselves, this probably says more about caching of the management 
structures' memory than about caching of the objects.


I have one more crazy idea: the memory containing the zones may have 
some special hardware or configuration feature, like "noncaching" or 
something similar. That could explain the slowdown in accessing it. But as I 
can't prove it, it is just one more crazy theory. :(


--
Alexander Motin


Re: gettimeofday() in hping

2008-02-03 Thread Sam Leffler

Kris Kennaway wrote:

* It does not attempt to increase the socket buffer size (as we have 
already discussed), but


* It also doesn't cope with the possibility that the packet may not be 
sent because the send buffer is full.


The "doesn't cope with the possibility that ... the send buffer is full" 
issue is classic linux-specific mis-behaviour.  On linux the process 
will block when the default qdisc finds the device q is stopped (due to 
being full).  I remember cursing iperf for this.
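
A sender that wants to cope with a full send buffer has to treat that condition as transient rather than fatal. A minimal sketch in C, assuming the full buffer is reported as ENOBUFS (as in the patch in this thread) or as EAGAIN on a non-blocking socket; send_error_is_transient() is a hypothetical helper, not part of hping:

```c
#include <assert.h>
#include <errno.h>

/* Classify the errno left by a failed sendto().  Returns 1 if the
 * packet should simply be counted as dropped and sending should
 * continue, 0 if the error should be treated as fatal. */
static int send_error_is_transient(int err)
{
    assert(err != 0);   /* only meaningful after a failed call */

    switch (err) {
    case EINTR:         /* interrupted by a signal */
    case ENOBUFS:       /* send buffer or interface queue full */
    case EAGAIN:        /* non-blocking socket would block */
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return 1;
    default:
        return 0;
    }
}
```

On a system where the sending process blocks instead, the ENOBUFS branch is simply never taken, so the same check should be harmless there.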


Sam


Re: Memory allocation performance

2008-02-03 Thread Bruce Evans

On Mon, 4 Feb 2008, Alexander Motin wrote:

> Kris Kennaway wrote:
> > You can look at the raw output from pmcstat, which is a collection of 
> > instruction pointers that you can feed to e.g. addr2line to find out 
> > exactly where in those functions the events are occurring.  This will often 
> > help to track down the precise causes.
>
> Thanks for the hint, it was interesting hunting, but it showed nothing. The
> hits land on very simple lines like:
>
> bucket = cache->uc_freebucket;
> cache->uc_allocs++;
> if (zone->uz_ctor != NULL) {
> cache->uc_frees++;
>
> and so on.
> There are no loops, no inlines and no macros. Nothing! The only hint is a
> huge number of "p4-resource-stall" events on those lines. I have no idea
> what exactly that means, why it happens mostly here, or how to fight it.

Try profiling it on another type of CPU, to get different performance
counters but hopefully not very different stalls.  If the other CPU doesn't
stall at all, put another black mark against P4 and delete your copies of
it :-).

Bruce