Re: mbuf external buffer reference counters

2002-07-12 Thread Jon Mini

On Thu, Jul 11, 2002 at 11:41:04PM -0700, Alfred Perlstein wrote:
> > That's a cool idea.. haven't looked at NetBSD but am imagining the
> > mbufs would be linked in a 'ring'. This works because you never
> > care how many references are, just whether there's one or more than
> > one, and this is easy to tell by examining the ring pointer.
> > I.e., you never have to iterate through the entire ring.
> 
> That's true, but could someone explain how one can safely and
> efficiently manipulate such a structure in an SMP environment?
> 
> I'm not saying it's impossible, I'm just saying it didn't seem
> intuitive to me back then, as well as now.

I'm probably speaking out of turn here (I have no idea what structure you
all are talking about), but a monodirectional ring can be safely modified
with a compare-and-exchange atomic operation.
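
As a rough illustration of that idea, here is a minimal sketch using C11 atomics. The node type and function names are hypothetical and are not taken from any BSD mbuf code. It covers only the "am I the sole reference" test and insertion into a singly linked ring; removal needs the predecessor node and is not handled safely by a single compare-and-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical node type; not the real struct mbuf or m_ext layout. */
struct ring_node {
    _Atomic(struct ring_node *) next;
};

/* A node that holds the only reference points at itself. */
static void ring_init(struct ring_node *n)
{
    atomic_init(&n->next, n);
}

/* "One or more than one" test: no need to walk the ring. */
static bool ring_is_sole_reference(struct ring_node *n)
{
    return atomic_load(&n->next) == n;
}

/* Link newn right after pos, retrying if another CPU changed pos->next. */
static void ring_insert_after(struct ring_node *pos, struct ring_node *newn)
{
    struct ring_node *succ = atomic_load(&pos->next);

    do {
        /* On CAS failure, succ is refreshed with the current pos->next. */
        atomic_store(&newn->next, succ);
    } while (!atomic_compare_exchange_weak(&pos->next, &succ, newn));
}
```

Calling ring_init() on two nodes and then ring_insert_after(&a, &b) leaves a pointing at b and b pointing back at a, so neither node reports itself as the sole reference.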

-- 
Jonathan Mini <[EMAIL PROTECTED]>
http://www.freebsd.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-net" in the body of the message



Re: mbuf external buffer reference counters

2002-07-12 Thread Bosko Milekic

On Fri, Jul 12, 2002 at 12:10:41AM -0700, Alfred Perlstein wrote:
> * Julian Elischer <[EMAIL PROTECTED]> [020712 00:00] wrote:
> > 
> > 
> > On Thu, 11 Jul 2002, Alfred Perlstein wrote:
> > > 
> > > That's true, but could someone explain how one can safely and
> > > efficiently manipulate such a structure in an SMP environment?
> > 
> > what does NetBSD do for that?
> 
> They don't!
> 
>  *** waves skull staff exasperatedly ***
> 
> RORWLRLRLLRL

 Again, Alfred is right. :-)

 I can't think of a way to ensure that the owner of the other mbuf
 doesn't manipulate its two forward/backward pointers while we're
 manipulating ours.  The only way that springs to mind is to have them
 protected by a mutex, but:

 1) that would be very expensive and would bloat the mbuf structure a
 LOT;

 2) we would probably run into lock order reversal problems.

 I see now what Alfred meant when he made his original comment.

-- 
Bosko Milekic
[EMAIL PROTECTED]
[EMAIL PROTECTED]





Re: mbuf external buffer reference counters

2002-07-12 Thread Jon Mini

On Fri, Jul 12, 2002 at 07:45:07AM -0400, Bosko Milekic wrote:
>
>  [ ... Description of modifying a bidirectional ring ... ]
>
>  So I guess that what we're dealing with isn't really a
>  "monodirectional" ring.  Right?

Yep. =)

-- 
Jonathan Mini <[EMAIL PROTECTED]>
http://www.freebsd.org/




Re: mbuf external buffer reference counters

2002-07-12 Thread Giorgos Keramidas

On 2002-07-11 17:12 +, Bosko Milekic wrote:
> On Thu, Jul 11, 2002 at 01:56:08PM -0700, Luigi Rizzo wrote:
> > example: userland does an 8KB write, in the old case this requires
> > 4 clusters, with the new one you end up using 4 clusters and stuff
> > the remaining 16 bytes in a regular mbuf, then depending on the
> > relative producer-consumer speed the next write will try to fill
> > the mbuf and attach a new cluster, and so on... and when TCP hits
> > these data-in-mbuf blocks will have to copy rather than reference
> > the data blocks...
>
> This is a good observation if we're going to be doing benchmarking,
> but I'm not sure whether the repercussions are that important (unless,
> as I said, there's a lot of applications that send exactly 8192
> byte chunks?).

This is not true only for 8192 byte-sized writes.  Anything that uses
a block size >2048 near a power of 2 will have the same problem.
Writes that use 2048 bytes, 4096, 8192, 16384, ... will all have this
very same problem :/
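
The arithmetic behind this can be sketched as follows. This assumes 2048-byte clusters and the 4-byte counter mentioned earlier in the thread; the names below are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes: 2 KB clusters and a 4-byte counter, as described in the thread. */
#define CLUSTER_SIZE   2048u
#define REFCNT_SIZE    4u
#define USABLE_BYTES   (CLUSTER_SIZE - REFCNT_SIZE)   /* 2044 */

struct split {
    size_t full_clusters;  /* clusters filled completely */
    size_t leftover;       /* bytes that spill past the last full cluster */
};

/* Split a write of nbytes into full clusters of 'usable' bytes plus a remainder. */
static struct split split_write(size_t nbytes, size_t usable)
{
    struct split s = { nbytes / usable, nbytes % usable };
    return s;
}
```

With 2044 usable bytes per cluster, an 8192-byte write fills 4 clusters and leaves 16 bytes over, which matches Luigi's example; a 2048-byte write leaves 4 bytes over and a 16384-byte write leaves 32.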





Re: mbuf external buffer reference counters

2002-07-12 Thread Bosko Milekic


On Fri, Jul 12, 2002 at 04:26:53AM -0700, Jon Mini wrote:
> On Thu, Jul 11, 2002 at 11:41:04PM -0700, Alfred Perlstein wrote:
> > > That's a cool idea.. haven't looked at NetBSD but am imagining the
> > > mbufs would be linked in a 'ring'. This works because you never
> > > care how many references are, just whether there's one or more than
> > > one, and this is easy to tell by examining the ring pointer.
> > > I.e., you never have to iterate through the entire ring.
> > 
> > That's true, but could someone explain how one can safely and
> > > efficiently manipulate such a structure in an SMP environment?
> > 
> > I'm not saying it's impossible, I'm just saying it didn't seem
> > > intuitive to me back then, as well as now.
> 
> I'm probably speaking out of turn here (I have no idea what structure you
> all are talking about), but a monodirectional ring can be safely modified
> with a compare-and-exchange atomic operation.

 The gist of the problem is that when you want to, say, remove yourself
 from the list, you have to:

 1) set your "next"'s back pointer to your "back" pointer
 2) set your "back"'s next pointer to your "next" pointer

 So that's two operations but for all you know your "next" or your
 "back" may be doing the same thing to you at the same time.  As far as
 I know, you can't (intuitively) figure out a way to do both of these
 atomically. i.e., maybe you'll set your next's back pointer to whatever
 you have in `back' but then your `back' guy will set your back pointer
 to whatever he has in `back' and then your next guy's back pointer will
 be invalid, for example.
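
 One such interleaving can be replayed deterministically on a single
 thread. The sketch below is only an illustration with a hypothetical
 node type: two adjacent nodes, b and c, both unlink themselves from the
 ring a <-> b <-> c <-> d, each having read its own neighbours before
 either one stores anything.

```c
#include <assert.h>

/* Hypothetical ring node; field names follow the description above. */
struct node {
    struct node *next;
    struct node *back;
};

/* Join four nodes into the ring a <-> b <-> c <-> d <-> a. */
static void make_ring(struct node *a, struct node *b,
                      struct node *c, struct node *d)
{
    a->next = b; b->next = c; c->next = d; d->next = a;
    a->back = d; b->back = a; c->back = b; d->back = c;
}

/*
 * Replay one bad interleaving: b and c each read their own
 * neighbours first, then both perform their two stores.
 */
static void unlink_b_and_c_interleaved(struct node *b, struct node *c)
{
    struct node *b_next = b->next, *b_back = b->back;  /* c and a */
    struct node *c_next = c->next, *c_back = c->back;  /* d and b */

    b_next->back = b_back;   /* b, step 1: c->back = a */
    c_next->back = c_back;   /* c, step 1: d->back = b (b is leaving) */
    b_back->next = b_next;   /* b, step 2: a->next = c (c is leaving) */
    c_back->next = c_next;   /* c, step 2: b->next = d */
}
```

 Afterwards a's next pointer refers to c and d's back pointer refers to
 b, although both of those nodes have left the ring; the correct result
 would have been a and d pointing at each other.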

 So I guess that what we're dealing with isn't really a
 "monodirectional" ring.  Right?
 
> -- 
> Jonathan Mini <[EMAIL PROTECTED]>
> http://www.freebsd.org/

Regards,
-- 
Bosko Milekic
[EMAIL PROTECTED]
[EMAIL PROTECTED]





Re: mbuf external buffer reference counters

2002-07-12 Thread Giorgos Keramidas

On 2002-07-12 07:45 +, Bosko Milekic wrote:
> The gist of the problem is that when you want to, say, remove yourself
> from the list, you have to:
>
> 1) set your "next"'s back pointer to your "back" pointer
> 2) set your "back"'s next pointer to your "next" pointer
>
> So that's two operations but for all you know your "next" or your
> "back" may be doing the same thing to you at the same time.  As far as
> I know, you can't (intuitively) figure out a way to do both of these
> atomically. i.e., maybe you'll set your next's back pointer to whatever
> you have in `back' but then your `back' guy will set your back pointer
> to whatever he has in `back' and then your next guy's back pointer will
> be invalid, for example.
>
> So I guess that what we're dealing with isn't really a
> "monodirectional" ring.  Right?

No it isn't.  It looks more like the "dining philosophers" problem.
But that problem's solution would require at least one mutex for every
part of the ring :-(





RE: xl checksum and dsniff

2002-07-12 Thread Cambria, Mike



> -Original Message-
> From: Jonathan Lemon [mailto:[EMAIL PROTECTED]]
>> >
> >My guess is that doing hw checksum by the nic could be the 
> issue.  This is
> >the only real difference I can see at present.
> >
> >Any ideas?
> 
> Test your theory.  Turn off hardware checksums with 'ifconfig 
> xl0 -txcsum'

When I do 'ifconfig xl0 -txcsum', a subsequent 'ifconfig' reads as if the
command had no effect.  In other words, ifconfig shows
options=3 still.

Using tcpdump, it also still reports 'bad checksum' even though everything
works fine. 

The man page for xl also doesn't show these commands.  Perhaps they are not
turned on yet?

On a similar machine, running OpenBSD 3.0, dsniff works just fine.  This
machine doesn't have support for checksum offload (or at least, ifconfig xl0
doesn't indicate it.)

MikeC




Question about network layers in FreeBSD 4.x

2002-07-12 Thread freebsd

I have a system I run FreeBSD 4.5-release on.  The purpose of this system is 
to run Snort (IDS).

The current system is a Compaq Proliant 1850R, have also tried on a Compaq 
Proliant 1600R.

Both systems are SMP with dual processors, > 256m ram, and Compaq Smart Array 
controller to handle raid in hardware.

I want to use this box to monitor multiple lan segments.  So I use the 
builtin tlan eth for mgmt, and then add other nics with no IP addresses for 
snort to listen on.

This works great when I use distinct multiple NIC cards.  3com + Intel + 
Realtek.

However, when I try to use a quad ethernet card, it fails.  The programs 
don't bomb, no errors reported.  But there is an amount of activity that doesn't 
get picked up when using the quad cards vs. when using the multiple NICs 
scenario.

For example, if someone in lan segment x.x.a.x connects to a *nix server in 
x.x.b.x (both monitored by this box), and a suspicious event occurs I will 
see it captured by both of the snort interfaces.  If, however, I put in the 
quad card, and the same thing happens, it will only be seen/recorded by one 
of the snort nic instances.

I have tried this with a Znyx ZX346Q and with an Adaptec quad card.  With the 
Znyx I tried both the default freebsd drivers it sees that card as and also 
with the Znyx drivers.  This seems to be a problem somewhere other than in 
the NIC driver itself.

Any suggestions or insight into what might be wrong here would be greatly 
appreciated.




Re: xl checksum and dsniff

2002-07-12 Thread Jonathan Lemon


On Fri, Jul 12, 2002 at 09:06:13AM -0400, Cambria, Mike wrote:
> > -Original Message-
> > From: Jonathan Lemon [mailto:[EMAIL PROTECTED]]
> >> >
> > >My guess is that doing hw checksum by the nic could be the 
> > issue.  This is
> > >the only real difference I can see at present.
> > >
> > >Any ideas?
> > 
> > Test your theory.  Turn off hardware checksums with 'ifconfig 
> > xl0 -txcsum'
> 
> When I do 'ifconfig xl0 -txcsum', a subsequent 'ifconfig' reads as if the
> command had no effect.  In other words, ifconfig shows
> options=3 still.

Oh, hmm.  It appears that this driver doesn't support disabling checksums.
For the time being, you can recompile the driver, and manually disable the
checksums by editing the define at the top of the file:

    #define XL905B_CSUM_FEATURES 0
-- 
Jonathan




Re: xl checksum and dsniff

2002-07-12 Thread Luigi Rizzo

Actually, I seem to remember that the ifconfig output only shows
the driver's capabilities, not the actual setting.

cheers
luigi

On Fri, Jul 12, 2002 at 12:00:48PM -0500, Jonathan Lemon wrote:
> 
> On Fri, Jul 12, 2002 at 09:06:13AM -0400, Cambria, Mike wrote:
> > > -Original Message-
> > > From: Jonathan Lemon [mailto:[EMAIL PROTECTED]]
> > >> >
> > > >My guess is that doing hw checksum by the nic could be the 
> > > issue.  This is
> > > >the only real difference I can see at present.
> > > >
> > > >Any ideas?
> > > 
> > > Test your theory.  Turn off hardware checksums with 'ifconfig 
> > > xl0 -txcsum'
> > 
> > When I do 'ifconfig xl0 -txcsum', a subsequent 'ifconfig' reads as if the
> > command had no effect.  In other words, ifconfig shows
> > options=3 still.
> 
> Oh, hmm.  It appears that this driver doesn't support disabling checksums.
> For the time being, you can recompile the driver, and manually disable the
> checksums by editing the define at the top of the file:
> 
>   #define XL905B_CSUM_FEATURES 0
> -- 
> Jonathan
> 




Re: xl checksum and dsniff

2002-07-12 Thread Jonathan Lemon

No - ifconfig shows the actual settings.  'ifconfig -m' will show
both the configured settings and the driver capability list.
-- 
Jonathan

On Fri, Jul 12, 2002 at 10:43:24AM -0700, Luigi Rizzo wrote:
> Actually, I seem to remember that the ifconfig output only shows
> the driver's capabilities, not the actual setting.
> 
>   cheers
>   luigi
> 
> On Fri, Jul 12, 2002 at 12:00:48PM -0500, Jonathan Lemon wrote:
> > 
> > On Fri, Jul 12, 2002 at 09:06:13AM -0400, Cambria, Mike wrote:
> > > > -Original Message-
> > > > From: Jonathan Lemon [mailto:[EMAIL PROTECTED]]
> > > >> >
> > > > >My guess is that doing hw checksum by the nic could be the 
> > > > issue.  This is
> > > > >the only real difference I can see at present.
> > > > >
> > > > >Any ideas?
> > > > 
> > > > Test your theory.  Turn off hardware checksums with 'ifconfig 
> > > > xl0 -txcsum'
> > > 
> > > When I do 'ifconfig xl0 -txcsum', a subsequent 'ifconfig' reads as if the
> > > command had no effect.  In other words, ifconfig shows
> > > options=3 still.
> > 
> > Oh, hmm.  It appears that this driver doesn't support disabling checksums.
> > For the time being, you can recompile the driver, and manually disable the
> > checksums by editing the define at the top of the file:
> > 
> > #define XL905B_CSUM_FEATURES 0
> > -- 
> > Jonathan
> > 




Re: Question about network layers in FreeBSD 4.x

2002-07-12 Thread Thierry Herbelot

freebsd wrote:
> 
> I have a system I run FreeBSD 4.5-release on.  The purpose of this system is
> to run Snort (IDS).
> 
> The current system is a Compaq Proliant 1850R, have also tried on a Compaq
> Proliant 1600R.
> 
> Both systems are SMP with dual processors, > 256m ram, and Compaq Smart Array
> controller to handle raid in hardware.
> 

FreeBSD 4.x (did you notice 4.6 has been released?) is not very good at
using SMP machines where there are lots of interrupts (the kernel can
only be run by one CPU at any one time, and this is enforced by a "Big
Giant Lock").

you should re-run your test without the SMP option, to see if the
problem is still there (it should not be)

then, there are kernel options in recent versions of FreeBSD enabling an
optimized use of the interrupts (DEVICE POLLING). this may help you, if
the driver has been modified.

I used a cheap 4-port NIC from DLINK (DFE-570-TX) with very good success
(this is the dc driver)

Hope this helps

TfH




Re: mbuf external buffer reference counters

2002-07-12 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Bosko Milekic  <[EMAIL PROTECTED]> wrote:
> 
>   Right now, in -CURRENT, there is this hack that I introduced that
>   basically just allocates a ref. counter for external buffers attached
>   to mbufs with malloc(9).  What this means is that if you do something
>   like allocate an mbuf and then a cluster, there's a malloc() call that
>   is made to allocate a small (usually 4-byte) reference counter for it.
> 
>   That sucks,

Eeek, it sure does!

>   and even -STABLE doesn't do this. I changed it this way
>   a long time ago for simplicity's sake and since then I've been meaning
>   to do something better here.  The idea was, for mbuf CLUSTERS, to
>   stash the counter at the end of the 2K buffer area, and to make
>   MCLBYTES = 2048 - sizeof(refcount), which should be more than enough,
>   theoretically, for all cluster users.  This is by far the easiest
>   solution (I had it implemented about 10 months ago) and it worked
>   great.
> 
>   The purpose of this Email is to find out if anyone has concrete
>   information on why this wouldn't work (if they think it wouldn't).

I've been out of town and I realize I'm coming into this thread late
and that it has evolved a bit.  But I still think it's worthwhile to
point out a very big problem with the idea of putting the reference
count at the end of each mbuf cluster.  It would have disastrous
consequences for performance because of cache effects.  Bear with me
through a little bit of arithmetic.

Consider a typical PIII CPU that has a 256 kbyte 4-way set-associative
L2 cache with 32-byte cache lines.  4-way means that there are 4
different cache lines associated with each address.  Each group of 4
is called a set, and each set covers 32 bytes of the address space
(the cache line size).

The total number of sets is:

256 kbytes / 32 bytes per line / 4 lines per set = 2048 sets

and as mentioned above, each set covers 32 bytes.

The cache wraps around every 256 kbytes / 4-way = 64 kbytes of address
space.  In other words, if address N maps onto a given set, then
addresses N + 64k, N + 128k, etc. all map onto the same set.

An mbuf cluster is 2 kbytes and all mbuf clusters are well-aligned.
So the wrap around of the cache occurs every 64 kbytes / 2 kbytes per
cluster = 32 clusters.  To put it another way, all of the reference
counts would be sharing (i.e., competing for) the same 32 cache sets
and they would never utilize the remaining 2016 sets at all.  Only
1.56% of the cache (32 sets / 2048 sets) would be usable for the
reference counts.  This means there would be a lot of cache misses as
reference count updates caused other reference counts to be flushed
from the cache.

These cache effects are huge, and they are growing all the time as CPU
speeds increase while RAM speeds remain relatively constant.

It is much better to have the reference counts laid out as they are
in -stable, i.e., one big contiguous block of counts.  That way, the
counts are spread out through the entire cache and they don't compete
with each other nearly so much.  That is the underlying principle of
slab allocators, by the way.
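
The arithmetic above can be checked with a small sketch. It assumes a simple modulo-indexed cache with the geometry given in this message, a 4-byte counter, and regions that start at address 0; the names are invented for the illustration and nothing here is measured on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache geometry taken from the message above (PIII-class L2). */
#define CACHE_BYTES   (256u * 1024u)
#define LINE_BYTES    32u
#define WAYS          4u
#define NUM_SETS      (CACHE_BYTES / LINE_BYTES / WAYS)   /* 2048 */
#define CLUSTER_BYTES 2048u
#define REFCNT_BYTES  4u   /* assumed counter size */

/* Set index for an address in a simple modulo-indexed cache. */
static unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Count the distinct sets touched by n counters; in_cluster selects the layout. */
static unsigned distinct_sets(size_t n, int in_cluster)
{
    unsigned char seen[NUM_SETS] = { 0 };
    unsigned count = 0;

    for (size_t i = 0; i < n; i++) {
        uintptr_t addr = in_cluster
            ? i * CLUSTER_BYTES + (CLUSTER_BYTES - REFCNT_BYTES) /* end of cluster i */
            : i * REFCNT_BYTES;                                  /* packed array */
        unsigned s = set_index(addr);

        if (!seen[s]) {
            seen[s] = 1;
            count++;
        }
    }
    return count;
}
```

Under these assumptions, 1024 counters stored at the ends of their clusters land in only 32 sets, while the same 1024 counters packed into one array occupy 128 consecutive sets, and a packed array of 16384 counters reaches all 2048.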

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa





Re: mbuf external buffer reference counters

2002-07-12 Thread Julian Elischer



On Fri, 12 Jul 2002, Giorgos Keramidas wrote:

> On 2002-07-12 07:45 +, Bosko Milekic wrote:
> >
> > So I guess that what we're dealing with isn't really a
> > "monodirectional" ring.  Right?
> 
> No it isn't.  It looks more like the "dining philosophers" problem.
> But that problem's solution would require at least one mutex for every
> part of the ring :-(

The stuff under consideration originally came from OSF/1, which became
Tru64.

that was heavily SMP
can anyone find out what they did?


> 
> 





Re: mbuf external buffer reference counters

2002-07-12 Thread Bosko Milekic


On Fri, Jul 12, 2002 at 11:03:45AM -0700, John Polstra wrote:
> I've been out of town and I realize I'm coming into this thread late
> and that it has evolved a bit.  But I still think it's worthwhile to
> point out a very big problem with the idea of putting the reference
> count at the end of each mbuf cluster.  It would have disastrous
> consequences for performance because of cache effects.  Bear with me
> through a little bit of arithmetic.
> 
> Consider a typical PIII CPU that has a 256 kbyte 4-way set-associative
> L2 cache with 32-byte cache lines.  4-way means that there are 4
> different cache lines associated with each address.  Each group of 4
> is called a set, and each set covers 32 bytes of the address space
> (the cache line size).
> 
> The total number of sets is:
> 
> 256 kbytes / 32 bytes per line / 4 lines per set = 2048 sets
> 
> and as mentioned above, each set covers 32 bytes.
> 
> The cache wraps around every 256 kbytes / 4-way = 64 kbytes of address
> space.  In other words, if address N maps onto a given set, then
> addresses N + 64k, N + 128k, etc. all map onto the same set.
> 
> An mbuf cluster is 2 kbytes and all mbuf clusters are well-aligned.
> So the wrap around of the cache occurs every 64 kbytes / 2 kbytes per
> cluster = 32 clusters.  To put it another way, all of the reference
> counts would be sharing (i.e., competing for) the same 32 cache sets
> and they would never utilize the remaining 2016 sets at all.  Only
> 1.56% of the cache (32 sets / 2048 sets) would be usable for the
> reference counts.  This means there would be a lot of cache misses as
> reference count updates caused other reference counts to be flushed
> from the cache.
> 
> These cache effects are huge, and they are growing all the time as CPU
> speeds increase while RAM speeds remain relatively constant.

  I've thought about the cache issue with regards to the ref. counts
  before, actually, and initially, I also thought the exact same thing
  as you bring up here.  However, there are a few things you need to
  remember:

 1) SMP; counters are typically referenced by several different threads
 which may be running on different CPUs at any given point in time, and
 this means that we'll probably end up having corresponding cache lines
 invalidated back and forth anyway;

 2) Using more cache lines may not be better overall, we may be doing
 write-backs of other data already there; in any case, we would really
 have to measure this;

 3) By far the most important: all modifications to the ref. count are
 atomic, bus-locked, ops.  I spoke to Peter a little about this and
 although I'm not 100% sure, we think that bus-locked
 fetch-inc/dec-stores need the bus anyway.  If that's the case,
 then we really don't care about whether or not they get cached, right?
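
 For reference, an atomically updated reference count of the kind
 described in point 3 might look like the sketch below. It uses C11
 atomics rather than the kernel's own atomic(9) primitives, and the type
 and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter wrapper; kernel code would use atomic(9) instead. */
struct ext_ref {
    atomic_uint count;
};

static void ext_ref_init(struct ext_ref *r)
{
    atomic_init(&r->count, 1);   /* the creator holds the first reference */
}

static void ext_ref_acquire(struct ext_ref *r)
{
    atomic_fetch_add(&r->count, 1);
}

/* Returns true when the caller dropped the last reference and must free the buffer. */
static bool ext_ref_release(struct ext_ref *r)
{
    unsigned prev = atomic_fetch_sub(&r->count, 1);

    assert(prev > 0);
    return prev == 1;
}
```

 Every acquire and release is a single atomic read-modify-write, and only
 the thread that sees the count go from 1 to 0 frees the external buffer.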

> John
> -- 
>   John Polstra
>   John D. Polstra & Co., Inc.        Seattle, Washington USA
>   "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

 Thanks for the cool info and feedback.

Regards,
-- 
Bosko Milekic
[EMAIL PROTECTED]
[EMAIL PROTECTED]





RE: xl checksum and dsniff

2002-07-12 Thread Cambria, Mike


>   #define XL905B_CSUM_FEATURES 0

This worked.  dsniff is behaving just fine now.

Next I'll try to track down whether this is a libnet problem, libnids
problem or dsniff problem, so I know which project I need to inform.

Thanks,
MikeC




RE: xl checksum and dsniff

2002-07-12 Thread Andrew R. Reiter

On Fri, 12 Jul 2002, Cambria, Mike wrote:

:
:>  #define XL905B_CSUM_FEATURES 0
:
:This worked.  dsniff is behaving just fine now.
:
:Next I'll try to track down whether this is a libnet problem, libnids
:problem or dsniff problem, so I know which project I need to inform.

IIRC, the problem is BPF b/c it doesn't know the checksum since the
calculation was offloaded, no?

--
Andrew R. Reiter
[EMAIL PROTECTED]
[EMAIL PROTECTED]





RE: xl checksum and dsniff

2002-07-12 Thread Cambria, Mike

> -Original Message-
> From: Andrew R. Reiter [mailto:[EMAIL PROTECTED]]
> :Next I'll try to track down whether this is a libnet problem, libnids
> :problem or dsniff problem, so I know which project I need to inform.
> 
> IIRC, the problem is BPF b/c it doesn't know the checksum since the
> calculation was offloaded, no?

Possibly, or perhaps libpcap?

Now that I know checksum offload is indeed involved, I booted the original
kernel and poked around.

Using dsniff -c, dsniff was able to see packets received just fine.  The
half of the session sent is what dsniff can't track.  Packets received,
although tcpdump shows "bad checksum", are seen by dsniff just fine.  I
expected it to be the other way around.

MikeC




Re: mbuf external buffer reference counters

2002-07-12 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Bosko Milekic  <[EMAIL PROTECTED]> wrote:
> 
>   I've thought about the cache issue with regards to the ref. counts
>   before, actually, and initially, I also thought the exact same thing
>   as you bring up here.  However, there are a few things you need to
>   remember:
> 
>  1) SMP; counters are typically referenced by several different threads
>  which may be running on different CPUs at any given point in time, and
>  this means that we'll probably end up having corresponding cache lines
>  invalidated back and forth anyway;

Agreed.  The PII and newer CPUs do have some short cuts built in that
mitigate this somewhat by doing direct cache-to-cache updates in the
SMP case.  But quantitatively I don't know how much that helps.

>  2) Using more cache lines may not be better overall, we may be doing
>  write-backs of other data already there; in any case, we would really
>  have to measure this;

The research that led to the slab allocator demonstrated pretty
conclusively that, at least in general, it's better to spread out the
usage across all cache lines rather than compete for just a few.

Measurements trump research, though, as long as the measurements
reflect real-world usage patterns.

If you decide to pack the refcounts into the clusters themselves, it
might be better to put the refcount at the front of each cluster, and
offset the packet data by 16 bytes to make room for it.  That way,
the reference count would be in the same cache line as the first part
of the packet header -- a cache line which is almost certain to be
accessed (though probably not dirtied) anyway.
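
A layout along those lines could be sketched as follows. The struct is hypothetical and not an actual FreeBSD definition; it assumes 2048-byte clusters aligned to a cache line, a 32-byte line as in the earlier arithmetic, and a 4-byte counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_BYTES 2048u
#define HDR_BYTES     16u   /* room reserved at the front, per the suggestion above */
#define LINE_BYTES    32u   /* cache line size assumed earlier in the thread */

/* Hypothetical cluster layout; not the actual FreeBSD definition. */
struct cluster {
    union {
        uint32_t refcnt;               /* counter lives in the first bytes */
        unsigned char pad[HDR_BYTES];  /* pads the header out to 16 bytes */
    } hdr;
    unsigned char data[CLUSTER_BYTES - HDR_BYTES];  /* packet data starts at offset 16 */
};

/* Cache line, relative to the start of the cluster, that holds a given offset. */
static size_t line_of(size_t offset)
{
    return offset / LINE_BYTES;
}
```

Here the counter and the first 16 bytes of packet data both fall in line 0 of the cluster, whereas a counter stored in the last 4 bytes would sit in line 63, far from the packet header.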

>  3) By far the most important: all modifications to the ref. count are
>  atomic, bus-locked, ops.  I spoke to Peter a little about this and
>  although I'm not 100% sure, we think that bus-locked
>  fetch-inc/dec-stores need the bus anyway.  If that's the case,
>  then we really don't care about whether or not they get cached, right?

I'm afraid I don't know the answer to that.

The majority of systems will be uniprocessor for a good long time, and
I would hate to see their performance sacrificed needlessly.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa





Re: mbuf external buffer reference counters

2002-07-12 Thread Andrew Gallatin


Julian Elischer writes:
 > 
 > 
 > On Fri, 12 Jul 2002, Giorgos Keramidas wrote:
 > 
 > > On 2002-07-12 07:45 +, Bosko Milekic wrote:
 > > >
 > > > So I guess that what we're dealing with isn't really a
 > > > "monodirectional" ring.  Right?
 > > 
 > > No it isn't.  It looks more like the "dining philosophers" problem.
 > > But that problem's solution would require at least one mutex for every
 > > part of the ring :-(
 > 
 > The stuff under consideration originally came from OSF/1, which became
 > Tru64.
 > 
 > that was heavily SMP
 > can anyone find out what they did?

From looking at a Tru64 5.1 header file, it looks like they do per-ext
locking and declare an MBUF_EXT_LOCK(m) macro.  It is not clear how
one is supposed to use this & it appears to be undocumented.  Tru64
also has a global mbuf lock.  Tru64 4.x does not appear to have the
MBUF_EXT_LOCK (so I think it uses just the global MBUF_LOCK for all
mbuf manipulations; and I'll bet that just does a 'splimp' on UP
systems).

AIX also has this nice ext_refq structure and it also appears to be doing
per-ext locking.  From mbuf.h, AIX's ext mbufs are all just malloc'ed
memory.  This jibes with the pain & suffering I had when writing an
ethernet driver for AIX & finding mbufs which cross page boundaries.

MacOS-X seems to have both a refq and a refcnt array like in -stable.
It appears to use the refq for externally managed data and the refcnt
for system clusters. As for locking, it looks a lot like Tru64 4.x --
it has a global mbuf lock.  Perhaps this is what the original Mach
did?

WRT using refqs -- I think that Bosko's system in -current is just
as nice from a user's perspective, and if we can work out an
acceptable solution for doing refcnts, let's not revert to refqs.

I agree with John about where to put the refcnts: I think we should
have a big hunk of memory for the refcnts like in -stable.  My
understanding is that the larger virtually contig mbufs are the only
thing that would cause a problem for this, or is that incorrect?
If so, then why not just put their counter elsewhere?

One concrete example against putting the refcnts into the cluster is
that it would cause NFS servers & clients to use 25% more mbufs for a
typical 8K read or write request.

Drew








Re: mbuf external buffer reference counters

2002-07-12 Thread Bosko Milekic


On Fri, Jul 12, 2002 at 06:55:37PM -0400, Andrew Gallatin wrote:
[...]

FWIW, BSD/OS also does something similar to -STABLE.

[...]
> I agree with John about where to put the refcnts: I think we should
> have a big hunk of memory for the refcnts like in -stable.  My
> understanding is that the larger virtually contig mbufs are the only
> thing that would cause a problem for this, or is that incorrect?
> If so, then why not just put their counter elsewhere?
> 
> One concrete example against putting the refcnts into the cluster is
> that it would cause NFS servers & clients to use 25% more mbufs for a
> typical 8K read or write request.

  If we decide to allocate jumbo bufs from their own separate map as
  well then we have no wastage for the counters for clusters if we keep
  them in a few pages, like in -STABLE, and it should all work out fine.

  For the jumbo bufs I still maintain that we should keep the counter
  for them at the end of the buf because the math works out (see my post
  in that thread with the math example) and because their total size is
  not a power of 2 anyway.  They'll also be more randomly spread out and
  use more cache slots.

> Drew

Regards,
-- 
Bosko Milekic
[EMAIL PROTECTED]
[EMAIL PROTECTED]





Re: mbuf external buffer reference counters

2002-07-12 Thread Andrew Gallatin


Bosko Milekic writes:
<...>
 >   If we decide to allocate jumbo bufs from their own separate map as
 >   well then we have no wastage for the counters for clusters if we keep
 >   them in a few pages, like in -STABLE, and it should all work out fine.

That sounds good.

 >   For the jumbo bufs I still maintain that we should keep the counter
 >   for them at the end of the buf because the math works out (see my post
 >   in that thread with the math example) and because their total size is
 >   not a power of 2 anyway.  They'll also be more randomly spread out and
 >   use more cache slots.

How about, as (I think it was) John suggested, putting the counters at
the front of the buffer so they'd be close to the headers, etc in the
cache and would be less likely to cause their own unique cache miss
when you access them?

Drew






RE: mbuf external buffer reference counters

2002-07-12 Thread Jim McGrath

> Julian Elischer writes:
>  >
>  >
>  > The stuff under consideration originally came from OSF/1, which became
>  > Tru64.
>  >
>  > that was heavily SMP
>  > can anyone find out what they did?
>
> From looking at a Tru64 5.1 header file, it looks like they do per-ext
> locking and declare an MBUF_EXT_LOCK(m) macro.  It is not clear how
> one is supposed to use this & it appears to be undocumented.  Tru64
> also has a global mbuf lock.  Tru64 4.x does not appear to have the
> MBUF_EXT_LOCK (so I think it uses just the global MBUF_LOCK for all
> mbuf manipulations; and I'll bet that just does a 'splimp' on UP
> systems).
>
When I was at Hitachi in Waltham, MA, we did a port of OSF/1 to Hitachi's
SR8000 Super.  http://www.hitachi-eu.com/hel/hpcc/ It is based on a 64 bit
implementation of the Power PC.  My only involvement was with the Large File
System, and I really don't remember how the cluster buf ref count was
implemented.  Most of the people who may have been involved are at Egenera,
with the code somewhere at Hitachi in Japan.  If you have any contact at
either place, you might check with them.

Jim





Re: Masquerade fails to suppress X-sender

2002-07-12 Thread Crist J. Clark

On Thu, Jul 11, 2002 at 01:30:53AM +0200, Julian Stacey wrote:
> Hi [EMAIL PROTECTED]
> Since I gave my FreeBSD-4.5-Release gateway a new sendmail.cf today,
> I've been getting both these in my headers:
>   Received: from jhs.muc.de (520006753247-0001@[217.235.121.155])
>by fmrl11.sul.t-online.com with esmtp id
>17SPs5-0MzVXEC; Thu, 11 Jul 2002 00:23:41 +0200
>   X-sender: [EMAIL PROTECTED]
> I never used to have 520006753247 appear, (I've confirmed that by
> inspecting my morning's post to a simple exploder list (that leaves
> headers unchanged), which came back clean without any 520006753247)
>   ( Reason I don't want people to see 520006753247-0001 is that's my
> account, & while not private as such, no need to publicise,
> & don't want people emailing me (or spamming!) there either ).
> 
> So I'd like to kill off that number from appearing, any idea how to do it ?

The '-f' option of sendmail(8) would do this. See also the "trusted
user" options for your sendmail.mc. I am not aware of a way to set up a
fake user in the sendmail.{mc,cf} files, but that does not mean there
isn't one.
-- 
Crist J. Clark | [EMAIL PROTECTED]
   | [EMAIL PROTECTED]
http://people.freebsd.org/~cjc/| [EMAIL PROTECTED]
