patch to cleanup inflight desciptor handling.

2000-12-13 Thread Alfred Perlstein

Not a lot of people are familiar with fd passing so I'll give
a short description:

  By using AF_UNIX sockets between processes, a process can use
  sendmsg() to send a file descriptor through the socket, and the
  other process will do a recvmsg() to pick up the descriptor.

The "problem" is that if a descriptor is in transit/inflight
and the sending process closes the file, it still needs to
remain open for the recipient.
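
For readers who have not seen fd passing, the following is a minimal userland sketch of the behaviour described above. It is not taken from the patch: the helper names send_fd(), recv_fd() and demo_inflight_close() are made up for illustration, and it runs both "processes" inside one process over a socketpair() to stay short. It only uses the standard sendmsg()/recvmsg() SCM_RIGHTS interface.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one descriptor plus one data byte. */
static int
send_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/* Hypothetical helper: receive one descriptor, or -1 on failure. */
static int
recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return (-1);
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS)
        return (-1);
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return (fd);
}

/* Returns 0 if the descriptor survived the sender's close(). */
int
demo_inflight_close(void)
{
    int sv[2], p[2], fd;
    char buf[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
        return (-1);
    if (write(p[1], "ok", 2) != 2)
        return (-1);
    if (send_fd(sv[0], p[0]) < 0)
        return (-1);
    close(p[0]);            /* sender closes while the fd is in flight */
    fd = recv_fd(sv[1]);    /* receiver still gets a usable descriptor */
    if (fd < 0 || read(fd, buf, 2) != 2)
        return (-1);
    close(fd);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return (memcmp(buf, "ok", 2) == 0 ? 0 : -1);
}
```

The close(p[0]) between the send and the receive is the interesting line: the kernel has to hold its own reference to the file while the message sits in the socket buffer.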

What can happen is:

process A: sendmsg(descriptor)
process A: exit
process B: exit

Without the garbage collection we'd have leaked a file descriptor
inside the kernel.

There's a pretty complex loop in sys/kern/uipc_usrreq.c that
deals with garbage collecting these inflight descriptors.

The problems with the garbage collection routine are:

1) it's expensive, as it walks all the open files in the system at least
   twice.
2) it's ugly/hackish.
3) it will need to acquire global locks on kernel structure lists
   for significant amounts of time.
4) it complicates the code because certain things need to be done
   out of order, i.e. sorflush before sofree (which does the sorflush
   anyway).

The solution is actually taken from Linux. In Linux, all network
buffers can have a free-routine callback invoked on them when the
buffer is deallocated.

FreeBSD only has a free routine available for M_EXT buffers
(buffers with external storage); the routine is called when
  (m_flags & M_EXT) != 0 && m_type != EXT_CLUSTER

To achieve my goal I made it so that all fd passing requires an
mbuf cluster and took responsibility for freeing the mbuf
cluster in my callback.

I set m_type to EXT_CMSG_DATA and provide my own free routine
until the descriptors are read by the receiving process. If the
descriptors are read, then I restore it back to a "normal"
mbuf with an attached cluster to be free()'d.
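
To make the idea concrete, here is a small userland model of the scheme, not the kernel code from the diff. All names (struct xfile, struct xbuf, xbuf_internalize() and so on) are hypothetical stand-ins for the file and mbuf structures; the only point is to show a buffer whose free callback drops the in-flight references, and which is switched back to a plain free routine once the receiver has taken the descriptors.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel file object with a refcount. */
struct xfile {
    int refs;
};

/* Hypothetical stand-in for an mbuf with external storage. */
struct xbuf {
    void (*ext_free)(struct xbuf *);    /* called when the buffer dies */
    struct xfile **files;               /* in-flight files */
    size_t nfiles;
};

/* Free routine used while the files are still in flight. */
static void
inflight_free(struct xbuf *b)
{
    size_t i;

    for (i = 0; i < b->nfiles; i++)
        b->files[i]->refs--;            /* drop the in-flight reference */
    free(b->files);
}

/* Plain free routine, restored once the receiver owns the files. */
static void
plain_free(struct xbuf *b)
{
    free(b->files);
}

/* Sender side: take a reference on each file and arm the callback. */
struct xbuf *
xbuf_internalize(struct xfile **fps, size_t n)
{
    struct xbuf *b;
    size_t i;

    if (n == 0 || (b = malloc(sizeof(*b))) == NULL)
        return (NULL);
    if ((b->files = malloc(n * sizeof(*fps))) == NULL) {
        free(b);
        return (NULL);
    }
    for (i = 0; i < n; i++) {
        b->files[i] = fps[i];
        fps[i]->refs++;
    }
    b->nfiles = n;
    b->ext_free = inflight_free;
    return (b);
}

/* Receiver side: the references now belong to the receiver. */
void
xbuf_externalize(struct xbuf *b)
{
    b->nfiles = 0;
    b->ext_free = plain_free;
}

/* Whoever frees the buffer just runs whatever callback is installed. */
void
xbuf_free(struct xbuf *b)
{
    b->ext_free(b);
    free(b);
}
```

In this model, freeing a buffer that was never read releases the references without any scan of the open file table, which is the property the patch is after.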

Good things about this patch:
1) simplifies
   a) locking
   b) descriptor management
   c) the code in general
2) less latency, the gc routine can be expensive
3) some comments are added describing some other stuff that needs
   fixing. (problems with rfork threads)
4) shrink struct file by one int

Problems with this patch:
1) most fd passing probably only sends one descriptor at a time,
so by allocating clusters I'm wasting a lot more space, and taking
more time to do the allocation.
2) the mbuf subsystem should provide macros to do what I'm doing
(hijacking the free routine on a mbuf+cluster)
3) the mbuf subsystem should provide a way to get a callback on
a single mbuf without a cluster attached.

http://people.FreeBSD.org/~alfred/inflight.diff

thanks,
-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-net" in the body of the message



Re: MEXT_IS_REF broken.

2000-12-13 Thread Garrett Wollman

< said:

> Gee, this looks suspiciously like jhb's refcount patch:

...Except that I made provision for architectures which have LL/SC
rather than CAS, which saves a few instructions.

-GAWollman






Re: Confusing netgraph error

2000-12-13 Thread Julian Elischer

Mark Wright wrote:
> 
> I'm trying to get frame relay working with a lmc1200 card, and I'm getting
> the following error:
> 
> sj# ngctl mkpeer lmc0: frame_relay rawdata downstream
> sj# ngctl mkpeer lmc0:rawdata lmi dlci0 auto0
> sj# ngctl mkpeer lmc0:rawdata rfc1490 dlci15 downstream
> ngctl: send msg: No such file or directory
> 
> What does that mean? Should I be concerned that ifconfig -a doesn't reveal
> the lmc0?  Or does it show up only after I do a 'ngctl interface'?  It shows
> up in dmesg:
> 
> lmc0:  port 0xf880-0xf8ff mem 0xffbefc00-0xffbefc7f
> irq 11 at device 16.0 on pci0
> lmc0: pass 2.2, serial 00:60:99:00:23:6d
> lmc0: driver is using old-style compatability shims
> 
> I'm using the 20001210 snapshot, and the following changes were made to my
> kernel config:
> 
> options NETGRAPH                #enable netgraph networking
> options NETGRAPH_SOCKET         #enable netgraph networking
> options NETGRAPH_FRAME_RELAY    #enable frame relay
> options NETGRAPH_LMI            #enable link management
> ...
> device  lmc                     # LanMedia card

duh, I didn't notice but archie caught it..
you apparently haven't compiled in the rfc1490 node..
no problem, it's a module so just type:

kldload ng_rfc1490

and try again.

> 
> Mark
> 
> [EMAIL PROTECTED]

-- 
  __--_|\  Julian Elischer
 /   \ [EMAIL PROTECTED]
(   OZ) World tour 2000
---> X_.---._/  presently in:  Budapest
v






Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Bosko Milekic


  Hi,

A while ago (it's been at least two weeks now), Mike Silbersack
  requested a review for:

  http://www.silby.com/patches/ratelimit-enhancement-2.patch

  To quote the description on his web page, this diff will:

  * ICMP ECHO and TSTAMP replies are now rate-limited.
  * RSTs generated due to packets sent to open and unopen ports
    are now separated into separate queues.
  * Each rate limiting queue now has its own description, as
    follows:
   Suppressing udp flood/scan: 212/200 pps
   Suppressing outgoing RST due to port scan: 202/200 pps
   Suppressing outgoing RST due to ACK flood: 19725/200 pps
   Suppressing ping flood: 230/200 pps
   Suppressing icmp tstamp flood: 210/200 pps

  While the descriptions for the two RST cases can be accused
  of oversimplification, they should cut down on questions by
  users confused with the current terminology.  Experienced
  users can always run a packet sniffer if they need more
  exact knowledge of what's occurring.

The diff was initially reviewed by me and green, and the recommended
  changes were mainly stylistic. I want to commit this code, but I'm
  posting it up here in case someone has any final objections or review.

  Thanks,
  Bosko Milekic
  [EMAIL PROTECTED]







Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Richard A. Steenbergen

On Wed, 13 Dec 2000, Bosko Milekic wrote:

>Suppressing udp flood/scan: 212/200 pps
>Suppressing outgoing RST due to port scan: 202/200 pps
>Suppressing outgoing RST due to ACK flood: 19725/200 pps
>Suppressing ping flood: 230/200 pps
>Suppressing icmp tstamp flood: 210/200 pps
> 
>   While the descriptions for the two RST cases can be accused
>   of oversimplification, they should cut down on questions by
>   users confused with the current terminology.  Experienced
>   users can always run a packet sniffer if they need more
>   exact knowledge of what's occuring.

I would be extremely careful with those descriptions... When you tell
people directly that something is an attack, even if it's not, there are
enough who will jump to immediate conclusions and begin making false
accusations. While it may be highly likely that the reason for those rate
limits is some kind of attack, it is not guaranteed, and I would be very
reluctant to so blatantly tell people that it is...

Personally I'd recommend straightforward descriptions like "RST due to no
listening socket". I also see no compelling reason to put ICMP Timestamp
in a separate queue, but what I would recommend is separate queues for
ICMP messages which would be defined as "query/response" and those which
would be called "error" messages. If someone needs more specific
protection they can use dummynet.

Just a thought...

-- 
Richard A Steenbergen <[EMAIL PROTECTED]>   http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA  B2 46 B3 D8 14 36 FE B6)







Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Alfred Perlstein

* Richard A. Steenbergen <[EMAIL PROTECTED]> [001213 11:17] wrote:
> On Wed, 13 Dec 2000, Bosko Milekic wrote:
> 
> >Suppressing udp flood/scan: 212/200 pps
> >Suppressing outgoing RST due to port scan: 202/200 pps
> >Suppressing outgoing RST due to ACK flood: 19725/200 pps
> >Suppressing ping flood: 230/200 pps
> >Suppressing icmp tstamp flood: 210/200 pps
> > 
> >   While the descriptions for the two RST cases can be accused
> >   of oversimplification, they should cut down on questions by
> >   users confused with the current terminology.  Experienced
> >   users can always run a packet sniffer if they need more
> >   exact knowledge of what's occuring.
> 
> I would be extremely careful with those descriptions... When you tell
> people directly that something is an attack, even if its not, there are
> enough who will jump to immediate conclusions and begin making false
> accusations. While it may be highly likely that the reasons for those rate
> limits is some kind of attack, it is not guaranteed, and I would be very
> reluctant to so blatantly tell people that it is...
> 
> Personally I'd recommend straight forward descriptions like "RST due to no
> listening socket". I also see no compelling reason to put ICMP Timestamp
> in a seperate queue, but what I would recommend is seperate queues for
> ICMP messages which would be defined as "query/response" and those which
> would be called "error" messages. If someone needs more specific
> protection they can use dummynet.
> 
> Just a thought...

I think the word "possible" should be prepended to all of these messages.

Now I have a weird question: I've seen the ICMP response limit kick in
when getting pegged by a couple hundred hits per second from legitimate
connections on a port that isn't open.

This would probably fall under:
  > >Suppressing outgoing RST due to port scan: 202/200 pps

Which is untrue; it should read something like:
Suppressing outgoing RST due to high rate of connections on an unopen
port (possible portscan): 202/200 pps

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."





Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Mike Silbersack


On Wed, 13 Dec 2000, Richard A. Steenbergen wrote:

> I would be extremely careful with those descriptions... When you tell
> people directly that something is an attack, even if its not, there are
> enough who will jump to immediate conclusions and begin making false
> accusations. While it may be highly likely that the reasons for those rate
> limits is some kind of attack, it is not guaranteed, and I would be very
> reluctant to so blatantly tell people that it is...
> 
> Personally I'd recommend straight forward descriptions like "RST due to no
> listening socket".

Well, as no IPs are listed, I'm not too concerned about libelous attack
accusations resulting from the messages.  However, I'm not opposed to
changing the messages, as long as the distinction between the cases is
clear.  Do you have exact replacements for each case along the line of
what you're thinking of?  (Making it fit into 80 characters is the tough
part.)

> I also see no compelling reason to put ICMP Timestamp
> in a seperate queue, but what I would recommend is seperate queues for
> ICMP messages which would be defined as "query/response" and those which
> would be called "error" messages. If someone needs more specific
> protection they can use dummynet.

Well, I should make a clarification here.  My use of the word queue is
wrong.  All the rate limiting does is count packets per second and drop
those above the allowed amount.  Hence, there's no significant overhead
to having counters for each separate type.

The main reason tstamp is distinct from echo is so that they can be
reported correctly.  Given that they are distinctly different packets, I
think this makes sense.  (And has less overhead than dummynet would.)
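
As a rough illustration of "count packets per second and drop those above the allowed amount", a per-type counter could look like the sketch below. This is not the code from the patch: the enum values, struct rl_counter and rl_check() are hypothetical names, and the caller is assumed to pass in the current time in whole seconds.

```c
#include <time.h>

/* Hypothetical categories, one counter each. */
enum rl_type {
    RL_UDP,
    RL_RST_CLOSED,      /* RST for a port with no listener */
    RL_RST_OPEN,        /* RST for a port with a listener */
    RL_ICMP_ECHO,
    RL_ICMP_TSTAMP,
    RL_MAX
};

struct rl_counter {
    time_t second;      /* the second this count belongs to */
    int count;          /* packets seen during that second */
};

static struct rl_counter rl_counters[RL_MAX];

/*
 * Returns 1 if the reply should be dropped, 0 if it may be sent.
 * There is no queue: only a counter that resets each second.
 */
int
rl_check(enum rl_type type, time_t now, int max_pps)
{
    struct rl_counter *c = &rl_counters[type];

    if (c->second != now) {
        c->second = now;
        c->count = 0;
    }
    c->count++;
    return (c->count > max_pps);
}
```

Because each type only costs one small struct, adding a separate counter for tstamp changes what gets reported without adding meaningful work per packet.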

Mike "Silby" Silbersack






Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Richard A. Steenbergen

On Wed, 13 Dec 2000, Alfred Perlstein wrote:

> I think the word "possible" should be prepended to all of these messages.
> 
> Now I have a weird question, I've seen the ICMP responce limit when
> getting pegged by a couple hundred hits per second on a port that isn't
> open by legimitimate connections.
> 
> This would probably fall under:
>   > >Suppressing outgoing RST due to port scan: 202/200 pps
> 
> Which is untrue, it should read something like:
> Suppressing outgoing RST due to high rate of connections on an unopen
> port (possible portscan): 202/200 pps

It could just as easily be a SYN flood against a single port... or a large
number of clients trying to connect to your crashed web server... :P Or
it could just as easily be an ack flood against a port without a listener
and be showing up in the "not the ack flood" counter.

Attaching motives and trying to play intrusion detection pattern analysis
games without complete information is dangerous, and none of these
routines qualify as advanced enough to make any such determination. IMHO
break it down by "RST from ports with or without a listener" (or open
port, whatever floats the boat) and be done with it. The major goal of
this code would seem to be to provide simple but fairly useful protection
against common attacks out of the box, not to provide analysis of the
attacks (since no useful analysis can be performed without looking further
anyways).

-- 
Richard A Steenbergen <[EMAIL PROTECTED]>   http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA  B2 46 B3 D8 14 36 FE B6)






Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Richard A. Steenbergen

On Wed, 13 Dec 2000, Mike Silbersack wrote:

> > I also see no compelling reason to put ICMP Timestamp
> > in a seperate queue, but what I would recommend is seperate queues for
> > ICMP messages which would be defined as "query/response" and those which
> > would be called "error" messages. If someone needs more specific
> > protection they can use dummynet.
> 
> Well, I should make a clarification here.  My use of the word queue is
> wrong.  All the rate limiting does is count packets per second and drop
> those above the allowed amount.  Hence, there's no significant overhead
> to having counters for each seperate type.
> 
> The main reason tstamp is distinct from echo is so that they can be
> reported correctly.  Given that they are distinctly different packets, I
> think this makes sense.  (And has less overhead than dummynet would.)

Is there some specific reason you need timestamp separate? If you're
really up for that, why not just limit each ICMP type separately?

As for performance, this limiting occurs deeper in the stack than ipfw,
and w/dummynet you have the flexibility to mask the IPs so that each
interface or aliased IP could have a separate rate limit as well.

My thinking on the matter is that these limits should provide basic
protection out of the box, and site specific limits should be done with
dummynet. I personally agree with this patch because I think there should
be separate limits at some fundamental level, such as tcp-closed tcp-open
udp(closed) icmp-response and icmp-error. How much further you want to
push it is debatable mainly just because of the hassle of too many
unnecessary tunables, not for any real performance or memory reasons.

-- 
Richard A Steenbergen <[EMAIL PROTECTED]>   http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA  B2 46 B3 D8 14 36 FE B6)






Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Mike Silbersack


On Wed, 13 Dec 2000, Richard A. Steenbergen wrote:

> Is there some specific reason you need timestamp seperate? If you're
> really up for that, why not just limit each ICMP type seperately?

There's no real need for it to be separate, it just feels cleaner.  I
prefer seeing the two cases have separately reported values.  (Have I
missed any icmp types that a host could respond with?  If so, please tell
me so that I can add them.)

> As for performance, this limiting occurs deeper in the stack then ipfw,
> and w/dummynet you have the flexability to mask the ips so that each
> interface or aliased ip could have a seperate rate limit as well.

Hm, true.  I was thinking of limiting the outgoing side, which would mean
ipfw comes later in the string, but I suppose that if you limit on the
incoming side, ipfw comes sooner.

> My thinking on the matter is that these limits should provide basic
> protection out of the box, and site specific limits should be done with
> dummynet. I personally agree with this patch because I think there should
> be seperate limits at some fundimental level, such as tcp-closed tcp-open
> udp(closed) icmp-response and icmp-error. How much further you want to
> push it is debatable mainly just because of the hastle of too many
> unnecessary tunables, not for any real performance or memory reasons.

I wasn't planning to subdivide the reporting any more in future patches,
so you shouldn't see any new tunables popping up for that reason.

Mike "Silby" Silbersack






Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Richard A. Steenbergen

On Wed, 13 Dec 2000, Mike Silbersack wrote:

> On Wed, 13 Dec 2000, Richard A. Steenbergen wrote:
> 
> > Is there some specific reason you need timestamp seperate? If you're
> > really up for that, why not just limit each ICMP type seperately?
> 
> There's no real need for it to be separate, it just feels cleaner.  I
> prefer seeing the two cases have separately reported values.  (Have I
> missed any icmp types that a host could respond with?  If so, please tell
> me so that I can add them.)

Assuming the box is not acting as a router in any fashion... It doesn't
matter, it really doesn't, I'm just not sure timestamp is really worth
the hassle instead of just calling it icmp request... The advantage of
separate limits is to keep one service working when another is being
limited. Since it's a dirt simple operation to pick which limit you're
hitting, and there are no queues involved, just counters, it might be just
as easy to go into the rate limiting function as icmp limit, and have it
maintain separate limits for every type, if you really wanted...

> > As for performance, this limiting occurs deeper in the stack then ipfw,
> > and w/dummynet you have the flexability to mask the ips so that each
> > interface or aliased ip could have a seperate rate limit as well.
> 
> Hm, true.  I was thinking of limiting the outgoing side, which would mean
> ipfw comes later in the string, but I suppose that if you limit on the
> incoming ipfw's sooner.

Historically bandlim has been the process of stopping the processing at
input of things which would result in output... Do you want to (or need
to) extend this?

> I wasn't planning to subdivide the reporting any more in future patches,
> so you shouldn't see any new tunables popping up for that reason.

Same question as above: is this to be built-in Denial of Service
prevention, or is this limiting of packets which could potentially
generate excessive processing or replies? Might as well do it right
instead of kludging this up any more... :P

-- 
Richard A Steenbergen <[EMAIL PROTECTED]>   http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA  B2 46 B3 D8 14 36 FE B6)






Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Mike Silbersack



On Wed, 13 Dec 2000, Richard A. Steenbergen wrote:

> > Hm, true.  I was thinking of limiting the outgoing side, which would mean
> > ipfw comes later in the string, but I suppose that if you limit on the
> > incoming ipfw's sooner.
> 
> Historically bandlim has been the process of stopping the processing at
> input of things which would result in output... Do you want to (or need
> to) extend this?

Since this code actually has to read the incoming packets before deciding
not to send the outgoing reply, I consider it to be dropping the
outgoing.  However, since there's no useful info in an icmp request,
reading isn't really anything... We appear to be caught in a semantic
argument; I'm not proposing anything new.

> Same question as above, is this to be built in Denail of Service
> prevention, or is this limiting of packets which could potentially
> generate excessive processing or replies? Might as well do it right
> instead of kludging this up any more... :P

It just limits the bandwidth of mostly useless packets.  What constitutes
a "DoS" is beyond the scope of this message, and we're starting to
nitpick.  I'll roll an updated patch with less casual messages so we can
get it committed soon.

Mike "Silby" Silbersack






Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Richard A. Steenbergen

On Wed, 13 Dec 2000, Mike Silbersack wrote:

> On Wed, 13 Dec 2000, Richard A. Steenbergen wrote:
> 
> > > Hm, true.  I was thinking of limiting the outgoing side, which would mean
> > > ipfw comes later in the string, but I suppose that if you limit on the
> > > incoming ipfw's sooner.
> > 
> > Historically bandlim has been the process of stopping the processing at
> > input of things which would result in output... Do you want to (or need
> > to) extend this?
> 
> Since this code actually has to read the incoming packets before decidied
> to not send the outgoing reply, I consider it to be dropping the
> outgoing.  However, since there's no useful info in a icmp request,
> reading isn't really anything... We appear to be caught in a semantical
> argument, I'm not proposing anything new.

There is a difference though, between stopping excessive requests at
input and stopping the replies from hitting the network after bothering to
process them, which is what I meant about using ipfw at input.

Right now it has to be classified as dropping on incoming, with the
qualifier that we're dropping at incoming things which would result in
further processing and an outgoing reply. Entirely semantics but still.

> > Same question as above, is this to be built in Denail of Service
> > prevention, or is this limiting of packets which could potentially
> > generate excessive processing or replies? Might as well do it right
> > instead of kludging this up any more... :P
> 
> It just limits the bandwidth of mostly useless packets.  What constitutes
> a "DoS" is beyond the scope of this message, and we're starting to
> nitpick.  I'll roll an updated patch with less casual messages so we can
> get it committed soon.

Well my point is, if you're compiling BANDLIM into your kernel and end up
getting messages on possible port scans, this is backwards. If you really
want to make a DoS analysis and prevention system, I think it should be
separate from this.

-- 
Richard A Steenbergen <[EMAIL PROTECTED]>   http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA  B2 46 B3 D8 14 36 FE B6)






Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Kirk McKusick

I believe that your changes have been sorely needed for many
years. While I would like to see regular mbufs given a callback
mechanism, your present approach of using an mbuf cluster
solves 90% of the problem.

Kirk  McKusick





Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Matt Dillon

:I believe that your changes have been sorely needed for many
:years. While I would like to see regular mbufs given a callback
:mechanism, your present approach of using an mbuf cluster
:solves 90% of the problem.
:
:   Kirk  McKusick

... Alfred, be careful that you don't break things we only just fixed
last year.  The descriptor passing code has been broken for many years.

I think the reason we have to scan the descriptor list is related to
locating isolated self-referential 'loops' with descriptor passing and
unix domain sockets and closing them.  e.g. when you pass a descriptor
for a unix-domain socket through a unix-domain socket, it is possible
for the socket descriptors to reference each other and thus never have
their ref count drop to 0 even when all associated processes have
close()'d.  This happens all the time.  Be sure you don't break the
fix that solves that particular problem.
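
A userland sketch of how such a loop can be set up is below. It is an illustration only, assuming a connected socketpair(); send_fd() and make_fd_cycle() are made-up helper names. Each end sends its own descriptor, so each socket's receive queue ends up holding a reference to the other socket. What the kernel does with the pair after both close() calls cannot be observed from this program.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor plus one data byte. */
static int
send_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/* Returns 0 once both in-flight references have been created. */
int
make_fd_cycle(void)
{
    int sv[2], error = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return (-1);

    /* Data sent on sv[0] is queued on sv[1]: sv[1] now references sv[0]. */
    if (send_fd(sv[0], sv[0]) < 0)
        error = -1;
    /* And the other way round: sv[0]'s queue now references sv[1]. */
    if (error == 0 && send_fd(sv[1], sv[1]) < 0)
        error = -1;

    /*
     * Neither message is ever received.  After these close() calls no
     * process holds the sockets, but each is still referenced from the
     * other's receive queue.
     */
    close(sv[0]);
    close(sv[1]);
    return (error);
}
```

With plain reference counting neither count reaches zero on its own here, which is the case the descriptor scan exists to catch.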

-Matt






Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Brian Somers

> Not a lot of people are familiar with fd passing so I'll give
> a short description:
> 
>   By using AF_UNIX sockets between processes, a process can use
>   sendmsg() to send a filedescriptor through the socket where the
>   other process will do a recvmsg() to pickup the descriptor.
> 
> The "problem" is that if a descriptor is in transit/inflight
> and the sending process closes the file, it still needs to
> remain open for the recipient.
> 
> What can happen is:
> 
> process A: sendmsg(descriptor)
> process A: exit
> process B: exit
> 
> without the garbage collection we'd have leaked a file descriptor
> inside the kernel.
[.]

Hmm, the last time I looked at this, I believe the whole thing was 
dealt with by not increasing the file descriptor reference count 
when it was put in the message header.  If process A closed the 
descriptor before process B actually recvmsg()d it, it would be 
EBADF.  The recvmsg() actually incremented the reference count.

I always assumed that this was a result of not wanting to have any 
funny garbage collecting code ?

Of course looking at the code now, it increases fp->f_count in 
unp_internalize(), so maybe I was on drugs then

Assuming I wasn't on drugs (maybe the behaviour was changed - cvs 
annotate suggests some activity around March 9, but that was the 
file*/int alignment stuff), I think it would be valid to go back 
to the old behaviour (not increasing f_count and letting the original 
owner actually close the descriptor while f_msgcount != 0).

Does any of this make sense ?  Or am I just describing a different 
case (where process B doesn't exit) ?

> -- 
> -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
> "I have the heart of a child; I keep it in a jar on my desk."

-- 
Brian <[EMAIL PROTECTED]>
     
Don't _EVER_ lose your sense of humour !







Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Alfred Perlstein

* Matt Dillon <[EMAIL PROTECTED]> [001213 13:07] wrote:
> :I believe that your changes have been sorely needed for many
> :years. While I would like to see regular mbufs given a callback
> :mechanism, your present approach of using an mbuf cluster
> :solves 90% of the problem.
> :
> : Kirk  McKusick
> 
> ... Aflred, be careful that you don't break things we only just fixed
> last year.  The descriptor passing code has been broken for many years.
> 
> I think the reason we have to scan the descriptor list is related to
> locating isolated self-referential 'loops' with descriptor passing and
> unix domain sockets and closing them.  e.g. when you pass a descriptor
> for a unix-domain socket through a unix-domain socket, it is possible
> for the socket descriptors to reference each other and thus never have
> their ref count drop to 0 even when all associated processes have
> close()'d.  This happens all the time.  Be sure you don't break the
> fix that solves that particular problem.

Ok, I'll see if that can happen.  Basically, since the reference
count never goes to zero on the socket, the buffers are never forced to
be flushed/cleared and the mbuf will then never be free'd, resulting
in it leaking itself.  Basically a socket hanging there with an
mbuf referencing itself.

I wonder if Linux fixed/has this problem.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."





Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Alfred Perlstein

* Alfred Perlstein <[EMAIL PROTECTED]> [001213 14:20] wrote:
> * Matt Dillon <[EMAIL PROTECTED]> [001213 13:07] wrote:
> > :I believe that your changes have been sorely needed for many
> > :years. While I would like to see regular mbufs given a callback
> > :mechanism, your present approach of using an mbuf cluster
> > :solves 90% of the problem.
> > :
> > :   Kirk  McKusick
> > 
> > ... Aflred, be careful that you don't break things we only just fixed
> > last year.  The descriptor passing code has been broken for many years.
> > 
> > I think the reason we have to scan the descriptor list is related to
> > locating isolated self-referential 'loops' with descriptor passing and
> > unix domain sockets and closing them.  e.g. when you pass a descriptor
> > for a unix-domain socket through a unix-domain socket, it is possible
> > for the socket descriptors to reference each other and thus never have
> > their ref count drop to 0 even when all associated processes have
> > close()'d.  This happens all the time.  Be sure you don't break the
> > fix that solves that particular problem.
> 
> Ok, I'll see if that can happen.  Basically since the reference
> never goes to zero on the socket, the buffers are never forced to
> be flushed/cleared and the mbuf will then never be free'd resulting
> it it leaking itself.  Basically a socket hanging there with an
> mbuf referencing itself.
> 
> I wonder if Linux fixed/has this problem.

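Ok, my patch has this problem:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* send_fd_withdata() and get_fd_withdata() are helpers not shown here. */

void
parent(int con)
{
    int fd;

    fd = open("/tmp/wank", O_RDONLY);   /* opened but never passed */
    /* send the socket's own descriptor over the socket */
    send_fd_withdata(con, con, "wank", 4);
    sleep(5);
    exit(1);
}

void
child(int con)
{
    int fd, error;
    char buf[100];

    sleep(5);
    get_fd_withdata(con, &fd, buf, sizeof(buf));
    /* bounce the descriptor back, then exit without reading it again */
    send_fd_withdata(con, fd, "foo", 3);
    exit(1);
    /* NOTREACHED: left over from an earlier version of the test */
    buf[4] = '\0';
    printf("%s\n", buf);
    if ((error = read(fd, buf, sizeof(buf))) < 0)
        perror("read");
    buf[sizeof(buf)-1] = '\0';
    printf("%s\n", buf);
}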

This causes a leak. I think the trick is to just always call sorflush()
when the pcb is free'd.

Looking at Linux, they are still using gc.  I'll give this a lot
more thought before resubmitting this idea.

sorry,
-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."





Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Bill Fumerola

On Wed, Dec 13, 2000 at 02:42:53PM -0500, Richard A. Steenbergen wrote:

> It could just as easily be a SYN flood against a single port... or a large
> number of clients trying to connect to your crashed web server... :P Or
> it could just as easily be an ack flood against a port without a listener
> and be showing up in the "not the ack flood" counter.

Exactly. Bikeshedding the millions of possible reasons the queue/ratelimit
was triggered is silly.

Bosko, please change the descriptions to something very generic before
committing them ("ratelimiting TCP RST packets: x/y pps" or something)

-- 
Bill Fumerola - security yahoo / Yahoo! inc.
  - [EMAIL PROTECTED] / [EMAIL PROTECTED]








Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Tony Finch

Brian Somers <[EMAIL PROTECTED]> wrote:
>
>Hmm, the last time i looked at this, I believe the whole thing was 
>dealt with by not increasing the file descriptor reference count 
>when it was put in the message header.  If process A closed the 
>descriptor before process B actually recvmsg()d it, it would be 
>EBADF.  The recvmsg() actually incremented the reference count.

But it has always been documented behaviour that the receiving process
gets a valid descriptor even if the sender closes it directly after
sendmsging it. If this was not the case then descriptor handoff would
require an "ok" reply from the receiving process before the sender
could close it, which is a pain.

Hmm, the only references for this I can think of are Stevens and the
red & black daemon books, but I'm sure I've read a good discussion of
it somewhere else.
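
The behaviour described above (the receiver still gets a valid descriptor even though the sender closed its copy right after sendmsg()) can be checked with a small single-process sketch. It uses only the standard SCM_RIGHTS control message; the helper names send_fd(), recv_fd() and close_after_send_demo() are made up for illustration, and a pipe stands in for the passed file.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned and sized for one descriptor. */
union fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Queue descriptor fd on sock, along with one byte of ordinary data. */
static int
send_fd(int sock, int fd)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	cm = CMSG_FIRSTHDR(&msg);
	assert(cm != NULL);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));
	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/* Pick up one descriptor from sock; returns the new fd or -1. */
static int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	if (recvmsg(sock, &msg, 0) != 1)
		return (-1);
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS)
		return (-1);
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return (fd);
}

/* Returns 0 if the descriptor survived the sender's close(). */
int
close_after_send_demo(void)
{
	int sv[2], p[2], fd;
	char buf[8];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
		return (-1);
	if (write(p[1], "hello", 5) != 5 || send_fd(sv[0], p[0]) < 0)
		return (-1);
	close(p[0]);		/* sender's copy is gone; the queued one is not */
	if ((fd = recv_fd(sv[1])) < 0)
		return (-1);
	n = read(fd, buf, sizeof(buf));
	close(fd); close(p[1]); close(sv[0]); close(sv[1]);
	return (n == 5 && memcmp(buf, "hello", 5) == 0 ? 0 : -1);
}
```

Calling close_after_send_demo() from a main() should return 0 on a system with working descriptor passing: the read succeeds on the received descriptor even though the sending side closed its own copy before recvmsg() ran.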

Tony.
-- 
f.a.n.finch[EMAIL PROTECTED][EMAIL PROTECTED]
"And remember my friend, future events such
as these will affect you in the future."





Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Alfred Perlstein

> This causes a leak, I think the trick is to just always call sorflush()
> when the pcb is free'd.

Even this doesn't work.

> 
> Looking at linux they still are using gc.  I'll give this a lot
> more thought before resubmitting this idea.

Ok, the problem is you have 3 af_unix pairs all open between 2
processes

process B sends 3 over 2 to A
process B sends 2 over 3 to A
process B sends 2 and 3 over 1 to A
process B closes 1 2 and 3
A then closes 3 2 and then 1

closing 3 and 2 doesn't cause the socketbuffer to be flushed because
they are still self referencing.

closing 1 causes the socketbuffer to be flushed,

on flushing it comes across 2 and drops a reference but doesn't flush,
it then hits 3 and drops a reference but doesn't flush.

since 3 references 2 and 2 references 3 and nothing else references
2 or 3, we just leaked 2 and 3.
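
The sequence above can be replayed in a toy user-space model of pure reference counting. This is only a sketch under simplifying assumptions, not the kernel code: each numbered socket pair is treated as one object with one reference per process, struct sock, sock_send(), sock_drop() and leaked_after_scenario() are hypothetical names, and "flush" just means dropping the references held in the receive buffer once the count reaches zero.

```c
#include <assert.h>
#include <stddef.h>

#define MAXQ 4

struct sock {
	int refs;		/* descriptor references plus in-flight references */
	int flushed;		/* set once the receive buffer has been flushed */
	struct sock *q[MAXQ];	/* sockets queued in our receive buffer */
	int nq;
};

/* Queue `passed` on `over`; the in-flight copy holds a reference. */
static void
sock_send(struct sock *over, struct sock *passed)
{
	assert(over->nq < MAXQ);
	passed->refs++;
	over->q[over->nq++] = passed;
}

/* Drop one reference; flush the buffer only when the count hits zero. */
static void
sock_drop(struct sock *s)
{
	int i;

	if (--s->refs > 0)
		return;
	s->flushed = 1;
	for (i = 0; i < s->nq; i++)
		sock_drop(s->q[i]);
	s->nq = 0;
}

/* Replay the scenario; returns how many sockets were never flushed. */
int
leaked_after_scenario(void)
{
	struct sock s[4];	/* s[1]..s[3] are used, s[0] is ignored */
	int i, leaked = 0;

	for (i = 1; i <= 3; i++) {
		s[i].refs = 2;	/* one reference each for A and B */
		s[i].flushed = 0;
		s[i].nq = 0;
	}
	sock_send(&s[2], &s[3]);	/* B sends 3 over 2 */
	sock_send(&s[3], &s[2]);	/* B sends 2 over 3 */
	sock_send(&s[1], &s[2]);	/* B sends 2 and 3 over 1 */
	sock_send(&s[1], &s[3]);
	for (i = 1; i <= 3; i++)	/* B closes 1, 2 and 3 */
		sock_drop(&s[i]);
	sock_drop(&s[3]);		/* A closes 3, 2 and then 1 */
	sock_drop(&s[2]);
	sock_drop(&s[1]);
	for (i = 1; i <= 3; i++)
		if (!s[i].flushed)
			leaked++;
	return (leaked);
}
```

In this model only socket 1 ever reaches a count of zero. Sockets 2 and 3 end with one reference each, held by the other's buffer, so leaked_after_scenario() reports two unflushed sockets.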

I guess the gc has to stay.

dammit. :)

My apologies for wasting everyone's time here.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."





Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Bosko Milekic


On Wed, 13 Dec 2000, Bill Fumerola wrote:

> On Wed, Dec 13, 2000 at 02:42:53PM -0500, Richard A. Steenbergen wrote:
> 
> > It could just as easily be a SYN flood against a single port... or a large
> > number of clients trying to connect to your crashed web server... :P Or
> > it could just as easily be an ack flood against a port without a listener
> > and be showing up in the "not the ack flood" counter.
> 
> Exactly. Bikeshedding the millions of possible reasons the queue/ratelimit
> was triggered is silly.
> 
> Bosko, please change the descriptions to something very generic before
> committing them ("ratelimiting TCP RST packets: x/y pps" or something)

Mike said he would do it and re-post the diff.

> -- 
> Bill Fumerola - security yahoo / Yahoo! inc.
>   - [EMAIL PROTECTED] / [EMAIL PROTECTED]

  Later,
  Bosko Milekic
  [EMAIL PROTECTED]







Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Tony Finch

Alfred Perlstein <[EMAIL PROTECTED]> wrote:
>
>I guess the gc has to stay.
>
>dammit. :)
>
>My apologies for wasting everyone's time here.

``One day a student came to Moon and said: "I understand how to make a
better garbage collector. We must keep a reference count of the
pointers to each cons."

``Moon patiently told the student the following story:

``"One day a student came to Moon and said: `I understand how to make
a better garbage collector...'"''

:-)

Tony.
-- 
f.a.n.finch[EMAIL PROTECTED][EMAIL PROTECTED]
"Perhaps on your way home you will pass someone in the dark,
and you will never know it, for they will be from outer space."





Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Brian Somers

> Brian Somers <[EMAIL PROTECTED]> wrote:
> >
> >Hmm, the last time i looked at this, I believe the whole thing was 
> >dealt with by not increasing the file descriptor reference count 
> >when it was put in the message header.  If process A closed the 
> >descriptor before process B actually recvmsg()d it, it would be 
> >EBADF.  The recvmsg() actually incremented the reference count.
> 
> But it has always been documented behaviour that the receiving process
> gets a valid descriptor even if the sender closes it directly after
> sendmsging it. If this was not the case then descriptor handoff would
> require an "ok" reply from the receiving process before the sender
> could close it, which is a pain.
> 
> Hmm, the only references for this I can think of are Stevens and the
> red & black daemon books, but I'm sure I've read a good discussion of
> it somewhere else.

I've just looked back through my archives... the problem I'm thinking 
of was a different problem - where the descriptor passed was the only 
descriptor open for a tty whose pgrp was that of process A.  A passed 
the descriptor to B and then exited at which point the tty 
(correctly) revoked all its remaining descriptors (the one en-route 
or in process B).

There's no way to avoid this - except by having A fork(), the child 
close the descriptor and continue where it left off and the parent 
pause() waiting for a signal from B to tell it that it's finished 
with that tty.

This is why I implemented ``enable keep-session'' :-)

> Tony.
> -- 
> f.a.n.finch[EMAIL PROTECTED][EMAIL PROTECTED]
> "And remember my friend, future events such
> as these will affect you in the future."

Cheers.

-- 
Brian <[EMAIL PROTECTED]>
     
Don't _EVER_ lose your sense of humour !







Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Matt Dillon


:I guess the gc has to stay.
:
:dammit. :)
:
:My apologies for wasting everyone's time here.
:
:-- 
:-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

No waste at all, Alfred, the file descriptor passing code had been
broken for over 10 years precisely because of its complexity.  Rewriting
the GC to be more efficient essentially requires using deep graph theory
to locate isolated loops of arbitrary complexity.

p.s. many object oriented language garbage collectors have the same
problem.  create object A, create object B, A references B, B references A,
drop A, drop B.  A and B still have references and don't get cleaned up.
Fun.

-Matt






Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Alfred Perlstein

* Matt Dillon <[EMAIL PROTECTED]> [001213 17:25] wrote:
> 
> :I guess the gc has to stay.
> :
> :dammit. :)
> :
> :My apologies for wasting everyone's time here.
> :
> :-- 
> :-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
> 
> No waste at all, Alfred, the file descriptor passing code had been
> broken for over 10 years precisely because of its complexity.  Rewriting
> the GC to be more efficient essentially requires using deep graph theory
> to locate isolated loops of arbitrary complexity.
> 
> p.s. many object oriented language garbage collectors have the same
> problem.  create object A, create object B, A references B, B references A,
> drop A, drop B.  A and B still have references and don't get cleaned up.
> Fun.

Are you saying the code in place is broken?  If so I'll spend some
time looking at it and the Linux implementation to figure if at
least one of us gets it right and try to find some sort of solution.

Obviously the easiest way would be to disallow passing of any
descriptors that have descriptors in their socketbuffers.

Since almost no one uses this code, and I hardly see a reason for
allowing that type of operation (passing af_unix fds with fds in
flight) it might be a good idea to just disallow that sort of
operation.

It would definitely simplify and probably speed up the code.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."





Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Matt Dillon


:> No waste at all, Alfred, the file descriptor passing code had been
:
:Are you saying the code in place is broken?  If so I'll spend some
:time looking at it and the Linux implementation to figure if at
:least one of us gets it right and try to find some sort of solution.

No, *had*, not *has*.  It isn't broken, just inefficient in certain
cases due to the brute-force GC.

:Obviously the easiest way would be to disallow passing of any
:descriptors that have descriptors in their socketbuffers.
:
:Since almost no one uses this code, and I hardly see a reason for
:allowing that type of operation (passing af_unix fds with fds in
:flight) it might be a good idea to just disallow that sort of
:operation.
:
:It would definitely simplify and probably speed up the code.

There's no reason to disallow that.  Besides, any socket
can be listen()'d after having been queued, so you aren't really
preventing anything.

-Matt






Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Matt Dillon


:Hmm, the last time i looked at this, I believe the whole thing was 
:dealt with by not increasing the file descriptor reference count 
:when it was put in the message header.  If process A closed the 
:descriptor before process B actually recvmsg()d it, it would be 
:EBADF.  The recvmsg() actually incremented the reference count.
:
:I always assumed that this was a result of not wanting to have any 
:funny garbage collecting code ?
:
:Of course looking at the code now, it increases fp->f_count in 
:unp_internalize(), so maybe I was on drugs then

 Yes, you were on drugs :-)  The moment the descriptor is 
 queued to the socket, the ref count is bumped.  It's really
 a pointer to the underlying file that is queued (and whose
 ref count is bumped), not the descriptor number.

:Assuming I wasn't on drugs (maybe the behaviour was changed - cvs 
:annotate suggests some activity around March 9, but that was the 
:file*/int alignment stuff), I think it would be valid to go back 
:to the old behaviour (not increasing f_count and letting the original 
:owner actually close the descriptor while f_msgcount != 0).
:
:Does any of this make sense ?  Or am I just describing a different 
:case (where process B doesn't exit) ?

Well, it sort of makes sense, but only in a very twisted fashion :-).
sendmsg semantics are that once you queue the descriptor, you can
indeed close() it without destroying the queued entity.  Think about
it...  if this were not the case you would be forced to synchronize with
the receiving process prior to closing the descriptor on your end to
guarantee its validity on the receiving end, which would be a
non-trivial piece of userland programming.

If we did have those semantics then you could in fact throw the 
in-flight message away when the sending process went away, but you
would have to implement a secondary ref count to prevent the system
from throwing away the underlying file until you could clear the
message(s).  

We have *NEVER* had those semantics nor would we ever want those 
semantics.  Even if it were legal (which it isn't), it makes the
user-level programming too complex and too fragile.

-Matt






Re: patch to cleanup inflight desciptor handling.

2000-12-13 Thread Tony Finch

Matt Dillon <[EMAIL PROTECTED]> wrote:
>
>No waste at all, Alfred, the file descriptor passing code had been
>broken for over 10 years precisely because of its complexity.  Rewriting
>the GC to be more efficient essentially requires using deep graph theory
>to locate isolated loops of arbitrary complexity.

Most efficient GCers don't involve much graph theory (the notable
exception is concurrent collectors); instead they rely on various
strategies to drastically reduce the proportion of the arena that they
need to examine in most GC runs. In principle mark-sweep collectors
are as simple as they get, but unp_gc suffers from the interaction
with refcounting.

You can use the idea of scanning less of the arena to improve unp_gc
as follows. I suggest that you keep two additional lists: one of open
unix domain sockets, and one of in-flight sockets. Instead of the
existing breadth-first search of the whole file table at the start of
unp_gc, it should first clear the mark on each descriptor on the
in-flight list, then do a depth-first search of all the descriptors
reachable from the unix domain sockets list, marking each one. The
loop after the big comment in unp_gc should then scan the in-flight
list looking for unmarked descriptors instead of the whole file table.
The descriptor freeing loops stay as they are now.

I think this should solve the problem at hand, i.e. a lock being held
on an important resource while something complicated is being done;
instead you would hold locks on two much less important lists (the
unix domain list and the in-flight list).
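
As a rough illustration of the two-list idea, here is a user-space sketch of the mark and scan steps only. It is not unp_gc: struct usock, queue_on(), mark_from(), count_unreachable() and gc_demo() are hypothetical names, the lists are plain arrays, and the root test (more total references than in-flight references) is an assumption about how "still held by some process" would be decided. The sketch also clears marks on both lists so a stale mark on a root cannot cut the search short, and it uses recursion, which kernel code would avoid.

```c
#include <assert.h>
#include <stddef.h>

#define MAXQ 4

struct usock {
	int refs;		/* all references: descriptors plus in-flight */
	int msgcount;		/* how many of those references are in-flight */
	int mark;
	struct usock *q[MAXQ];	/* sockets queued in our receive buffer */
	int nq;
};

/* Put `passed` in the receive buffer of `over`. */
static void
queue_on(struct usock *over, struct usock *passed)
{
	assert(over->nq < MAXQ);
	passed->refs++;
	passed->msgcount++;
	over->q[over->nq++] = passed;
}

/* Depth-first mark of everything reachable through receive buffers. */
static void
mark_from(struct usock *s)
{
	int i;

	if (s->mark)
		return;
	s->mark = 1;
	for (i = 0; i < s->nq; i++)
		mark_from(s->q[i]);
}

/* Scan only the two lists; returns the number of dead in-flight sockets. */
int
count_unreachable(struct usock **socks, int nsocks,
    struct usock **inflight, int ninflight)
{
	int i, dead = 0;

	for (i = 0; i < nsocks; i++)
		socks[i]->mark = 0;
	for (i = 0; i < ninflight; i++)
		inflight[i]->mark = 0;
	for (i = 0; i < nsocks; i++)
		if (socks[i]->refs > socks[i]->msgcount)
			mark_from(socks[i]);	/* some process still holds it */
	for (i = 0; i < ninflight; i++)
		if (!inflight[i]->mark)
			dead++;
	return (dead);
}

/* One live socket holding a descriptor, plus an isolated 2 <-> 3 loop. */
int
gc_demo(void)
{
	struct usock live = {0}, held = {0}, s2 = {0}, s3 = {0};
	struct usock *socks[] = { &live, &held, &s2, &s3 };
	struct usock *inflight[] = { &held, &s2, &s3 };

	live.refs = 1;			/* open in some process */
	queue_on(&live, &held);		/* reachable through live */
	queue_on(&s2, &s3);		/* 3 sits in 2's buffer */
	queue_on(&s3, &s2);		/* 2 sits in 3's buffer */
	return (count_unreachable(socks, 4, inflight, 3));
}
```

In gc_demo() the socket queued on a live socket is marked and kept, while the two sockets that only reference each other stay unmarked and are reported as garbage, so the function returns 2. The scan cost depends on the number of unix domain and in-flight sockets rather than on the size of the whole file table.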

>p.s. many object oriented language garbage collectors have the same
>problem.  create object A, create object B, A references B, B references A,
>drop A, drop B.  A and B still have references and don't get cleaned up.
>Fun.

Most modern GCers don't use reference counting partly for that reason,
and partly because the overhead of maintaining reference counts is too
great.

Tony.
-- 
f.a.n.finch[EMAIL PROTECTED][EMAIL PROTECTED]
"Well, as long as they can think we'll have our problems.
But those whom we're using cannot think."





Re: Ratelimint Enhancement patch (Please Review One Last Time!)

2000-12-13 Thread Bill Fumerola

On Wed, Dec 13, 2000 at 09:35:40PM -0500, Bosko Milekic wrote:

> > Bosko, please change the descriptions to something very generic before
> > committing them ("ratelimiting TCP RST packets: x/y pps" or something)
> 
>   Mike said he would do it and re-post the diff.

Excellent.

-- 
Bill Fumerola - security yahoo / Yahoo! inc.
  - [EMAIL PROTECTED] / [EMAIL PROTECTED]




