Re: reading routing table

2008-09-02 Thread Julian Elischer

Bruce M. Simpson wrote:
> Debarshi Ray wrote:
>> ...
>> I was going through the FreeBSD and NetBSD documentation and the
>> FreeBSD sources of netstat and route. I was surprised to see that while
>> NetBSD's route implementation has a 'show' command, FreeBSD does not
>> offer any such thing. Moreover it seems that one cannot read the
>> entire routing table using the PF_ROUTE sockets and RTM_GET returns
>> information pertaining to only one destination. This surprised me
>> because one can do such a thing with the Linux kernel's RTNETLINK.
>>
>> Is there a reason why this is so? Or is reading from /dev/kmem the
>> only way to get a dump of the routing tables?
>
> You want 'netstat -rn' to dump them. This is a very common command that
> is covered in many online resources on using and administering FreeBSD,
> so I am somewhat surprised that you didn't find it.
>
> P.S. Look in the sysctl tree if you need to snapshot the kernel IP
> forwarding tables. You can use kmem, but it is generally frowned upon
> unless you're working from core dumps -- kernels can be built without
> kmem support, or kmem locked down, etc.

unfortunately, netstat -rn uses /dev/kmem

we've just never got around to implementing a better interface..

> cheers
> BMS
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"




Re: reading routing table

2008-09-02 Thread Debarshi Ray
> unfortunately, netstat -rn uses /dev/kmem

Yes. I also found that FreeBSD's route(8) implementation does not have
an equivalent of 'netstat -r'. NetBSD and GNU/Linux implementations
have such an option. Any reason for this? Is it because you did not
want to muck with /dev/kmem in route(8) and wanted it to work with
PF_ROUTE only? I have not yet gone through NetBSD's route(8) code
though.

Happy hacking,
Debarshi


Re: reading routing table

2008-09-02 Thread Robert Watson

On Tue, 2 Sep 2008, Debarshi Ray wrote:

>> unfortunately, netstat -rn uses /dev/kmem
>
> Yes. I also found that FreeBSD's route(8) implementation does not have an
> equivalent of 'netstat -r'. NetBSD and GNU/Linux implementations have such
> an option. Any reason for this? Is it because you did not want to muck with
> /dev/kmem in route(8) and wanted it to work with PF_ROUTE only? I have not
> yet gone through NetBSD's route(8) code though.

Usually the "reason" for things like this is that no one has written the code
to do otherwise :-).  PF_ROUTE is probably not a good mechanism for any bulk
data transfer due to the constraints of being a datagram socket, although
doing it via an iterated dump rather than a simple dump operation would
probably work.  Sysctl is generally a better interface for monitoring for
various reasons, although it also has limitations.  Maintaining historic kmem
support is important, since it is also the code used for interpreting core
dumps, and we don't want to lose support for that.
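
A minimal sketch of what a sysctl-based snapshot could look like, assuming the kernel exposes the NET_RT_DUMP node under CTL_NET/PF_ROUTE. The helper names count_route_messages() and fetch_route_dump() are made up for illustration and are not taken from netstat or route(8). The fetch is only compiled on FreeBSD; the walker is portable because it relies on nothing but the 16-bit rtm_msglen field that starts each routing message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#endif

/*
 * Count the records in a buffer of back-to-back routing messages.
 * Each message begins with its own total length (rtm_msglen), so
 * the walk only has to read that first field of every record.
 */
size_t
count_route_messages(const unsigned char *buf, size_t len)
{
	size_t off = 0, n = 0;

	assert(buf != NULL || len == 0);
	while (len - off >= sizeof(uint16_t)) {
		uint16_t msglen;

		memcpy(&msglen, buf + off, sizeof(msglen));
		if (msglen < sizeof(msglen) || msglen > len - off)
			break;		/* truncated or corrupt record */
		off += msglen;
		n++;
	}
	return (n);
}

#if defined(__FreeBSD__)
/*
 * Hypothetical helper: fetch a snapshot of the IPv4 routing table.
 * The first sysctl() call asks for the size, the second copies the
 * data.  The table may grow between the two calls, in which case the
 * second call fails and the caller should simply retry.
 * Returns a malloc()ed buffer (caller frees) or NULL on error.
 */
unsigned char *
fetch_route_dump(size_t *lenp)
{
	int mib[6] = { CTL_NET, PF_ROUTE, 0, AF_INET, NET_RT_DUMP, 0 };
	size_t needed = 0;
	unsigned char *buf;

	if (sysctl(mib, 6, NULL, &needed, NULL, 0) < 0 || needed == 0)
		return (NULL);
	if ((buf = malloc(needed)) == NULL)
		return (NULL);
	if (sysctl(mib, 6, buf, &needed, NULL, 0) < 0) {
		free(buf);
		return (NULL);
	}
	*lenp = needed;
	return (buf);
}
#endif
```

On FreeBSD one would call fetch_route_dump() and hand the result to count_route_messages(), or to a walker that casts each record to struct rt_msghdr. Note that this is still the "copy everything in one call" model, so it inherits the cost discussed later in the thread.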


Robert N M Watson
Computer Laboratory
University of Cambridge


how to read dynamic data structures from the kernel (was Re: reading routing table)

2008-09-02 Thread Luigi Rizzo
in the (short so far) thread which i am hijacking, the question came
up of what is a good mechanism for reading the route table from
the kernel, since FreeBSD currently uses /dev/kmem and this is not
always available or easy to use with dynamically changing data structures.

The routing table is only one instance of potentially many similar
data structures that we might want to fetch - others are the various
firewall tables (the output of 'ipfw show'), possibly bridging
tables, socket lists and so on.

The issue is actually twofold.

The interface problem, or how to pull bits from the kernel, is so
easy as to be almost irrelevant -- getsockopt, sysctl, kmem, or some
special file descriptor does the job as long as the underlying chunk
of data does not change (or can be locked) during the syscall.

The real problem is that these data structures are dynamic and
potentially large, so the following approach (used e.g. in ipfw)

enter kernel;
get shared lock on the structure;
navigate through the structure and make a linearized copy;
unlock;
copyout the linearized copy;

is extremely expensive and has the potential to block other activities
for a long time.

Accessing /dev/kmem and following pointers there carries the risk
that you cannot lock the kernel data structure while you navigate
it, so you are likely to follow stale pointers.

What we'd need is some internal representation of the data structure
that could give us individual entries of the data structure on each
call, together with extra info (a pointer if we can guarantee that
it doesn't get stale, something more if we cannot make the guarantee)
to allow the navigation to occur.

I believe this is a very old and common problem, so my question is:

do you know if any of the *BSD kernels implements some good mechanism
to access a dynamic kernel data structure (e.g. the routing tree/trie,
or even a list or hash table) without the flaws of the two approaches
i indicate above ?

cheers
luigi



Re: how to read dynamic data structures from the kernel (was Re: reading routing table)

2008-09-02 Thread Bruce M. Simpson

Luigi Rizzo wrote:
> do you know if any of the *BSD kernels implements some good mechanism
> to access a dynamic kernel data structure (e.g. the routing tree/trie,
> or even a list or hash table) without the flaws of the two approaches
> i indicate above ?

Hahaha. I ran into an isomorphic problem with Net-SNMP at work last week.

   There's a need to export the BGP routing table via SNMP. Of course 
doing this in our framework at work requires some IPC calls which always 
require a select() (or WaitForMultipleObjects()) based continuation.
   Net-SNMP doesn't support continuations at the table iterator level, 
so somehow, we need to implement an iterator which can accommodate our 
blocking IPC mechanism.


  [No, we don't use threads, and that would actually create more 
problems than it solves -- running single-threaded with continuations 
lets us run lock free, and we rely on the OS's IPC primitives to 
serialize our code. Works just fine for us so far...]


   So we would end up caching the whole primary key range in the SNMP 
sub-agent on a table OID access, a technique which would allow us to 
defer the IPC calls providing we walk the entire range of the iterator 
and cache the keys -- but even THAT is far too much data for the BGP 
table, which is a trie with ~250,000 entries. I hate SNMP GETNEXT.


   Back to the FreeBSD kernel, though.

   If you look at in_mcast.c, particularly in p4 bms_netdev, this is 
what happens for the per-socket multicast source filters -- there is the 
linearization of an RB-tree for setsourcefilter().
   This is fine for something with a limit of ~256 entries per socket 
(why RB for something so small? this is for space vs time -- and also it 
has to merge into a larger filter list in the IGMPv3 paths.)
   And the lock granularity is per-socket. However it doesn't do for 
something as big as a BGP routing table.


   C++ lends itself well to expressing these kinds of smart-pointer 
idioms, though.
   I'm thinking perhaps we need the notion of a sysctl iterator, which 
allocates a token for walking a shared data structure, and is able to 
guarantee that the token maps to a valid pointer for the same entry, 
until its 'advance pointer' operation is called.


Question is, who's going to pull the trigger?

cheers
BMS

P.S. I'm REALLY getting fed up with the lack of openness and 
transparency largely incumbent in doing work in p4.


Come one come all -- we shouldn't need accounts for folk to see and 
contribute what's going on, and the stagnation is getting silly. FreeBSD 
development should not be a committer or chum-of-committer in-crowd.



Re: Quagga OSPF binds to wrong interface on FreeBSD 7

2008-09-02 Thread Joe Greco
Joining this conversation as someone who's been wrestling with this issue
for some months:

> > This bug was reported around the release of FreeBSD 7, but does not seem
> > to have made any progress. 
> > 
> > http://bugzilla.quagga.net/show_bug.cgi?id=420
> > 
> > Is this because the sockopt.c.diff patch is correct, which isn't entirely
> > clear from the following comments, or is there some other solution to this
> > problem?  Thanks!
> 
> You should contact with ports/net/quagga maintainer to push
> temporary patch into Ports Tree until quagga developers settle with
> something working. This always was most productive way for us.

I've been doing extremely limited testing on the sockopt.c patch, on a 
7.0R box that used to have problems, and it "seems to" work.  However,
the failures we were noticing seemed most frequent and catastrophic 
when using a 7.0R box as a router with about a dozen interfaces active
(we got instant failures, in many/most/all?? cases).  I don't have a
lab setup capable of reproducing this at the moment, and am not willing
to sacrifice production networks to the "well just try it and see" 
patch testing god.

I believe the question that was asked is not the question you answered.  
I, too, would like someone who can offer a knowledgeable opinion as to
the correctness of the patch.  Were someone who has worked on the code,
such as Bruce, to tell me that it appeared to be the right solution, I
would be willing to risk a test on a 7.0R box known to fall over rapidly
with the multicast issue.  I am certainly interested in seeing this
fixed.

Until someone can either test the heck out of this, or offer a definitive
opinion of the correctness based on experience with this subsystem, it
would seem premature to ask the port maintainer to include a patch of
dubious correctness.

I have cc:'d bms@ in the hopes of bringing in such an opinion.  I am not
sure who else is working on the multicast subsystem at this time, but
hopefully someone else can input if appropriate.

Knowing that the patch was correct would also provide some leverage for
those of us with interest in this code to persuade the Quagga developers
to do something about this.  As it is, we're left here holding a bag of
"this patch is supposed to work but we don't really know it is correct."
So it would be really useful to have such an opinion.

Thanks,

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


Re: how to read dynamic data structures from the kernel (was Re: reading routing table)

2008-09-02 Thread Robert Watson


On Tue, 2 Sep 2008, Luigi Rizzo wrote:

> The real problem is that these data structures are dynamic and potentially
> large, so the following approach (used e.g. in ipfw)
>
>       enter kernel;
>       get shared lock on the structure;
>       navigate through the structure and make a linearized copy;
>       unlock;
>       copyout the linearized copy;
>
> is extremely expensive and has the potential to block other activities for a
> long time.


Sockets, sysctl, kmem, etc, are all really just I/O mechanisms, with varying 
levels of abstraction, for pushing data, and all fundamentally suffer from the 
problem of a lack of general export abstraction.


> What we'd need is some internal representation of the data structure that
> could give us individual entries of the data structure on each call,
> together with extra info (a pointer if we can guarantee that it doesn't get
> stale, something more if we cannot make the guarantee) to allow the
> navigation to occur.


I think there are necessarily implementation-specific details to all of these 
steps for any given kernel subsystem -- we have data structures, 
synchronization models, etc, that are all tuned to their common use 
requirements, and monitoring is very much an edge case.  I don't think this is 
bad: this is an OS kernel, after all, but it does make things a bit more 
tricky.  Even if we can't share code, sharing approaches across subsystems is 
a good idea.


For an example of what you have in mind, take a look at the sysctl monitoring 
for UNIX domain sockets.  First, we allocate an array of pointers sized to the 
number of unpcb's we have.  Then we walk the list, bumping the references and 
adding pointers to the array.  Then we release the global locks, and proceed 
to lock, externalize, unlock, and copyout each individual entry, using a 
generation number to manage staleness.  Finally, we walk the list dropping the 
refcounts and free the array.  This avoids holding global locks for a long 
time, as well as the stale data issue.  It's not ideal in other ways -- long 
lists, reference count complexity, etc, but as I mentioned, it is very much an 
edge case, and much of that mechanism (especially refcounts) is something we 
need anyway for any moderately complex kernel data structure.
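
The steps above can be sketched in userland C. This is a single-threaded illustration under stated assumptions, not the unpcb code: struct entry, struct table, struct snapshot and all function names are hypothetical, and the places where the global and per-entry locks would be taken are only marked with comments. A "dead" flag stands in for whatever the real subsystem uses, next to the generation number, to recognise an entry that was unlinked while the snapshot was in progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {
	struct entry	*next;
	int		 refs;		/* includes the list's own reference */
	unsigned	 gencnt;	/* table generation when created */
	int		 dead;		/* set once unlinked from the list */
	int		 value;
};

struct table {
	struct entry	*head;
	size_t		 count;
	unsigned	 gencnt;	/* bumped on every insert and remove */
};

struct snapshot {
	struct entry	**vec;
	size_t		  n;
	unsigned	  gencnt;
};

static void
entry_rele(struct entry *e)
{
	assert(e->refs > 0);
	if (--e->refs == 0)
		free(e);
}

struct entry *
table_insert(struct table *t, int value)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return (NULL);
	e->value = value;
	e->refs = 1;
	e->gencnt = ++t->gencnt;
	e->next = t->head;
	t->head = e;
	t->count++;
	return (e);
}

void
table_remove(struct table *t, struct entry *victim)
{
	struct entry **pp;

	for (pp = &t->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp != victim)
			continue;
		*pp = victim->next;
		victim->dead = 1;
		t->count--;
		t->gencnt++;
		entry_rele(victim);	/* freed now unless a snapshot pins it */
		return;
	}
}

/* Step 1: under the global lock, pin every entry and remember it. */
int
snapshot_begin(struct table *t, struct snapshot *s)
{
	struct entry *e;

	/* global lock would be taken here */
	s->vec = calloc(t->count ? t->count : 1, sizeof(*s->vec));
	if (s->vec == NULL)
		return (-1);
	s->n = 0;
	s->gencnt = t->gencnt;
	for (e = t->head; e != NULL; e = e->next) {
		assert(s->n < t->count);
		e->refs++;
		s->vec[s->n++] = e;
	}
	/* global lock would be released here */
	return (0);
}

/* Step 2: without the global lock, export the entries one at a time. */
size_t
snapshot_copy(const struct snapshot *s, int *out, size_t max)
{
	size_t i, n = 0;

	for (i = 0; i < s->n && n < max; i++) {
		struct entry *e = s->vec[i];

		/* per-entry lock would be held around this check and copy */
		if (!e->dead && e->gencnt <= s->gencnt)
			out[n++] = e->value;
	}
	return (n);
}

/* Step 3: drop the pins and free the pointer array. */
void
snapshot_end(struct snapshot *s)
{
	size_t i;

	for (i = 0; i < s->n; i++)
		entry_rele(s->vec[i]);
	free(s->vec);
	s->vec = NULL;
	s->n = 0;
}
```

An entry removed between snapshot_begin() and snapshot_copy() stays allocated because of the extra reference, is skipped during the copy, and is freed by snapshot_end().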


Robert N M Watson
Computer Laboratory
University of Cambridge





Re: how to read dynamic data structures from the kernel (was Re: reading routing table)

2008-09-02 Thread Luigi Rizzo
On Tue, Sep 02, 2008 at 10:02:10PM +0100, Robert Watson wrote:
> 
> On Tue, 2 Sep 2008, Luigi Rizzo wrote:
> 
> >The real problem is that these data structures are dynamic and potentially 
> >large, so the following approach (used e.g. in ipfw)
> >
> > enter kernel;
> > get shared lock on the structure;
> > navigate through the structure and make a linearized copy;
> > unlock;
> > copyout the linearized copy;
> >
> >is extremely expensive and has the potential to block other activities for 
> >a long time.
> 
> Sockets, sysctl, kmem, etc, are all really just I/O mechanisms, with 
> varying levels of abstraction, for pushing data, and all fundamentally 
> suffer from the problem of a lack of general export abstraction.

yes, this is why i said we should not bother about which one is used.

> >What we'd need is some internal representation of the data structure that 
> >could give us individual entries of the data structure on each call, 
> >together with extra info (a pointer if we can guarantee that it doesn't 
> >get stale, something more if we cannot make the guarantee) to allow the 
> >navigation to occur.
> 
> I think there's necessarily implementation-specific details to all of these 
> steps for any given kernel subsystem -- we have data structures, 
> synchronization models, etc, that are all tuned to their common use 
> requirements, and monitoring is very much an edge case.  I don't think this 
> is bad: this is an OS kernel, after all, but it does make things a bit more 
> tricky.  Even if we can't share code, sharing approaches across subsystems 
> is a good idea.
> 
> For an example of what you have in mind, take a look at the sysctl 
> monitoring for UNIX domain sockets.  First, we allocate an array of 
> pointers sized to the number of unpcb's we have.  Then we walk the list, 
> bumping the references and adding pointers to the array.  Then we release 
> the global locks, and proceed to lock, externalize, unlock, and copyout each 
> individual entry, using a generation number to manage staleness.  Finally, 
> we walk the list dropping the refcounts and free the array.  This avoids 
> holding global locks for a long time, as well as the stale data issue.  
> It's not ideal in other ways -- long lists, reference count complexity, etc, 
> but as I mentioned, it is very much an edge case, and much of that 
> mechanism (especially refcounts) is something we need anyway for any 
> moderately complex kernel data structure.

what you describe above is more efficient but not that different
from what i described. The thing is, i always forget that in many
cases an iterator doesn't care for the order in which elements
are generated so you could use a solution like the one below,
by just doing a tiny little bit of work while modifying the main
data structure
(this might well be a known solution, since it is so trivial...)

[i already emailed the following to BMS, so apologies for the duplicate]

if all you care about is iterating over the whole data structure, in
no particular order, you could manage an additional array of pointers
to all the objects in the data structure (the array should be
implemented as a sparse, resizable array but that's a separate
issue, and probably a relatively trivial one -- i am googling for
it...).

Navigation and iterators are simple:

+ When inserting a new element, append an entry to the array, and make
  it point to the newly added record. Each entry gets a fresh sequence
  number, and one should make sure that sequence numbers are not recycled
  (64 bit should suffice ?).

+ when deleting an element, logically remove the entry from the array

+ the iterator returns a copy of the object, and its sequence number;

+ getnext returns the existing element following the 'current' one
  in the sparse array.

Complexity for most ops (data insertion, removal, lookup)
would be O(1) plus whatever is needed to do housekeeping on the
sparse array, and this depends on the usage of the main data structure
and how much we worry about expensive 'getnext' ops.
Sure you need a read lock on the main struct while you look up
the next element in the sparse array, but the nice thing is that
you can release the lock at each step even if you have a poorly
implemented sparse array.
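
A minimal sketch of the side array, with names (struct seqindex, index_add, index_del, index_getnext) invented for illustration. It makes one simplifying assumption that the text leaves open: slots are never compacted, so slot i simply carries sequence number i + 1 and no separate sequence field is stored. A real sparse array would reclaim dead slots and would then need to keep the sequence number in each slot. The read lock on the main structure is only indicated by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct route {				/* stand-in for the real object */
	int	dst;
};

struct seqindex {
	struct route	**slots;	/* slots[i] has sequence number i + 1 */
	size_t		  used, cap;
};

/* Insert: append a slot pointing at obj; returns its sequence number, 0 on failure. */
uint64_t
index_add(struct seqindex *ix, struct route *obj)
{
	assert(obj != NULL);
	if (ix->used == ix->cap) {
		size_t ncap = ix->cap ? ix->cap * 2 : 8;
		struct route **ns = realloc(ix->slots, ncap * sizeof(*ns));

		if (ns == NULL)
			return (0);
		ix->slots = ns;
		ix->cap = ncap;
	}
	ix->slots[ix->used++] = obj;
	return ((uint64_t)ix->used);
}

/* Delete: logical removal, the slot stays but no longer points anywhere. */
void
index_del(struct seqindex *ix, uint64_t seq)
{
	if (seq >= 1 && seq <= ix->used)
		ix->slots[seq - 1] = NULL;
}

/*
 * getnext: copy out the first live object whose sequence number is
 * greater than 'after' and return that sequence number, or 0 when the
 * end of the array is reached.  Pass 0 to start a walk.
 */
uint64_t
index_getnext(const struct seqindex *ix, uint64_t after, struct route *out)
{
	size_t i;

	/* read lock on the main structure is held only for this call */
	for (i = (size_t)after; i < ix->used; i++) {
		if (ix->slots[i] != NULL) {
			*out = *ix->slots[i];
			return ((uint64_t)i + 1);
		}
	}
	return (0);
}
```

The caller keeps only the last sequence number between calls, so an element deleted while the walker is paused is skipped on the next call instead of leaving a stale pointer behind.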

Makes sense ?

cheers
luigi


avahi-daemon, Segmentation fault: 11 (core dumped)

2008-09-02 Thread stellan alm
Hi,

Running:
FreeBSD black 7.0-RELEASE-p2 FreeBSD 7.0-RELEASE-p2 #0: Wed Jun 18 07:33:20 UTC 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  i386

All the latest ports gnome2 and xfce4, output from gdb analysing the
core says:
--8<-
$ gdb -c avahi-daemon.core /usr/local/sbin/avahi-daemon
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `avahi-daemon'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libavahi-common.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libavahi-common.so.3
Reading symbols from /usr/local/lib/libavahi-core.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libavahi-core.so.5
Reading symbols from /usr/local/lib/libdaemon.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libdaemon.so.0
Reading symbols from /usr/local/lib/libexpat.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libexpat.so.6
Reading symbols from /usr/local/lib/libdbus-1.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libdbus-1.so.3
Reading symbols from /lib/libssp.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libssp.so.0
Reading symbols from /usr/local/lib/libintl.so.8...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libintl.so.8
Reading symbols from /usr/local/lib/libiconv.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libiconv.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x280a73f9 in _thr_cancel_enter () from /usr/local/lib/libavahi-common.so.3
[New LWP 100162]
(gdb)
--8<-

net/avahi is compiled with:
$ less /var/db/ports/avahi/options 
# This file is auto-generated by 'make config'.
# No user-servicable parts inside!
# Options for avahi-0.6.23
_OPTIONS_READ=avahi-0.6.23
WITH_AUTOIPD=true
WITH_GTK=true
WITH_LIBDNS=true
WITHOUT_MONO=true
WITHOUT_QT3=true
WITHOUT_QT4=true
WITH_PYTHON=true

Searching the net doesn't come up with anything...
Removed all ports! Reinstalled but without luck.

Kind regards,
Stellan Alm




Re: how to read dynamic data structures from the kernel (was Re: reading routing table)

2008-09-02 Thread Julian Elischer

Robert Watson wrote:
>
> On Tue, 2 Sep 2008, Luigi Rizzo wrote:
>
>> The real problem is that these data structures are dynamic and
>> potentially large, so the following approach (used e.g. in ipfw)
>>
>>       enter kernel;
>>       get shared lock on the structure;
>>       navigate through the structure and make a linearized copy;
>>       unlock;
>>       copyout the linearized copy;
>>
>> is extremely expensive and has the potential to block other activities
>> for a long time.
>
> Sockets, sysctl, kmem, etc, are all really just I/O mechanisms, with
> varying levels of abstraction, for pushing data, and all fundamentally
> suffer from the problem of a lack of general export abstraction.
>
>> What we'd need is some internal representation of the data structure
>> that could give us individual entries of the data structure on each
>> call, together with extra info (a pointer if we can guarantee that it
>> doesn't get stale, something more if we cannot make the guarantee) to
>> allow the navigation to occur.




In some code I have seen (and some I have written) there are always two
levels of storage in some modules. One contains the majority
of the information and the other contains "changes that occurred since
the main container was locked".

So for example the routing tables might be locked, and if
a routing change is requested thereafter, it gets stored in
transactional form on the side structure. A routing lookup
during the period that the structure is locked (if a read lock) simply
goes ahead, and at completion checks if there is a better
answer in the waiting list. A write request is stored as a
transaction request on the waiting list. Not saying it works for
everything, but if we had a kernel written in a high enough level
language, such methods could be broadly used. Oh well.

Using reader-writer locking mitigates a lot of this.
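
A rough sketch of the two-level idea, using a toy fixed-size table. Everything here (struct rtable, rt_freeze, rt_thaw, the integer "dst" and "gw" fields, the -1 "no route" return) is a hypothetical illustration rather than existing kernel code, and the sketch is single-threaded: the frozen flag stands in for "the main container is locked for a dump".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	MAXROUTES	16
#define	MAXPENDING	16

struct route { int dst; int gw; int valid; };
struct txn   { int dst; int gw; int is_delete; };

struct rtable {
	struct route	main[MAXROUTES];	/* the main container */
	struct txn	pending[MAXPENDING];	/* changes made while frozen */
	size_t		npending;
	int		frozen;
};

void
rt_init(struct rtable *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
}

/* Apply one transaction to the main container (silently dropped if full). */
static void
rt_apply(struct rtable *t, const struct txn *x)
{
	struct route *spare = NULL;
	size_t i;

	for (i = 0; i < MAXROUTES; i++) {
		struct route *r = &t->main[i];

		if (r->valid && r->dst == x->dst) {
			if (x->is_delete)
				r->valid = 0;
			else
				r->gw = x->gw;
			return;
		}
		if (!r->valid && spare == NULL)
			spare = r;
	}
	if (!x->is_delete && spare != NULL) {
		spare->dst = x->dst;
		spare->gw = x->gw;
		spare->valid = 1;
	}
}

/* Write path: direct when unlocked, queued on the side list when frozen. */
int
rt_change(struct rtable *t, int dst, int gw, int is_delete)
{
	struct txn x = { dst, gw, is_delete };

	if (!t->frozen) {
		rt_apply(t, &x);
		return (0);
	}
	if (t->npending == MAXPENDING)
		return (-1);		/* side list full */
	t->pending[t->npending++] = x;
	return (0);
}

/* Read path: look in the main container, then let queued changes override. */
int
rt_lookup(const struct rtable *t, int dst)
{
	int gw = -1;
	size_t i;

	for (i = 0; i < MAXROUTES; i++)
		if (t->main[i].valid && t->main[i].dst == dst)
			gw = t->main[i].gw;
	for (i = 0; i < t->npending; i++)
		if (t->pending[i].dst == dst)
			gw = t->pending[i].is_delete ? -1 : t->pending[i].gw;
	return (gw);
}

void
rt_freeze(struct rtable *t)
{
	t->frozen = 1;
}

/* Unlock: replay the queued transactions in order, then resume direct writes. */
void
rt_thaw(struct rtable *t)
{
	size_t i;

	for (i = 0; i < t->npending; i++)
		rt_apply(t, &t->pending[i]);
	t->npending = 0;
	t->frozen = 0;
}
```

While the table is frozen a dump can walk main[] without seeing it change, and lookups still return the newest answer because they consult the side list last.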


> I think there are necessarily implementation-specific details to all of
> these steps for any given kernel subsystem -- we have data structures,
> synchronization models, etc, that are all tuned to their common use
> requirements, and monitoring is very much an edge case.  I don't think
> this is bad: this is an OS kernel, after all, but it does make things a
> bit more tricky.  Even if we can't share code, sharing approaches across
> subsystems is a good idea.
>
> For an example of what you have in mind, take a look at the sysctl
> monitoring for UNIX domain sockets.  First, we allocate an array of
> pointers sized to the number of unpcb's we have.  Then we walk the list,
> bumping the references and adding pointers to the array.  Then we
> release the global locks, and proceed to lock, externalize, unlock, and
> copyout each individual entry, using a generation number to manage
> staleness.  Finally, we walk the list dropping the refcounts and free
> the array.  This avoids holding global locks for a long time, as well as
> the stale data issue.  It's not ideal in other ways -- long lists,
> reference count complexity, etc, but as I mentioned, it is very much an
> edge case, and much of that mechanism (especially refcounts) is
> something we need anyway for any moderately complex kernel data structure.


Refcounts are, unfortunately, a really bad thing for multiprocessors.
Refcounts, if they are actually incremented now and then, are usually
not in the cache of any given CPU, forcing lots of cache flushes and real
reads from memory. There are some elaborate MP-refcount schemes we
really should look at, but most require a lot of memory.
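
One family of such schemes splits the count into per-CPU shards. The sketch below is an assumption about what such a scheme could look like, not something proposed in the thread: NCPU, CACHE_LINE and all the names are made up, and the "cpu" argument stands in for whatever the kernel would use to identify the current CPU. It is single-threaded; a real version would also have to keep the caller on one CPU during the update and quiesce all CPUs before trusting a total of zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	NCPU		4	/* assumed CPU count for the sketch */
#define	CACHE_LINE	64	/* assumed cache line size in bytes */

/* One counter per CPU, padded so that no two counters share a cache line. */
struct pcpu_count {
	long	n;
	char	pad[CACHE_LINE - sizeof(long)];
};

struct mp_refcount {
	struct pcpu_count	cpu[NCPU];
};

void
mp_ref_init(struct mp_refcount *r)
{
	assert(r != NULL);
	memset(r, 0, sizeof(*r));
}

/* Fast path: touch only the shard of the CPU we are running on. */
void
mp_ref_acquire(struct mp_refcount *r, unsigned cpu)
{
	r->cpu[cpu % NCPU].n++;
}

void
mp_ref_release(struct mp_refcount *r, unsigned cpu)
{
	r->cpu[cpu % NCPU].n--;
}

/*
 * Slow path: the real count is the sum over all shards.  A single
 * shard may be negative when a reference is taken on one CPU and
 * dropped on another.
 */
long
mp_ref_total(const struct mp_refcount *r)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < NCPU; i++)
		sum += r->cpu[i].n;
	return (sum);
}
```

The memory cost is visible directly: each counted object carries NCPU * CACHE_LINE bytes of counter instead of a single integer.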








Re: kern/127050: [carp] ipv6 does not work on carp interfaces [regression]

2008-09-02 Thread linimon
Old Synopsis: ipv6 does not work on carp interfaces
New Synopsis: [carp] ipv6 does not work on carp interfaces [regression]

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Sep 2 22:37:07 UTC 2008
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=127050


Re: kern/127052: [if_bridge] Still bridge issues - with L2 protocols such as PPPoE

2008-09-02 Thread linimon
Old Synopsis: Still bridge issues - with L2 protocols such as PPPoE
New Synopsis: [if_bridge] Still bridge issues - with L2 protocols such as PPPoE

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Sep 2 22:46:19 UTC 2008
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=127052


Re: kern/127057: [udp] Unable to send UDP packet via IPv6 socket to IPv4 mapped address

2008-09-02 Thread linimon
Old Synopsis: Unable to send UDP packet via IPv6 socket to IPv4 mapped address
New Synopsis: [udp] Unable to send UDP packet via IPv6 socket to IPv4 mapped address

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Sep 3 03:18:48 UTC 2008
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=127057


Re: kern/127052: Still bridge issues - with L2 protocols such as PPPoE

2008-09-02 Thread Eygene Ryabinkin
The following reply was made to PR kern/127052; it has been noted by GNATS.

From: Eygene Ryabinkin <[EMAIL PROTECTED]>
To: Helge Oldach <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: kern/127052: Still bridge issues - with L2 protocols such as PPPoE
Date: Wed, 3 Sep 2008 08:21:43 +0400

 Tue, Sep 02, 2008 at 11:06:47PM +0200, Helge Oldach wrote:
 > Eygene supplied a patch that supposedly fixes this issue by introducing
 > a sysctl that makes the former if_bridge behaviour default, and which
 > must be turned on to enable MAC inheritance. I have not tested this
 > patch yet.
 
 And here is the patch itself:
 --- if_bridge-mac_inheritance.patch begins here ---
 From 545d95995bb1879a6807be28a43d4ee061dda218 Mon Sep 17 00:00:00 2001
 From: Eygene Ryabinkin <[EMAIL PROTECTED]>
 Date: Tue, 2 Sep 2008 19:49:44 +0400
 Subject: [PATCH] Add sysctl net.link.bridge.inherit_mac to control MAC inheritance
 
 Philip Paeps enabled bridge to inherit its MAC from the first bridge
 member.  This broke ARP, it was fixed, but then Helge Oldach reported
 that this also breaks PPPoE when it is done on the bridged interface.
 
 I have implemented a new sysctl that controls MAC inheritance.  It is off
 by default to keep the previous behaviour of bridge until all problems
 with duplicated MAC addresses are chased down and fixed.
 
 Signed-off-by: Eygene Ryabinkin <[EMAIL PROTECTED]>
 ---
  sys/net/if_bridge.c |    9 +++++++--
  1 files changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/sys/net/if_bridge.c b/sys/net/if_bridge.c
 index a84a0ff..aee7c4a 100644
 --- a/sys/net/if_bridge.c
 +++ b/sys/net/if_bridge.c
 @@ -350,6 +350,7 @@ static int pfil_ipfw_arp = 0;   /* layer2 filter with ipfw */
  static int pfil_local_phys = 0; /* run pfil hooks on the physical interface for
                                     locally destined packets */
  static int log_stp   = 0;   /* log STP state changes */
 +static int bridge_inherit_mac = 0;   /* share MAC with first bridge member */
  SYSCTL_INT(_net_link_bridge, OID_AUTO, pfil_onlyip, CTLFLAG_RW,
      &pfil_onlyip, 0, "Only pass IP packets when pfil is enabled");
  SYSCTL_INT(_net_link_bridge, OID_AUTO, ipfw_arp, CTLFLAG_RW,
 @@ -363,6 +364,9 @@ SYSCTL_INT(_net_link_bridge, OID_AUTO, pfil_local_phys, CTLFLAG_RW,
      "Packet filter on the physical interface for locally destined packets");
  SYSCTL_INT(_net_link_bridge, OID_AUTO, log_stp, CTLFLAG_RW,
      &log_stp, 0, "Log STP state changes");
 +SYSCTL_INT(_net_link_bridge, OID_AUTO, inherit_mac, CTLFLAG_RW,
 +    &bridge_inherit_mac, 0,
 +    "Inherit MAC address from the first bridge member");
  
  struct bridge_control {
          int (*bc_func)(struct bridge_softc *, void *);
 @@ -921,7 +925,8 @@ bridge_delete_member(struct bridge_softc *sc, struct bridge_iflist *bif,
           * the mac address of the bridge to the address of the next member, or
           * to its default address if no members are left.
           */
 -        if (!memcmp(IF_LLADDR(sc->sc_ifp), IF_LLADDR(ifs), ETHER_ADDR_LEN)) {
 +        if (bridge_inherit_mac &&
 +            !memcmp(IF_LLADDR(sc->sc_ifp), IF_LLADDR(ifs), ETHER_ADDR_LEN)) {
                  if (LIST_EMPTY(&sc->sc_iflist))
                          bcopy(sc->sc_defaddr,
                              IF_LLADDR(sc->sc_ifp), ETHER_ADDR_LEN);
 @@ -1028,7 +1033,7 @@ bridge_ioctl_add(struct bridge_softc *sc, void *arg)
           * member and the MAC address of the bridge has not been changed from
           * the default randomly generated one.
           */
 -        if (LIST_EMPTY(&sc->sc_iflist) &&
 +        if (bridge_inherit_mac && LIST_EMPTY(&sc->sc_iflist) &&
              !memcmp(IF_LLADDR(sc->sc_ifp), sc->sc_defaddr, ETHER_ADDR_LEN))
                  bcopy(IF_LLADDR(ifs), IF_LLADDR(sc->sc_ifp), ETHER_ADDR_LEN);
  
 -- 
 1.5.6.4
 --- if_bridge-mac_inheritance.patch ends here ---
 
 > I wonder what the purpose of MAC inheritance is anyway... Multiple
 > unicast MACs in one segment sounds pretty odd.
 
 As was explained to me by Philip Paeps,
 -
 On 2008-08-15 18:24:29 (+0400), Eygene Ryabinkin <[EMAIL PROTECTED]> wrote:
 > I wonder what was the real need of the commit r180140, where you added
 > preemption of first bridge member MAC address by the bridge itself?
 
 There were two reasons: firstly, it makes the bridge more predictable across
 reboots, particularly in setups using DHCP.  Secondly, this is the way the
 IEEE spec seems to suggest it should work.  It is also the way other bridging
 implementations I've encountered work -- which suggests my reading of the spec
 is correct.
 -
 -- 
 Eygene
  ____   _.--.   #
  \`.|\.....-'`   `-._.-'_.-'`   #  Remember that it is hard
  /  ' ` ,   __.--'  #  to read the on-line manual
  )/' _/ \   `-_,