CARP and L2 src-MAC
Hi.

We want to use CARP in a TPSDA network and have run into a problem. The ARP response from the master CARP router contains the correct virtual MAC in its payload, but uses the physical interface MAC as the source address in the L2 header. This is fine for the client, but the switches between the router and the client never learn the virtual MAC. That works in a "normal" switched network but fails in a TPSDA network, where none of the L2 devices learn the virtual MAC. In this case the network is built on Alcatel iSAM FTTU, and because all CARP messages are broadcast, the devices do not learn the virtual MAC from them either.

Is it possible to tweak CARP to use the virtual MAC in the L2 header instead of the physical interface MAC? Could this be implemented as a feature controlled by a sysctl?

//Jon
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
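The mismatch Jon describes can be sketched in a few lines of C. CARP derives its virtual MAC from the vhid as 00:00:5e:00:01:<vhid>; the helper names below (carp_virtual_mac, l2_src_matches_arp_sha) are made up for illustration and are not part of carp(4). This is only a minimal sketch of the check a learning switch effectively depends on, not kernel code.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Build the CARP virtual MAC for a given vhid: 00:00:5e:00:01:<vhid>. */
void carp_virtual_mac(uint8_t vhid, uint8_t mac[MAC_LEN])
{
    mac[0] = 0x00;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = 0x00;
    mac[4] = 0x01;
    mac[5] = vhid;
}

/*
 * Hypothetical check: does the Ethernet source address of a frame equal
 * the sender hardware address carried inside the ARP payload?
 * Returns 1 when they match, 0 when they differ.
 */
int l2_src_matches_arp_sha(const uint8_t eth_src[MAC_LEN],
                           const uint8_t arp_sha[MAC_LEN])
{
    return memcmp(eth_src, arp_sha, MAC_LEN) == 0;
}
```

In the situation described above, the ARP payload would carry the result of carp_virtual_mac() while the Ethernet header carries the physical MAC, so the check returns 0 and a switch that learns only from L2 source addresses never sees the virtual MAC.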
Re: CARP performance tuning question.
Whilst I don't doubt that you have a problem, your comments don't correlate particularly well with the data you have provided and this makes it difficult to immediately suggest a solution.

On 2008-Nov-05 16:40:32 +0300, pluknet <[EMAIL PROTECTED]> wrote:
>AT work we use device carp(4) under high load:

carp(4) is solely a failover mechanism. It either generates or receives somewhat under 1pps per carp interface and the state it maintains is basically 'master' or 'backup'. I suspect the 'load' is being caused by pf(4), possibly in conjunction with pfsync(4).

>The problem is that the server experiences a bad interactivity (from
>70k states and very bad from 120-150k)
>i.e. when a network workload (and interrupts count) begin to increase.
>
>From top(1):
>CPU states:  0.0% user,  0.0% nice,  0.4% system, 76.3% interrupt, 23.3% idle
>  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>   13 root        1 -44 -163     0K     8K WAIT   407:43 57.86% swi1: net

I agree that swi1 is using a significant amount of CPU but top is still reporting >23% idle so you shouldn't be getting poor interactive performance.

>ATM pfctl -s info shows such numbers:
>
>State Table                          Total             Rate
>  current entries                   153972
>  searches                      6052078938         4800.8/s
>  inserts                        120373545           95.5/s
>  removals                       120219573           95.4/s

That shows the load on pf(4) but doesn't really reflect what the system is doing as a whole.

>It works currently under UP, but could be rebuilt to work under SMP
>(Xeon 5130) if that helps.

Unfortunately, I don't know if this will help or not because I'm not sure what bottleneck you are hitting.

>Can someone give hints to decrease interrupt count and to help with
>the server stability at all?

Well, you haven't actually reported what the interrupt count is or what instability you are seeing, so this is a bit difficult.

Can you please provide some more information:
- output from 'uname -a'
- output from 'vmstat -i; sleep 10; vmstat -i' under load
- output from 'netstat -i'
- 10-15 seconds of output from 'netstat -i 1' under load
- What is the box doing? Is it a straight filtering router? Does it handle NAT? Is it running apps itself (eg web, ftp, mail)?
- What speed are the interface(s) running at?
- What instability problems are you seeing?
- Please provide more details on what you mean by 'bad interactivity'.
- How complex is your pf ruleset? How many rules? Anything unusual?
- What scheduler are you using?
- What is the full output of 'pfctl -s info'?

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
Re: CARP performance tuning question.
2008/11/6 Peter Jeremy <[EMAIL PROTECTED]>:
> Whilst I don't doubt that you have a problem, your comments don't
> correlate particularly well with the data you have provided and
> this makes it difficult to immediately suggest a solution.
> [...]
> Can you please provide some more information:
> [...]

Thanks for your answer and, please, ignore this premature mail. It would need a bit more analysis.

--
wbr, pluknet
Re: CARP and L2 src-MAC
On 2008-11-06 11.47, "Peter Jeremy" <[EMAIL PROTECTED]> wrote:
> On 2008-Nov-06 10:06:21 +0100, Jon Otterholm <[EMAIL PROTECTED]> wrote:
>> Is it possible to tweak CARP to use the virtual MAC in L2 header instead of
>> the physical interface MAC? Could this be implemented as a feature
>> controlled by a sysctl?
>
> In my testing, Max Laier's carpdep patches do this. See
> http://lists.freebsd.org/pipermail/freebsd-net/2008-March/017103.html

Can we find this in HEAD?

//Jon
BPF question
Hello all,

I am using simple write() calls to send packets over a BPF file descriptor. The BPF file descriptor is in buffered read mode (I assume this is the default, as I do not set it explicitly). From what I see, my write() calls are somewhat buffered. Since timing is relatively important for my project, I'd like to ask if there is a way to "flush" the write buffer. Setting the O_DIRECT flag on the file descriptor doesn't seem to have any effect.

/ipv

--
"UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity." Dennis Ritchie
Kernel without INET
Hi,

some might have wondered about the many small commits I have done over the last two days.

A few weeks ago I tried to compile a kernel without any networking and that failed; I needed to add (I think it was) INET, ether and loop. So over the last few days I have been trying to get rid of that requirement.

As a partial victory, it now seems to be possible again to build a kernel without any networking. I'll have to check with my original setup, but I have a stripped-down LINT file I tested with.

Obviously the long-term goal is to be able to build a kernel without INET support (again?). As an intermediate step that will mean without INET and INET6, and once that works and IPX only would compile *cough*, then work on a (LINT) kernel with nooption INET.

It'll be a long, long way to go and this is nothing to finish within a week or two. Do not think about doing a quick sweep over the rest of the tree. You would wonder what depends on INET these days. I have more patches mailed out or pending here.

While we have been trying to make it possible to build without INET6 most of the time, someone reviewing my code told me that if I am complaining that the 'kernel needs INET', I should put some code under #ifdef INET. I did. The bottom line is that I now ask you to consider this for all new code as well.

I am very well aware that some code, as is, would already require a maze of #ifdefs (I have a sample of that), so we need to be careful and apply the checks sensibly.

Regards,
Bjoern

PS: please obey Reply-To:

--
Bjoern A. Zeeb
Stop bit received. Insert coin for new game.
Re: BPF question
On Thu, 6 Nov 2008, Ivo Vachkov wrote:

> I am using simple write() calls to send packets over BPF file descriptor.
> The BPF file descriptor is in buffered read mode (I assume this is the
> default and I do not set it explicitly). From what I see my write() calls
> are somewhat buffered. Since timing is relatively important for my project
> I'd like to ask if there is a way "flush" the write buffer. Setting O_DIRECT
> flag on the file descriptor doesn't seem to have any effect.

The write(2) system call does no buffering in userspace (unlike, say, fwrite(3)), and when you write to a BPF device it essentially goes straight into the network interface output queue, so there should be no need for a flush mechanism. Could you describe the buffering effect you're seeing a bit more?

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: BPF question
I use the following code:

/* Send Announce Packet */
int
zc_freebsd_sendannounce(int fd, unsigned char *mac, int zc_addr)
{
    unsigned char *announce = NULL;
    int i = 0;
    unsigned int packet_len = 0;
    struct ether_header *eth_hdr = NULL;
    struct ether_arp *eth_arp = NULL;

    if (mac == NULL || zc_addr == 0 || zc_addr == -1)
        return -1;

    packet_len = sizeof(struct ether_header) +
        (sizeof(struct ether_arp) >= ETHER_PAYLOAD ?
        sizeof(struct ether_arp) : ETHER_PAYLOAD);

    /* Allocate announce packet */
    if ((announce = malloc(packet_len)) == NULL)
        return -1;
    memset(announce, 0, packet_len);

    /* Populate Announce Packet
     *
     * eth_hdr
     *   saddr - iface mac
     *   daddr - ff:ff:ff:ff:ff:ff
     *   type = ETHERTYPE_ARP
     *
     * eth_arp - ARP REQUEST
     *   sender hw addr - iface mac
     *   sender ip addr - zc_addr
     *   target hw addr - 00:00:00:00:00:00
     *   target ip addr - zc_addr
     */
    eth_hdr = (struct ether_header *)announce;
    eth_arp = (struct ether_arp *)((char *)eth_hdr + sizeof(struct ether_header));

    memcpy(eth_hdr->ether_dhost, eth_bcast_addr, ETHER_ADDR_LEN);
    memcpy(eth_hdr->ether_shost, mac, ETHER_ADDR_LEN);
    eth_hdr->ether_type = htons(ETHERTYPE_ARP);

    eth_arp->arp_hrd = htons(ARPHRD_ETHER);
    eth_arp->arp_pro = htons(ETHERTYPE_IP);
    eth_arp->arp_hln = ETHER_ADDR_LEN;
    eth_arp->arp_pln = IP_ADDR_LEN;
    eth_arp->arp_op = htons(ARPOP_REQUEST);
    memcpy(eth_arp->arp_sha, mac, ETHER_ADDR_LEN);
    memcpy(eth_arp->arp_spa, &zc_addr, IP_ADDR_LEN);
    memcpy(eth_arp->arp_tha, eth_null_addr, ETHER_ADDR_LEN);
    memcpy(eth_arp->arp_tpa, &zc_addr, IP_ADDR_LEN);

    /* Send packet over the wire */
    if ((i = write(fd, announce, packet_len)) < 0) {
        free(announce);
        return -1;
    }

    free(announce);
    return 0;
}

and later in my code I call this function in a loop:

for (i = 0; i < ANNOUNCE_NUM; i++) {
    printf("ANNOUNCE ...\n");
    fflush(stdout);

    /* Get initial time */
    if (clock_gettime(CLOCK_REALTIME, &ts0) < 0) {
        perror("clock_gettime");
        return -1;
    }

    /* Send Announce Packet */
    if (zc_freebsd_sendannounce(bpf_fd, mac, zc_addr) < 0) {
        printf("zc_freebsd_sendannounce(): error\n");
        return -1;
    }

    /* Possibly check for conflicts here */

    /* Get time after select() */
    if (clock_gettime(CLOCK_REALTIME, &ts1) < 0) {
        perror("clock_gettime");
        return -1;
    }

    printf("ts0.sec |%ld|, ts0.nsec |%ld|\n", ts0.tv_sec, ts0.tv_nsec);
    fflush(stdout);
    printf("ts1.sec |%ld|, ts1.nsec |%ld|\n", ts1.tv_sec, ts1.tv_nsec);
    fflush(stdout);

    /* wait ANNOUNCE_INTERVAL or the rest of it */
    ts0.tv_sec = ANNOUNCE_INTERVAL - (ts1.tv_sec - ts0.tv_sec) >= 0 ?
        ANNOUNCE_INTERVAL - (ts1.tv_sec - ts0.tv_sec) : 0;
    ts0.tv_nsec = ((ANNOUNCE_INTERVAL - ts0.tv_sec) * 10) - (ts1.tv_nsec - ts0.tv_nsec) >= 0 ?
        ((ANNOUNCE_INTERVAL - ts0.tv_sec) * 10) - (ts1.tv_nsec - ts0.tv_nsec) : 0;
    nanosleep(&ts0, NULL);
} /* ANNOUNCE_NUM for() */

From the two printf()'s above I see that the nanosleep() is effective. However, when I check the packet flow with Wireshark (on the same host where this code is running) I see the announce packets timed only milliseconds away from one another. Could this be an issue with Wireshark?! Right now I have only one computer to work on, but I'll test the timing from another computer asap.

P.S. I'm implementing part of RFC3927 (ZeroConf) as part of a bigger project

On Thu, Nov 6, 2008 at 7:06 PM, Robert Watson <[EMAIL PROTECTED]> wrote:
>
> On Thu, 6 Nov 2008, Ivo Vachkov wrote:
>
>> I am using simple write() calls to send packets over BPF file descriptor.
>> The BPF file descriptor is in buffered read mode (I assume this is the
>> default and I do not set it explicitly). From what I see my wri
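One thing that might be worth ruling out in the loop above is the sleep computation itself: ts0.tv_sec is overwritten before it is used to derive ts0.tv_nsec, and a plain subtraction of tv_nsec fields can go negative. The following is only a sketch of one way to compute "the rest of the interval" with a borrow between seconds and nanoseconds; the function name interval_remaining is made up for illustration, and whether this accounts for the timing seen in Wireshark is not established by anything in this message.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Return how much of an interval of interval_sec seconds is left, given
 * when the interval started and the current time. Returns {0, 0} if the
 * interval has already elapsed.
 */
struct timespec interval_remaining(struct timespec start, struct timespec now,
                                   time_t interval_sec)
{
    struct timespec left;
    time_t elapsed_sec = now.tv_sec - start.tv_sec;
    long elapsed_nsec = now.tv_nsec - start.tv_nsec;

    /* Borrow one second if the nanosecond difference went negative. */
    if (elapsed_nsec < 0) {
        elapsed_sec -= 1;
        elapsed_nsec += NSEC_PER_SEC;
    }

    /* left = interval - elapsed, again borrowing if needed. */
    left.tv_sec = interval_sec - elapsed_sec;
    left.tv_nsec = -elapsed_nsec;
    if (left.tv_nsec < 0) {
        left.tv_sec -= 1;
        left.tv_nsec += NSEC_PER_SEC;
    }

    /* Never hand nanosleep(2) a negative duration. */
    if (left.tv_sec < 0) {
        left.tv_sec = 0;
        left.tv_nsec = 0;
    }
    return left;
}
```

In the loop this would be used as `ts0 = interval_remaining(ts0, ts1, ANNOUNCE_INTERVAL); nanosleep(&ts0, NULL);`. For measuring intervals, CLOCK_MONOTONIC is usually a safer choice than CLOCK_REALTIME, since the latter can jump when the system time is adjusted.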
Re: BPF question
Just a side note.

Thu, Nov 06, 2008 at 07:54:13PM +0200, Ivo Vachkov wrote:
> P.S. I'm implementing part of RFC3927 (ZeroConf) as part of a bigger project

Have you looked at /usr/ports/net/howl and maybe /usr/ports/net/avahi?
--
Eygene
Re: BPF question
I "evaluated" Avahi, but it is too big for my needs. I will check howl too. However, Zeroconf seems relatively easy to implement, plus I need this module to work in cooperation with others. The license matters too :)

On Thu, Nov 6, 2008 at 8:14 PM, Eygene Ryabinkin <[EMAIL PROTECTED]> wrote:
> Just a side note.
>
> Thu, Nov 06, 2008 at 07:54:13PM +0200, Ivo Vachkov wrote:
>> P.S. I'm implementing part of RFC3927 (ZeroConf) as part of a bigger project
>
> Had you glanced at /usr/ports/net/howl and may be /usr/ports/net/avahi?
> --
> Eygene
Two copies of resolver routines in libc ?
I was recently looking again at the problem mentioned in

http://lists.freebsd.org/pipermail/freebsd-hackers/2003-August/002399.html

(bogus DNS servers at my ISP, telecomitalia, which take forever to resolve AAAA queries, coupled with the fact that the FreeBSD resolver has no way to disable AAAA queries when IPv6 is compiled in, which is the case with GENERIC kernels).

While looking for a workaround (attached, read later), I noticed that libc has two versions of the resolver routines: one is in

/usr/src/lib/libc/resolv/res_query.c

the other one is embedded into

/usr/src/lib/libc/net/getaddrinfo.c

which includes slightly modified versions of res_nquery, res_ndots and res_nquerydomain (all part of the routines documented in resolver(3)).

If we are lucky, this is just replicated code. But I am not even sure the two copies behave the same, e.g. in the handling of options (in resolv.conf or the environment variable RES_OPTIONS).

This is really annoying, because generally you don't know whether an application uses getaddrinfo() or the traditional gethost*() routines (which in turn use resolver(3)), so it is hard to tell whether applications behave consistently.

If someone has time, it would be worthwhile trying to merge the two versions of the code into one (and I believe we should make getaddrinfo use the standard stuff in resolv/).

--- As for a fix to my problem: ---

I wanted some trick to disable, in the resolver, the generation of AAAA queries. resolver(5) mentions some options that can be put in /etc/resolv.conf or in the RES_OPTIONS environment variable to control the behaviour of the resolver. Some more options are undocumented but implemented; e.g. looking at /usr/src/lib/libc/resolv/res_init.c you find these additional options:

    retrans:
    retry:
    inet6
    insecure1
    insecure2
    rotate
    no-check-names
    edns0
    dname
    nibble:
    nibble2:
    v6revmode:

The code below (which is completely trivial) adds an additional option, "no", which disables the generation of AAAA requests.
Just do

    setenv RES_OPTIONS no

and you are done. I don't know of other ways to disable these requests on normal address resolutions, other than building a kernel without INET6.

As you see below (and this relates to my original complaint), I had to make the modification in two places :( because things like ssh and telnet use getaddrinfo(), whereas e.g. firefox uses res_query(). I have no idea what is used by /usr/bin/host or /usr/bin/dig, because they do not seem to use any of the library routines.

Any interest in having this in the system?

cheers
luigi

Index: net/getaddrinfo.c
===
RCS file: /home/ncvs/src/lib/libc/net/getaddrinfo.c,v
retrieving revision 1.69.2.10
diff -u -r1.69.2.10 getaddrinfo.c
--- net/getaddrinfo.c 28 Sep 2007 06:23:03 - 1.69.2.10
+++ net/getaddrinfo.c 6 Nov 2008 20:35:39 -
@@ -85,6 +85,7 @@
 #include
 #include "res_config.h"
+#include "res_private.h"
 #ifdef DEBUG
 #include
@@ -2257,6 +2258,8 @@
 	oflags = res->_flags;
+	if (res->options & RES_NO && type == ns_t_)
+		continue; /* ignore this request */
 again:
 	hp->rcode = NOERROR;/* default */
Index: resolv/res_init.c
===
RCS file: /home/ncvs/src/lib/libc/resolv/res_init.c,v
retrieving revision 1.2.2.3
diff -u -r1.2.2.3 res_init.c
--- resolv/res_init.c 22 Dec 2006 07:33:20 - 1.2.2.3
+++ resolv/res_init.c 6 Nov 2008 20:34:00 -
@@ -636,6 +636,8 @@
 		    !strncmp(cp, "no-tld-query", sizeof("no-tld-query") - 1)) {
 			statp->options |= RES_NOTLDQUERY;
+		} else if (!strncmp(cp, "no", sizeof("no") - 1)) {
+			statp->options |= RES_NO;
 		} else if (!strncmp(cp, "inet6", sizeof("inet6") - 1)) {
 			statp->options |= RES_USE_INET6;
 		} else if (!strncmp(cp, "insecure1", sizeof("insecure1") - 1)) {
Index: resolv/res_private.h
===
RCS file: /home/ncvs/src/lib/libc/resolv/res_private.h,v
retrieving revision 1.1.1.1.2.1
diff -u -r1.1.1.1.2.1 res_private.h
--- resolv/res_private.h 17 Jul 2006 10:09:58 - 1.1.1.1.2.1
+++ resolv/res_private.h 6 Nov 2008 19:08:29 -
@@ -3,6 +3,9 @@
 #ifndef res_private_h
 #define res_private_h
+// additional debug flags to disable queries
+#define RES_NO 0x0080
+
 struct __res_state_ext {
 	union res_sockaddr_union nsaddrs[MAXNS];
 	struct sort_list {
Index: resolv/res_query.c
===
RCS file: /home/ncvs/src/lib/libc/resolv/res
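For programs whose source is under your control, a different route from the libc patch above is to restrict getaddrinfo() to IPv4 through its hints argument, so that only IPv4 addresses are requested. This is only a sketch of that alternative and does nothing for existing binaries such as ssh or telnet; the function name resolve_ipv4_only is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>

/*
 * Resolve host to its first IPv4 address and write it to out as a dotted
 * quad. Setting ai_family to AF_INET asks getaddrinfo() for IPv4 results
 * only. Returns 0 on success, -1 on failure.
 */
int resolve_ipv4_only(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints, *res;
    const struct sockaddr_in *sin;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    sin = (const struct sockaddr_in *)res->ai_addr;
    rc = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen) != NULL
        ? 0 : -1;
    freeaddrinfo(res);
    return rc;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes, e.g. `char buf[INET_ADDRSTRLEN]; resolve_ipv4_only("www.freebsd.org", buf, sizeof(buf));`.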
Re: CARP and L2 src-MAC
On 2008-Nov-06 10:06:21 +0100, Jon Otterholm <[EMAIL PROTECTED]> wrote:
>Is it possible to tweak CARP to use the virtual MAC in L2 header instead of
>the physical interface MAC? Could this be implemented as a feature
>controlled by a sysctl?

In my testing, Max Laier's carpdep patches do this. See
http://lists.freebsd.org/pipermail/freebsd-net/2008-March/017103.html

--
Peter Jeremy
sysctl net.inet.tcp.syncache.count
% sysctl net.inet.tcp.syncache
net.inet.tcp.syncache.rst_on_sock_fail: 1
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.syncache.hashsize: 1024
net.inet.tcp.syncache.count: -84
net.inet.tcp.syncache.cachelimit: 102400
net.inet.tcp.syncache.bucketlimit: 100

Why is the number of entries in the syncache negative?

% uname -srp
FreeBSD 7.1-PRERELEASE amd64

--
Anton Yuzhaninov