date:20081106

CARP and L2 src-MAC

2008-11-06 Thread Jon Otterholm

Hi.

We want to use CARP in a TPSDA network and have run into some problems.

The ARP response from the master CARP router carries the correct virtual MAC
in its payload but uses the physical interface MAC as the source address in
the L2 header. This is fine for the client, but the switches between the
router and the client never learn the virtual MAC. That works in a "normal"
switched network but fails in a TPSDA network, where none of the L2 devices
learn the virtual MAC. In this case the network is built on Alcatel iSAM
FTTU, and because all CARP messages are broadcast, the devices will not
learn the virtual MAC.
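
Illustrative sketch (not from the original message): the mismatch described above can be checked on a captured ARP reply by comparing the Ethernet source address with the ARP sender hardware address. The sketch assumes the usual CARP virtual MAC layout 00:00:5e:00:01:<vhid> and a plain Ethernet/ARP frame with no VLAN tag; the helper names carp_virtual_mac and l2_src_matches_arp_sha are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN        6
#define ETH_HDR_LEN    14                 /* dst(6) + src(6) + ethertype(2) */
#define ARP_SHA_OFFSET (ETH_HDR_LEN + 8)  /* hrd(2) pro(2) hln(1) pln(1) op(2) */

/* Build the CARP virtual MAC for a VHID (assumed layout 00:00:5e:00:01:VHID). */
void carp_virtual_mac(uint8_t vhid, uint8_t mac[MAC_LEN])
{
    static const uint8_t prefix[5] = { 0x00, 0x00, 0x5e, 0x00, 0x01 };

    memcpy(mac, prefix, sizeof(prefix));
    mac[5] = vhid;
}

/*
 * Compare the Ethernet source MAC with the ARP sender hardware address.
 * Returns 1 if they match, 0 if they differ, -1 if the frame is too short.
 */
int l2_src_matches_arp_sha(const uint8_t *frame, size_t len)
{
    if (frame == NULL || len < ARP_SHA_OFFSET + MAC_LEN)
        return -1;
    return memcmp(frame + MAC_LEN, frame + ARP_SHA_OFFSET, MAC_LEN) == 0;
}
```

A return value of 0 for an ARP reply whose sender hardware address equals carp_virtual_mac(vhid) corresponds to the situation reported here: the payload advertises the virtual MAC while the frame itself is sourced from the physical MAC.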

Is it possible to tweak CARP to use the virtual MAC in L2 header instead of
the physical interface MAC? Could this be implemented as a feature
controlled by a sysctl?

//Jon
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: CARP performance tuning question.

2008-11-06 Thread Peter Jeremy

Whilst I don't doubt that you have a problem, your comments don't
correlate particularly well with the data you have provided and
this makes it difficult to immediately suggest a solution.

On 2008-Nov-05 16:40:32 +0300, pluknet <[EMAIL PROTECTED]> wrote:
>AT work we use device carp(4) under high load:

carp(4) is solely a failover mechanism.  It either generates or receives
somewhat under 1pps per carp interface and the state it maintains is
basically 'master' or 'backup'.  I suspect the 'load' is being caused
by pf(4), possibly in conjunction with pfsync(4).

>The problem is that the server experiences a bad interactivity (from
>70k states and very bad from 120-150k)
>i.e. when a network workload (and interrupts count) begin to increase.
>
>From top(1):
>CPU states:  0.0% user,  0.0% nice,  0.4% system, 76.3% interrupt, 23.3% idle
>  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>   13 root        1 -44 -163     0K     8K WAIT   407:43 57.86% swi1: net

I agree that swi1 is using a significant amount of CPU but top is
still reporting >23% idle so you shouldn't be getting poor interactive
performance.

>ATM pfctl -s info shows such numbers:
>
>State Table                          Total             Rate
>  current entries                   153972
>  searches                      6052078938         4800.8/s
>  inserts                        120373545           95.5/s
>  removals                       120219573           95.4/s

That shows the load on pf(4) but doesn't really reflect what the
system is doing as a whole.

>It works currently under UP, but could be rebuilt to work under SMP
>(Xeon 5130) if that helps.

Unfortunately, I don't know if this will help or not because I'm not
sure what bottleneck you are hitting.

>Can someone give hints to decrease interrupt count and to help with
>the server stability at all?

Well, you haven't actually reported what the interrupt count is or
what instability you are seeing, so this is a bit difficult.

Can you please provide some more information:
- output from 'uname -a'
- output from 'vmstat -i; sleep 10; vmstat -i' under load
- output from 'netstat -i'
- 10-15 seconds of output from 'netstat -i 1' under load
- What is the box doing? Is it a straight filtering router?  Does it
  handle NAT?  Is it running apps itself (eg web, ftp, mail)?
- What speed are the interface(s) running at?
- What instability problems are you seeing?
- Please provide more details on what you mean by 'bad interactivity'.
- How complex is your pf ruleset?  How many rules?  Anything unusual?
- What scheduler are you using?
- What is the full output of 'pfctl -s info'?

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.


Re: CARP performance tuning question.

2008-11-06 Thread pluknet

2008/11/6 Peter Jeremy <[EMAIL PROTECTED]>:
> Whilst I don't doubt that you have a problem, your comments don't
> correlate particularly well with the data you have provided and
> this makes it difficult to immediately suggest a solution.
>
> On 2008-Nov-05 16:40:32 +0300, pluknet <[EMAIL PROTECTED]> wrote:
>>AT work we use device carp(4) under high load:
>
> carp(4) is solely a failover mechanism.  It either generates or receives
> somewhat under 1pps per carp interface and the state it maintains is
> basically 'master' or 'backup'.  I suspect the 'load' is being caused
> by pf(4), possibly in conjunction with pfsync(4).
>
>>The problem is that the server experiences a bad interactivity (from
>>70k states and very bad from 120-150k)
>>i.e. when a network workload (and interrupts count) begin to increase.
>>
>>From top(1):
>>CPU states:  0.0% user,  0.0% nice,  0.4% system, 76.3% interrupt, 23.3% idle
>>  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>>   13 root        1 -44 -163     0K     8K WAIT   407:43 57.86% swi1: net
>
> I agree that swi1 is using a significant amount of CPU but top is
> still reporting >23% idle so you shouldn't be getting poor interactive
> performance.
>
>>ATM pfctl -s info shows such numbers:
>>
>>State Table                          Total             Rate
>>  current entries                   153972
>>  searches                      6052078938         4800.8/s
>>  inserts                        120373545           95.5/s
>>  removals                       120219573           95.4/s
>
> That shows the load on pf(4) but doesn't really reflect what the
> system is doing as a whole.
>
>>It works currently under UP, but could be rebuilt to work under SMP
>>(Xeon 5130) if that helps.
>
> Unfortunately, I don't know if this will help or not because I'm not
> sure what bottleneck you are hitting.
>
>>Can someone give hints to decrease interrupt count and to help with
>>the server stability at all?
>
> Well, you haven't actually reported what the interrupt count or
> what instability you are seeing so this is a bit difficult.
>
> Can you please provide some more information:
> - output from 'uname -a'
> - output from 'vmstat -i; sleep 10; vmstat -i' under load
> - output from 'netstat -i'
> - 10-15 seconds of output from 'netstat -i 1' under load
> - What is the box doing? Is it a straight filtering router?  Does it
>  handle NAT?  Is it running apps itself (eg web, ftp, mail)?
> - What speed are the interface(s) running at?
> - What instability problems are you seeing?
> - Please provide more details on what you mean by 'bad interactivity'.
> - How complex is your pf ruleset?  How many rules?  Anything unusual?
> - What scheduler are you using?
> - What is the full output of 'pfctl -s info'?
>

Thanks for your answer and, please, ignore this premature mail.
It would need a bit more analysis.

-- 
wbr,
pluknet

Re: CARP and L2 src-MAC

2008-11-06 Thread Jon Otterholm


On 2008-11-06 11.47, "Peter Jeremy" <[EMAIL PROTECTED]> wrote:

> On 2008-Nov-06 10:06:21 +0100, Jon Otterholm
> <[EMAIL PROTECTED]> wrote:
>> Is it possible to tweak CARP to use the virtual MAC in L2 header instead of
>> the physical interface MAC? Could this be implemented as a feature
>> controlled by a sysctl?
> 
> In my testing, Max Laier's carpdep patches do this.  See
> http://lists.freebsd.org/pipermail/freebsd-net/2008-March/017103.html

Can we find this in HEAD?

//Jon


BPF question

2008-11-06 Thread Ivo Vachkov

Hello all,

I am using simple write() calls to send packets over a BPF file
descriptor. The BPF file descriptor is in buffered read mode (I assume
this is the default; I do not set it explicitly). From what I see,
my write() calls are somewhat buffered. Since timing is relatively
important for my project, I'd like to ask if there is a way to "flush" the
write buffer. Setting the O_DIRECT flag on the file descriptor doesn't
seem to have any effect.

/ipv

-- 
"UNIX is basically a simple operating system, but you have to be a
genius to understand the simplicity." Dennis Ritchie

Kernel without INET

2008-11-06 Thread Bjoern A. Zeeb


Hi,

some might have wondered about lots of small commits I have done the
last two days.

I had been trying to compile a kernel without any networking a few
weeks ago and that failed; I had needed to add (I think it was) INET,
ether and loop.

So I had been trying to get rid of that requirement the last days.
As a partial victory it seems to be possible again to build a kernel
without any networking now. I'll have to check with my original setup
but I have a stripped down LINT file I tested with.

Obviously the long term goal is to be able to build a kernel without
INET support (again?). As an intermediate step that will mean without
INET and INET6 and once that works and IPX only would compile *cough*,
then work on a (LINT) kernel with nooption INET.

It'll be a long long way to go and this is nothing to finish within a
week or two.  Do not think about doing a quick sweep over the rest
of the tree.  You would wonder what depends on INET these days. I have
more patches mailed out or pending here.

While we had been trying to make it possible to build without INET6
most of the time, someone reviewing my code told me that if I was
complaining that the 'kernel needs INET', I should put some code under
#ifdef INET. I did.

The bottom line is that I now ask you to consider this for all new
code as well.

I am very well aware that some code, as is, would already require a
maze of #ifdefs (I have a sample of that) so we need to be careful
and apply the checks sensibly.
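
Illustrative sketch (not from the original message): the kind of guard being asked for might look like the following. In a real kernel build INET and INET6 come from the kernel configuration; here INET is defined by hand only so that the fragment compiles on its own, and the function name af_compiled_in is hypothetical.

```c
#include <sys/socket.h>

/*
 * Assumption for this stand-alone sketch: IPv4 support is configured in,
 * IPv6 support is not. A kernel build would get these from its config.
 */
#define INET 1
/* #define INET6 1 */

/* Report whether support for an address family was compiled in. */
int af_compiled_in(int af)
{
    switch (af) {
#ifdef INET
    case AF_INET:       /* only present when the kernel has INET */
        return 1;
#endif
#ifdef INET6
    case AF_INET6:      /* only present when the kernel has INET6 */
        return 1;
#endif
    default:
        return 0;
    }
}
```

Keeping each family-specific case inside its own guard lets the same file build with INET only, INET6 only, both, or neither, at the cost of the #ifdef clutter mentioned above.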


Regards,
Bjoern

PS: please obey Reply-To:

--
Bjoern A. Zeeb  Stop bit received. Insert coin for new game.

Re: BPF question

2008-11-06 Thread Robert Watson



On Thu, 6 Nov 2008, Ivo Vachkov wrote:

> I am using simple write() calls to send packets over a BPF file descriptor.
> The BPF file descriptor is in buffered read mode (I assume this is the
> default; I do not set it explicitly). From what I see, my write() calls
> are somewhat buffered. Since timing is relatively important for my project,
> I'd like to ask if there is a way to "flush" the write buffer. Setting the
> O_DIRECT flag on the file descriptor doesn't seem to have any effect.


The write(2) system call does no buffering in userspace (unlike, say, 
fwrite(3)), and when you write to a BPF device it essentially goes straight 
into the network interface output queue, so there should be no need for a 
flush mechanism.  Could you describe the buffering effect you're seeing a bit 
more?


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: BPF question

2008-11-06 Thread Ivo Vachkov

I use the following code:

/* Send Announce Packet */
int zc_freebsd_sendannounce(int fd, unsigned char *mac, int zc_addr) {
    unsigned char *announce = NULL;

    int i = 0;
    unsigned int packet_len = 0;

    struct ether_header *eth_hdr = NULL;
    struct ether_arp *eth_arp = NULL;

    if (mac == NULL || zc_addr == 0 || zc_addr == -1)
        return -1;

    packet_len = sizeof(struct ether_header) +
        (sizeof(struct ether_arp) >= ETHER_PAYLOAD ?
        sizeof(struct ether_arp) : ETHER_PAYLOAD);

    /* Allocate announce packet */
    if ((announce = malloc(packet_len)) == NULL)
        return -1;

    memset(announce, 0, packet_len);

    /* Populate Announce Packet
     *
     * eth_hdr
     * saddr - iface mac
     * daddr - ff:ff:ff:ff:ff:ff
     * type = ETHERTYPE_ARP
     *
     * eth_arp - ARP REQUEST
     * sender hw addr - iface mac
     * sender ip addr - zc_addr
     * target hw addr - 00:00:00:00:00:00
     * target ip addr - zc_addr
     */

    eth_hdr = (struct ether_header *)announce;
    eth_arp = (struct ether_arp *)((char *)eth_hdr +
        sizeof(struct ether_header));

    memcpy(eth_hdr->ether_dhost, eth_bcast_addr, ETHER_ADDR_LEN);
    memcpy(eth_hdr->ether_shost, mac, ETHER_ADDR_LEN);
    eth_hdr->ether_type = htons(ETHERTYPE_ARP);

    eth_arp->arp_hrd = htons(ARPHRD_ETHER);
    eth_arp->arp_pro = htons(ETHERTYPE_IP);
    eth_arp->arp_hln = ETHER_ADDR_LEN;
    eth_arp->arp_pln = IP_ADDR_LEN;
    eth_arp->arp_op = htons(ARPOP_REQUEST);

    memcpy(eth_arp->arp_sha, mac, ETHER_ADDR_LEN);
    memcpy(eth_arp->arp_spa, &zc_addr, IP_ADDR_LEN);
    memcpy(eth_arp->arp_tha, eth_null_addr, ETHER_ADDR_LEN);
    memcpy(eth_arp->arp_tpa, &zc_addr, IP_ADDR_LEN);

    /* Send packet over the wire */
    if ((i = write(fd, announce, packet_len)) < 0) {
        free(announce);
        return -1;
    }

    free(announce);
    return 0;
}

and later in my code I call this function in a loop:

    for (i = 0; i < ANNOUNCE_NUM; i++) {
        printf("ANNOUNCE ...\n"); fflush(stdout);

        /* Get initial time */
        if (clock_gettime(CLOCK_REALTIME, &ts0) < 0) {
            perror("clock_gettime");
            return -1;
        }

        /* Send Announce Packet */
        if (zc_freebsd_sendannounce(bpf_fd, mac, zc_addr) < 0) {
            printf("zc_freebsd_sendannounce(): error\n");
            return -1;
        }

        /* Possibly check for conflicts here */

        /* Get time after select() */
        if (clock_gettime(CLOCK_REALTIME, &ts1) < 0) {
            perror("clock_gettime");
            return -1;
        }

        printf("ts0.sec |%ld|, ts0.nsec |%ld|\n", ts0.tv_sec,
            ts0.tv_nsec); fflush(stdout);
        printf("ts1.sec |%ld|, ts1.nsec |%ld|\n", ts1.tv_sec,
            ts1.tv_nsec); fflush(stdout);

        /* wait ANNOUNCE_INTERVAL or the rest of it */
        ts0.tv_sec = ANNOUNCE_INTERVAL - (ts1.tv_sec - ts0.tv_sec) >= 0 ?
            ANNOUNCE_INTERVAL - (ts1.tv_sec - ts0.tv_sec) : 0;
        ts0.tv_nsec = ((ANNOUNCE_INTERVAL - ts0.tv_sec) * 10) -
            (ts1.tv_nsec - ts0.tv_nsec) >= 0 ?
            ((ANNOUNCE_INTERVAL - ts0.tv_sec) * 10) -
            (ts1.tv_nsec - ts0.tv_nsec) : 0;
        nanosleep(&ts0, NULL);
    } /* ANNOUNCE_NUM for() */

From the two printf()s above I see that the nanosleep() is effective.
However, when I check the packet flow with Wireshark (on the same host
where this code is running) I see the announce packets timed only
milliseconds apart. Could this be an issue with
Wireshark? Right now I have only one computer to work on, but I'll
test the timing from another computer asap.
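
Illustrative sketch (not from the original message): in the loop above, ts0.tv_sec is overwritten before it is used to compute ts0.tv_nsec, so the remaining wait may not be what was intended. Whether that explains the observed packet timing is not established here. One way to compute "the interval minus the time already spent" is a timespec subtraction with explicit borrow, clamped at zero. The helper name remaining_interval is hypothetical, and the interval is assumed to be a whole number of seconds.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Return interval_sec - (end - start) as a timespec, or {0, 0} if the
 * elapsed time already exceeds the interval.
 */
struct timespec remaining_interval(time_t interval_sec,
    struct timespec start, struct timespec end)
{
    struct timespec elapsed, rem;

    /* elapsed = end - start, borrowing one second if needed */
    elapsed.tv_sec = end.tv_sec - start.tv_sec;
    elapsed.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (elapsed.tv_nsec < 0) {
        elapsed.tv_sec -= 1;
        elapsed.tv_nsec += NSEC_PER_SEC;
    }

    /* rem = interval - elapsed, again with borrow */
    rem.tv_sec = interval_sec - elapsed.tv_sec;
    rem.tv_nsec = -elapsed.tv_nsec;
    if (rem.tv_nsec < 0) {
        rem.tv_sec -= 1;
        rem.tv_nsec += NSEC_PER_SEC;
    }

    /* Never ask nanosleep() for a negative duration */
    if (rem.tv_sec < 0) {
        rem.tv_sec = 0;
        rem.tv_nsec = 0;
    }
    return rem;
}
```

In the loop this would be used as `struct timespec rem = remaining_interval(ANNOUNCE_INTERVAL, ts0, ts1); nanosleep(&rem, NULL);`, which leaves ts0 and ts1 untouched for the printf() diagnostics.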

P.S. I'm implementing part of RFC3927 (ZeroConf) as part of a bigger project


On Thu, Nov 6, 2008 at 7:06 PM, Robert Watson <[EMAIL PROTECTED]> wrote:
>
> On Thu, 6 Nov 2008, Ivo Vachkov wrote:
>
>> I am using simple write() calls to send packets over BPF file descriptor.
>> The BPF file descriptor is in buffered read mode (I assume this is the
>> default and I do not set it explicitly). From what I see my wri

Re: BPF question

2008-11-06 Thread Eygene Ryabinkin

Just a side note.

Thu, Nov 06, 2008 at 07:54:13PM +0200, Ivo Vachkov wrote:
> P.S. I'm implementing part of RFC3927 (ZeroConf) as part of a bigger project

Have you looked at /usr/ports/net/howl and maybe /usr/ports/net/avahi?
-- 
Eygene
 ____   _.--.   #
 \`.|\.....-'`   `-._.-'_.-'`   #  Remember that it is hard
 /  ' ` ,   __.--'  #  to read the on-line manual   
 )/' _/ \   `-_,   /#  while single-stepping the kernel.
 `-'" `"\_  ,_.-;_.-\_ ',  fsc/as   #
 _.-'_./   {_.'   ; /   #-- FreeBSD Developers handbook 
{_.-``-' {_/#



Re: BPF question

2008-11-06 Thread Ivo Vachkov

I "evaluated" Avahi, but it is too big for my needs. I will check howl
too. However, Zeroconf seems relatively easy to implement, plus I need
this module to work in cooperation with others. The license
matters too :)

On Thu, Nov 6, 2008 at 8:14 PM, Eygene Ryabinkin <[EMAIL PROTECTED]> wrote:
> Just a side note.
>
> Thu, Nov 06, 2008 at 07:54:13PM +0200, Ivo Vachkov wrote:
>> P.S. I'm implementing part of RFC3927 (ZeroConf) as part of a bigger project
>
> Had you glanced at /usr/ports/net/howl and may be /usr/ports/net/avahi?
> --
> Eygene
>  ____   _.--.   #
>  \`.|\.....-'`   `-._.-'_.-'`   #  Remember that it is hard
>  /  ' ` ,   __.--'  #  to read the on-line manual
>  )/' _/ \   `-_,   /#  while single-stepping the kernel.
>  `-'" `"\_  ,_.-;_.-\_ ',  fsc/as   #
> _.-'_./   {_.'   ; /   #-- FreeBSD Developers handbook
>{_.-``-' {_/#
>



-- 
"UNIX is basically a simple operating system, but you have to be a
genius to understand the simplicity." Dennis Ritchie

Two copies of resolver routines in libc ?

2008-11-06 Thread Luigi Rizzo

i was recently re-looking at the problem mentioned in

http://lists.freebsd.org/pipermail/freebsd-hackers/2003-August/002399.html

(bogus DNS servers at my ISP, telecomitalia, which take forever
to resolve AAAA queries, coupled with the fact that the FreeBSD
resolver has no way to disable AAAA queries when IPv6 is compiled
in, which happens with GENERIC kernels).

While looking for a workaround (attached, read later), i noticed
that libc has two versions of the resolver routines: one is in

/usr/src/lib/libc/resolv/res_query.c

the other one is embedded into
/usr/src/lib/libc/net/getaddrinfo.c

which includes a slightly modified version of res_nquery, res_ndots,
res_nquerydomain (all parts of the routines documented in resolver(3)).

If we are lucky, this is just replicated code.
But i am not even sure they are the same, e.g. in the handling of
options (in resolv.conf or the environment variable RES_OPTIONS).

This is really annoying, because generally you don't know if an
application uses getaddrinfo() or the traditional gethost*() routines
(which in turn use resolver(3)), so it is hard to tell whether
applications have a consistent behaviour.
If someone has time, it would be worthwhile trying to merge
the two versions of the code into one (and i believe we should
make getaddrinfo use the standard stuff in resolv/).


--- As for a fix to my problem: ---

i wanted some trick to disable, in the resolver, the generation of
AAAA queries. resolver(5) mentions some options that can be put
in /etc/resolv.conf or in the RES_OPTIONS environment variable,
to control the behaviour of the resolver.
Some more options are undocumented but implemented, e.g. looking
at /usr/src/lib/libc/resolv/res_init.c you find these additional
options:

retrans:
retry:
inet6
insecure1
insecure2
rotate
no-check-names
edns0
dname

nibble:
nibble2:
v6revmode:

The code below (which is completely trivial) adds an additional option,
"no", which disables the generation of AAAA requests. Just do

setenv RES_OPTIONS no

and you are done. I don't know of other ways to disable these
requests on normal address resolutions, other than build a kernel
without INET6.
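
Illustrative sketch (not from the original message): this is not the resolver-wide switch asked for above, but an individual application that calls getaddrinfo() can restrict a lookup to IPv4 through the hints argument, which normally means only IPv4 results are requested. The helper name resolve_ipv4_only is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/*
 * Resolve host/service restricted to IPv4.
 * Returns the getaddrinfo() result code (0 on success); on success the
 * caller releases *res with freeaddrinfo().
 */
int resolve_ipv4_only(const char *host, const char *service,
    struct addrinfo **res)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, no AF_INET6 results */
    hints.ai_socktype = SOCK_STREAM;
    return getaddrinfo(host, service, &hints, res);
}
```

This only helps for programs that can be changed or that already expose such a setting; it does nothing for callers of the older gethost*() routines.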

As you see below (and this relates to my original complaint),
i had to make the modification in two places :( because things like
ssh and telnet use getaddrinfo(), whereas e.g. firefox uses res_query().

I have no idea what is used by /usr/bin/host or /usr/bin/dig ,
because they do not seem to use any of the library routines.

Any interest to have this into the system ?

cheers
luigi

Index: net/getaddrinfo.c
===
RCS file: /home/ncvs/src/lib/libc/net/getaddrinfo.c,v
retrieving revision 1.69.2.10
diff -u -r1.69.2.10 getaddrinfo.c
--- net/getaddrinfo.c   28 Sep 2007 06:23:03 -  1.69.2.10
+++ net/getaddrinfo.c   6 Nov 2008 20:35:39 -
@@ -85,6 +85,7 @@
 #include 
 
 #include "res_config.h"
+#include "res_private.h"
 
 #ifdef DEBUG
 #include 
@@ -2257,6 +2258,8 @@
 
oflags = res->_flags;
 
+   if (res->options & RES_NO && type == ns_t_)
+   continue;   /* ignore this request */
 again:
hp->rcode = NOERROR;/* default */
 
Index: resolv/res_init.c
===
RCS file: /home/ncvs/src/lib/libc/resolv/res_init.c,v
retrieving revision 1.2.2.3
diff -u -r1.2.2.3 res_init.c
--- resolv/res_init.c   22 Dec 2006 07:33:20 -  1.2.2.3
+++ resolv/res_init.c   6 Nov 2008 20:34:00 -
@@ -636,6 +636,8 @@
   !strncmp(cp, "no-tld-query",
sizeof("no-tld-query") - 1)) {
statp->options |= RES_NOTLDQUERY;
+   } else if (!strncmp(cp, "no", sizeof("no") - 1)) {
+   statp->options |= RES_NO;
} else if (!strncmp(cp, "inet6", sizeof("inet6") - 1)) {
statp->options |= RES_USE_INET6;
} else if (!strncmp(cp, "insecure1", sizeof("insecure1") - 1)) {
Index: resolv/res_private.h
===
RCS file: /home/ncvs/src/lib/libc/resolv/res_private.h,v
retrieving revision 1.1.1.1.2.1
diff -u -r1.1.1.1.2.1 res_private.h
--- resolv/res_private.h17 Jul 2006 10:09:58 -  1.1.1.1.2.1
+++ resolv/res_private.h6 Nov 2008 19:08:29 -
@@ -3,6 +3,9 @@
 #ifndef res_private_h
 #define res_private_h
 
+// additional debug flags to disable  queries
+#define RES_NO 0x0080
+
 struct __res_state_ext {
union res_sockaddr_union nsaddrs[MAXNS];
struct sort_list {
Index: resolv/res_query.c
===
RCS file: /home/ncvs/src/lib/libc/resolv/res

Re: CARP and L2 src-MAC

2008-11-06 Thread Peter Jeremy

On 2008-Nov-06 10:06:21 +0100, Jon Otterholm <[EMAIL PROTECTED]> wrote:
>Is it possible to tweak CARP to use the virtual MAC in L2 header instead of
>the physical interface MAC? Could this be implemented as a feature
>controlled by a sysctl?

In my testing, Max Laier's carpdep patches do this.  See
http://lists.freebsd.org/pipermail/freebsd-net/2008-March/017103.html

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.



sysctl net.inet.tcp.syncache.count

2008-11-06 Thread Anton Yuzhaninov


% sysctl net.inet.tcp.syncache
net.inet.tcp.syncache.rst_on_sock_fail: 1
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.syncache.hashsize: 1024
net.inet.tcp.syncache.count: -84
net.inet.tcp.syncache.cachelimit: 102400
net.inet.tcp.syncache.bucketlimit: 100

Why is the number of entries in the syncache negative?

% uname -srp
FreeBSD 7.1-PRERELEASE amd64

--
 Anton Yuzhaninov
