Re: kern/120304: [netgraph] [patch] netgraph source assumes 32-bit timeval on AMD64

2013-05-13 Thread Mitya
The following reply was made to PR kern/120304; it has been noted by GNATS.

From: Mitya 
To: bug-follo...@freebsd.org, m...@engarde.com
Cc:  
Subject: Re: kern/120304: [netgraph] [patch] netgraph source assumes 32-bit
 timeval on AMD64
Date: Mon, 13 May 2013 11:01:20 +0300

 This is a multi-part message in MIME format.
 --000802080404020703030804
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 7bit
 
 This is a more accuracy patch
 
 --000802080404020703030804
 Content-Type: text/plain; charset=UTF-8;
  name="ng_source.c.diff"
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment;
  filename="ng_source.c.diff"
 
 LS0tIG5nX3NvdXJjZS5jLm9yaWcJMjAxMS0wOS0yMyAwMzo1MTozNy4wMDAwMDAwMDAgKzAz
 MDAKKysrIG5nX3NvdXJjZS5jCTIwMTMtMDUtMDMgMTQ6MTg6MTguMDAwMDAwMDAwICswMzAw
 CkBAIC0xMjYsOCArMTI2LDE2IEBACiAKIC8qIFBhcnNlIHR5cGUgZm9yIHRpbWV2YWwgKi8K
 IHN0YXRpYyBjb25zdCBzdHJ1Y3QgbmdfcGFyc2Vfc3RydWN0X2ZpZWxkIG5nX3NvdXJjZV90
 aW1ldmFsX3R5cGVfZmllbGRzW10gPSB7CisjaWYgZGVmaW5lZChfX0xQNjRfXykgfHwgZGVm
 aW5lZChfX2FybV9fKSB8fCBkZWZpbmVkKF9fbWlwc19fKQorCXsgInR2X3NlYyIsCQkmbmdf
 cGFyc2VfaW50NjRfdHlwZQl9LAorI2Vsc2UKIAl7ICJ0dl9zZWMiLAkJJm5nX3BhcnNlX2lu
 dDMyX3R5cGUJfSwKKyNlbmRpZgorI2lmIGRlZmluZWQoX19MUDY0X18pCisJeyAidHZfdXNl
 YyIsCQkmbmdfcGFyc2VfaW50NjRfdHlwZQl9LAorI2Vsc2UKIAl7ICJ0dl91c2VjIiwJCSZu
 Z19wYXJzZV9pbnQzMl90eXBlCX0sCisjZW5kaWYKIAl7IE5VTEwgfQogfTsKIGNvbnN0IHN0
 cnVjdCBuZ19wYXJzZV90eXBlIG5nX3NvdXJjZV90aW1ldmFsX3R5cGUgPSB7Cg==
 --000802080404020703030804--
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Replace bcopy() to update ether_addr

2012-08-20 Thread Mitya

Hi.
I found some overhead code in /src/sys/net/if_ethersubr.c and 
/src/sys/netgraph/ng_ether.c


It contains strings, like bcopy(src, dst, ETHER_ADDR_LEN);
When src and dst are "struct ether_addr*", and ETHER_ADDR_LEN equal 6.
This code call every time, when we send Ethernet packet.
On example, on my machine in invoked nearly 20K per second.

Why we are use bcopy(), to copy only 6 bytes?
Answer - in some architectures we are can not directly copy unaligned data.

I propose this solution.

In file /usr/src/include/net/ethernet.h add this lines:

static inline void ether_addr_copy(ether_addr* src, ether_addr* dst) {
#if defined(__i386__) || defined(__amd64__)
*dst = *src;
#else
bcopy(src, dst, ETHER_ADDR_LEN);
#endif
}

On platform i386 gcc produce like this code:
leal-30(%ebp), %eax
leal6(%eax), %ecx
leal-44(%ebp), %edx
movl(%edx), %eax
movl%eax, (%ecx)
movzwl  4(%edx), %eax
movw%ax, 4(%ecx)
And clang produce this:
movl-48(%ebp), %ecx
movl%ecx, -26(%ebp)
movw-44(%ebp), %si
movw%si, -22(%ebp)


All this variants are much faster, than bcopy()

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Replace bcopy() to update ether_addr

2012-08-20 Thread Mitya

20.08.2012 22:20, Warner Losh написал:

On Aug 20, 2012, at 1:17 PM, Wojciech Puchar wrote:


or use ++.

i think it is always aligned to 2 bytes and this should produce usable code on 
any CPU? should be 6 instructions on MIPS and PPC IMHO.

We should tag it as __aligned(2) then, no?  If so, then the compiler should 
generate the code you posted.

should is the most important word in Your post. what it actually do - i don't 
know.

If we are requiring this to be __aligned(2), we should tag it as such to 
enforce this.

Even without this tagging, the code to do a structure level copy of 6 bytes is 
going to be tiny...

Warner


I try some times different algorithms. This is one of thees:
*(u_int32_t *)(dst) = *(u_int32_t *)(src);
*(u_int16_t *)&(dst->octet[4]) = *(u_int16_t *)&(src->octet[4]);

But, internal gcc and clang optimisations are much better, than my attempt.
For aligned platforms (2 bytes aligned) best choice is *dst = *src;
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Replace bcopy() to update ether_addr

2012-08-21 Thread Mitya

21.08.2012 14:26, Marius Strobl написал:

On Mon, Aug 20, 2012 at 01:20:29PM -0600, Warner Losh wrote:

On Aug 20, 2012, at 1:17 PM, Wojciech Puchar wrote:


or use ++.

i think it is always aligned to 2 bytes and this should produce usable code on 
any CPU? should be 6 instructions on MIPS and PPC IMHO.

We should tag it as __aligned(2) then, no?  If so, then the compiler should 
generate the code you posted.

should is the most important word in Your post. what it actually do - i don't 
know.

If we are requiring this to be __aligned(2), we should tag it as such to 
enforce this.

Even without this tagging, the code to do a structure level copy of 6 bytes is 
going to be tiny...


While the __aligned(2) approach certainly works, I've actually rather
mixed experiences on x86 with it as the compiler doesn't necessarily
produce the small and efficient one would expect from code it. Such
a change certainly shouldn't be done just on the assumption that the
compiler has all hints required to produce good code from it but the
resulting asm should be verified across all affected architectures.

Marius

Yes. I totally agree. That is why I have limited use of this feature 
only i386 and amd64 architectures.


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Replace bcopy() to update ether_addr

2012-08-21 Thread Mitya

22.08.2012 05:07, Bruce Evans написал:

On Mon, Aug 20, 2012 at 05:46:12PM +0300, Mitya wrote:

Hi.
I found some overhead code in /src/sys/net/if_ethersubr.c and
/src/sys/netgraph/ng_ether.c

It contains strings, like bcopy(src, dst, ETHER_ADDR_LEN);
When src and dst are "struct ether_addr*", and ETHER_ADDR_LEN equal 6.

Only ng_ether.c contains such strings.  if_ethersubr.c doesn't exist.
if_ether.c exists, but was converted to use memcpy() 17.25 years ago.


This code call every time, when we send Ethernet packet.
On example, on my machine in invoked nearly 20K per second.

Why we are use bcopy(), to copy only 6 bytes?
Answer - in some architectures we are can not directly copy unaligned data.

I propose this solution.

In file /usr/src/include/net/ethernet.h add this lines:

static inline void ether_addr_copy(ether_addr* src, ether_addr* dst) {
#if defined(__i386__) || defined(__amd64__)
 *dst = *src;
#else
 bcopy(src, dst, ETHER_ADDR_LEN);
#endif
}

On platform i386 gcc produce like this code:
 leal-30(%ebp), %eax
 leal6(%eax), %ecx
 leal-44(%ebp), %edx
 movl(%edx), %eax
 movl%eax, (%ecx)
 movzwl  4(%edx), %eax
 movw%ax, 4(%ecx)
And clang produce this:
 movl-48(%ebp), %ecx
 movl%ecx, -26(%ebp)
 movw-44(%ebp), %si
 movw%si, -22(%ebp)

All this variants are much faster, than bcopy()

You mean "as much as 5 nanoseconds faster".  Possibly even 10 nanoseconds
faster.


A bit orthogonal to this but also related to the performance
impact of these bcopy() calls, for !__NO_STRICT_ALIGNMENT
architectures these places probably should use memcpy()
instead as bcopy() additionally has to check for overlap
while the former does not. Overlaps unlikely are an issue
in these cases and at least NetBSD apparently has done the
switch to memcpy() 5.5 years ago.

This is essentially just a style bug.  FreeBSD switched to memcpy()
17.25 years ago for selected networking copying.  memcpy() is supposed
to be used if and only if compilers can optimize it.  This means that
the size must be fixed and small, and of course that the copies don't
overlap.  Otherwise, compilers can't do any better than call an extern
copying routine, which is memcpy() in practice.  memcpy() was
interntionally left out in FreeBSD until it was added 17.25 years
ago to satisfy the changes to use memcpy() in networking code (since
with -O0, memcpy() won't be inlined and the extern memcpy() gets
used).  Other uses are style bugs, but there are many now :-(.

bcopy() is still the primary interface, and might be faster than
memcpy(), especially for misaligned cases, but in practice it isn't,
except in the kernel on Pentium1 in 1996-1998 where I only optimized
(large) bcopy()s.  Since it has to support overlapping copies it is
inherently slower.

Although compilers can't do better for large copying than call an
extern routine, some compilers bogusly inline it to something like
"rep movsd" on i386, (or worse, to a very large number of loads and
stores).  gcc used to have a very large threshold for inlining
moderately-sized copies and/or for switching between "rep movsd" and
load/store.  It now understands better than ut doesn't understand
memory, and has more reasonable thresholds.   Or rather the thresholds
are more MD.  gcc still makes a mess with some CFLAGS:

% struct foo
% {
%   short x;
%   struct bar {
%   short yy[31];
%   } y;
% } s, t;
%
% foo()
% {
%   s.y = t.y;
% }

With just -O, gcc-4.2.1 -O on i386 handles this very badly, by
generating 15 misaligned 32-bit load/stores followed by 1 aligned
16-bit load/store.  With -march=, it generates 1
16-bit aligned load-store followed by an aligned "rep movsd" with a
count of 15.  The latter is not too bad.  Similarly for yy[32].  But
for yy[33] it switches to a call to memcpy() even with plain -O.

However, improvements in memory systems and branch prediction since
Pentium1 in 1996-98 mean that optimimizing copying mostly gets
nowhere.  Copying is either from the cache[s], in which case it is
fast (~1 nanosecond per 128 bits for L1), or it is not from the
caches in which case it is slow (~50-200 nanseconds per cache miss).
With 6-byte ethernet addresses, using bcopy() does slow the copying
to considerably below 1 nanosecond per 128 bits (a sloppy estimate
gives 5-10 ns/call), but it's hard for a single call to be made often
enough to make a significant difference.  Someone mentioned 2
calls.  That's the same as 0 calls: 2 * 10 nsec = 200 usec =
0.05% of 1 CPU.

If anyone cared about this, then they would use __builtin_memcpy()
instead of memcpy().  (Note that the point of the 17.25-year old
optimization has been defeated for ~10 years by turning off _all_
builtins, which was initially done mainly to kill builtin putc().
(-ffreestanding should have done that.)  So gcc inlines struct

Re: speed tests (Re: Replace bcopy() to update ether_addr)

2012-08-22 Thread Mitya

22.08.2012 17:36, Luigi Rizzo написал:

On Wed, Aug 22, 2012 at 02:32:21AM +, Bruce Evans wrote:

luigi wrote:


even more orthogonal:

I found that copying 8n + (5, 6 or 7) bytes was much much slower than
copying a multiple of 8 bytes. For n=0, 1,2,4,8 bytes are efficient,
other cases are slow (turned into 2 or 3 different writes).

The netmap code uses a pkt_copy routine that does exactly this
rounding, gaining some 10-20ns per packet for small sizes.

I don't believe 10-20ns for just the extra bytes.  memcpy() ends up
with a movsb to copy the extra bytes.  This can be slow, but I don't
believe 10-20ns (except on machines running at i486 speeds of course).

I am adding at the end a test program so people can try things on their hw.

Build it with

cc -O2 -Werror -Wall -Wextra  -lpthread -lrt testlock.c -o testlock




# uname -a
FreeBSD m18.cabletv.dp.ua 9.0-STABLE FreeBSD 9.0-STABLE #1: Tue Apr 24 
13:23:05 EEST 2012 r...@m18.cabletv.dp.ua:/usr/src/sys/i386/compile/m18 i386


cc -O2 -Werror -Wall -Wextra  -lpthread -lrt testlock.c -o testlock

testlock.c: In function 'test_rdtsc':
testlock.c:151: error: can't find a register in class 'AD_REGS' while 
reloading 'asm'

testlock.c:151: error: 'asm' operand has impossible constraints

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


IFF_RENAMING interface flag

2011-07-05 Thread Mitya

Where I can see IFF_RENAMING interface flag ?

/usr/include/net/if.h

[skipped...]
#define IFF_MONITOR 0x4 /* (n) user-requested monitor 
mode */

#define IFF_STATICARP   0x8 /* (n) static ARP */
#define IFF_DYING   0x20/* (n) interface is winding down */
#define IFF_RENAMING0x40/* (n) interface is being renamed */


/usr/src/sbin/ifconfig/ifconfig.c
[skipped...]
#define IFFBITS \
"\020\1UP\2BROADCAST\3DEBUG\4LOOPBACK\5POINTOPOINT\6SMART\7RUNNING" \
"\10NOARP\11PROMISC\12ALLMULTI\13OACTIVE\14SIMPLEX\15LINK0\16LINK1\17LINK2" 
\

"\20MULTICAST\22PPROMISC\23MONITOR\24STATICARP"

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"