Re: zfs + uma

2010-09-19 Thread Andriy Gapon
on 19/09/2010 01:16 Jeff Roberson said the following:
> Not specifically in reaction to Robert's comment but I would like to add my
> thoughts to this notion of resource balancing in buckets.  I really prefer not
> to do any specific per-zone tuning except in extreme cases. This is because
> quite often the decisions we make don't apply to some class of machines or
> workloads.  I would instead prefer to keep the algorithm adaptable.

Agree.

> I like the idea of weighting the bucket decisions by the size of the item.
> Obviously this has some flaws with compound objects but in the general case it
> is good.  We should consider increasing the cost of bucket expansion based on
> the size of the item.  Right now buckets are expanded fairly readily.
> 
> We could also consider decreasing the default bucket size for a zone based on
> vm pressure and use.  Right now there is no downward pressure on bucket size,
> only upward based on trips to the slab layer.
> 
> Additionally we could make a last ditch flush mechanism that runs on each cpu
> in turn and flushes some or all of the buckets in per-cpu caches. Presently
> that is not done due to synchronization issues.  It can't be done from a
> central place. It could be done with a callout mechanism or a for loop that
> binds to each core in succession.

I like all three of the above approaches.
The last one is a bit hard to implement; the first two seem easier.

> I believe the combination of these approaches would significantly solve the
> problem and should be relatively little new code.  It should also preserve the
> adaptable nature of the system without penalizing resource heavy systems.  I
> would be happy to review patches from anyone who wishes to undertake it.

FWIW, the approach of simply limiting maximum bucket size based on item size
seems to work rather well too, as my testing with zfs+uma shows.
I will also try to add code to completely bypass the per-cpu cache for "really
huge" items.

-- 
Andriy Gapon


Re: [PATCH] Add -lssp_nonshared to GCC's LIB_SPEC unconditionally

2010-09-19 Thread Jeremie Le Hen
Hi Alexander,

On Thu, Aug 05, 2010 at 09:14:46PM +0200, Jeremie Le Hen wrote:
> On Tue, Aug 03, 2010 at 11:46:51AM -0400, Alexander Kabaev wrote:
> >
> > I have no objection, but think we should cave in and investigate the
> > possibility of using linker script wrapping libc.so in FreeBSD-9.0:
> > 
> > Below is Linux' counterpart:
> > 
> > /* GNU ld script
> >Use the shared library, but some functions are only in
> >the static library, so try that secondarily.  */
> > OUTPUT_FORMAT(elf32-i386)
> > GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a  AS_NEEDED
> > ( /lib/ld-linux.so.2 ) )
> 
> Ok.  For now can you commit the proposed modification.  I'll try to make
> a patch with your proposal.

The attached patch does two things: It modifies bsd.lib.mk to support ld
scripts for shared libraries and adds such a script to replace the
/usr/lib/libc.so symlink to /lib/libc.so.X.

Basically, SHLIB_LDSCRIPT is defined in lib/libc/Makefile and points to
the file containing the script itself:
GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a )

During make install, @@SHLIB@@ will be replaced by the real path of the
shared library.

Thanks.
Regards,
-- 
Jeremie Le Hen

Humans are born free and equal.  But some are more equal than others.
Coluche
diff -urNp src.orig/Makefile.inc1 src/Makefile.inc1
--- src.orig/Makefile.inc1	2010-07-15 13:21:25.0 +
+++ src/Makefile.inc1	2010-08-19 17:27:30.0 +
@@ -256,6 +256,7 @@ WMAKEENV=	${CROSSENV} \
 		_SHLIBDIRPREFIX=${WORLDTMP} \
 		VERSION="${VERSION}" \
 		INSTALL="sh ${.CURDIR}/tools/install.sh" \
+		NO_LDSCRIPT_INSTALL=1 \
 		PATH=${TMPPATH}
 .if ${MK_CDDL} == "no"
 WMAKEENV+=	NO_CTF=1
diff -urNp src.orig/lib/libc/Makefile src/lib/libc/Makefile
--- src.orig/lib/libc/Makefile	2010-08-01 12:35:01.0 +
+++ src/lib/libc/Makefile	2010-08-11 17:36:15.0 +
@@ -20,6 +20,7 @@ CFLAGS+=-DNLS
 CLEANFILES+=tags
 INSTALL_PIC_ARCHIVE=
 PRECIOUSLIB=
+SHLIB_LDSCRIPT=libc.ldscript
 
 #
 # Only link with static libgcc.a (no libgcc_eh.a).
diff -urNp src.orig/lib/libc/libc.ldscript src/lib/libc/libc.ldscript
--- src.orig/lib/libc/libc.ldscript	1970-01-01 00:00:00.0 +
+++ src/lib/libc/libc.ldscript	2010-08-09 11:12:13.0 +
@@ -0,0 +1 @@
+GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a )
diff -urNp src.orig/share/mk/bsd.lib.mk src/share/mk/bsd.lib.mk
--- src.orig/share/mk/bsd.lib.mk	2010-07-30 15:25:57.0 +
+++ src/share/mk/bsd.lib.mk	2010-08-22 13:00:15.0 +
@@ -216,6 +216,14 @@ ${SHLIB_NAME}: ${SOBJS}
 	@[ -z "${CTFMERGE}" -o -n "${NO_CTF}" ] || \
 		(${ECHO} ${CTFMERGE} ${CTFFLAGS} -o ${.TARGET} ${SOBJS} && \
 		${CTFMERGE} ${CTFFLAGS} -o ${.TARGET} ${SOBJS})
+
+.if defined(SHLIB_LINK) && defined(SHLIB_LDSCRIPT) && !empty(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT})
+_LIBS+= lib${LIB}.ld
+
+lib${LIB}.ld: ${.CURDIR}/${SHLIB_LDSCRIPT}
+	sed 's,@@SHLIB@@,${SHLIBDIR}/${SHLIB_NAME},g' \
+	${.CURDIR}/${SHLIB_LDSCRIPT} > lib${LIB}.ld
+.endif
 .endif
 
 .if defined(INSTALL_PIC_ARCHIVE) && defined(LIB) && !empty(LIB) && ${MK_TOOLCHAIN} != "no"
@@ -293,9 +301,17 @@ _libinstall:
 	${_INSTALLFLAGS} ${_SHLINSTALLFLAGS} \
 	${SHLIB_NAME} ${DESTDIR}${SHLIBDIR}
 .if defined(SHLIB_LINK)
+.if defined(SHLIB_LDSCRIPT) && !empty(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT}) && empty(NO_LDSCRIPT_INSTALL)
+	@echo "DEBUG: install lib${LIB}.ld to ${DESTDIR}${LIBDIR}/${SHLIB_LINK}"
+	${INSTALL} -S -C -o ${LIBOWN} -g ${LIBGRP} -m ${LIBMODE} \
+	${_INSTALLFLAGS} lib${LIB}.ld ${DESTDIR}${LIBDIR}
+	ln -fs lib${LIB}.ld ${DESTDIR}${LIBDIR}/${SHLIB_LINK}
+.else
 .if ${SHLIBDIR} == ${LIBDIR}
+	@echo "DEBUG: symlink (1) ${DESTDIR}${LIBDIR}/${SHLIB_LINK} to ${SHLIB_NAME}"
 	ln -fs ${SHLIB_NAME} ${DESTDIR}${LIBDIR}/${SHLIB_LINK}
 .else
+	@echo "DEBUG: symlink (2) ${DESTDIR}${LIBDIR}/${SHLIB_LINK} to ${_SHLIBDIRPREFIX}${SHLIBDIR}/${SHLIB_NAME}"
 	ln -fs ${_SHLIBDIRPREFIX}${SHLIBDIR}/${SHLIB_NAME} \
 	${DESTDIR}${LIBDIR}/${SHLIB_LINK}
 .if exists(${DESTDIR}${LIBDIR}/${SHLIB_NAME})
@@ -303,8 +319,9 @@ _libinstall:
 	rm -f ${DESTDIR}${LIBDIR}/${SHLIB_NAME}
 .endif
 .endif
-.endif
-.endif
+.endif # SHLIB_LDSCRIPT
+.endif # SHLIB_LINK
+.endif # SHLIB_NAME
 .if defined(INSTALL_PIC_ARCHIVE) && defined(LIB) && !empty(LIB) && ${MK_TOOLCHAIN} != "no"
 	${INSTALL} -o ${LIBOWN} -g ${LIBGRP} -m ${LIBMODE} \
 	${_INSTALLFLAGS} lib${LIB}_pic.a ${DESTDIR}${LIBDIR}
@@ -372,6 +389,9 @@ clean:
 .endif
 .if defined(SHLIB_NAME)
 .if defined(SHLIB_LINK)
+.if defined(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT})
+	rm -f lib${LIB}.ld
+.endif
 	rm -f ${SHLIB_LINK}
 .endif
 .if defined(LIB) && !empty(LIB)

Re: zfs + uma

2010-09-19 Thread Andriy Gapon
on 19/09/2010 11:27 Jeff Roberson said the following:
> I don't like this because even with very large buffers you can still have high
> enough turnover to require per-cpu caching.  Kip specifically added UMA support
> to address this issue in zfs.  If you have allocations which don't require
> per-cpu caching and are very large why even use UMA?

Good point.
Right now I am running with a limit of 4 items/bucket for items larger than 32KB.

-- 
Andriy Gapon


Re: zfs + uma

2010-09-19 Thread Jeff Roberson

On Sun, 19 Sep 2010, Andriy Gapon wrote:


> on 19/09/2010 01:16 Jeff Roberson said the following:

>> Not specifically in reaction to Robert's comment but I would like to add my
>> thoughts to this notion of resource balancing in buckets.  I really prefer not
>> to do any specific per-zone tuning except in extreme cases. This is because
>> quite often the decisions we make don't apply to some class of machines or
>> workloads.  I would instead prefer to keep the algorithm adaptable.


> Agree.


>> I like the idea of weighting the bucket decisions by the size of the item.
>> Obviously this has some flaws with compound objects but in the general case it
>> is good.  We should consider increasing the cost of bucket expansion based on
>> the size of the item.  Right now buckets are expanded fairly readily.

>> We could also consider decreasing the default bucket size for a zone based on
>> vm pressure and use.  Right now there is no downward pressure on bucket size,
>> only upward based on trips to the slab layer.

>> Additionally we could make a last ditch flush mechanism that runs on each cpu
>> in turn and flushes some or all of the buckets in per-cpu caches. Presently
>> that is not done due to synchronization issues.  It can't be done from a
>> central place. It could be done with a callout mechanism or a for loop that
>> binds to each core in succession.


> I like all three of the above approaches.
> The last one is a bit hard to implement; the first two seem easier.


All the last one requires is a loop calling sched_bind() on each available cpu.
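
For illustration, a rough kernel-context sketch of that loop, assuming the
draining thread may be migrated with sched_bind(); cache_drain_cpu() is a
hypothetical stand-in for whatever per-CPU flush UMA would grow (it is not an
existing function), and locking around the caches themselves is elided:

#include <sys/param.h>
#include <sys/proc.h>
#include <sys/sched.h>
#include <sys/smp.h>
#include <vm/uma.h>

static void
uma_flush_percpu_caches(uma_zone_t zone)
{
	struct thread *td = curthread;
	u_int cpu;

	for (cpu = 0; cpu <= mp_maxid; cpu++) {
		if (CPU_ABSENT(cpu))
			continue;
		thread_lock(td);
		sched_bind(td, cpu);		/* migrate to the target CPU */
		thread_unlock(td);
		cache_drain_cpu(zone, cpu);	/* hypothetical per-CPU bucket flush */
	}
	thread_lock(td);
	sched_unbind(td);			/* drop the binding when done */
	thread_unlock(td);
}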





>> I believe the combination of these approaches would significantly solve the
>> problem and should be relatively little new code.  It should also preserve the
>> adaptable nature of the system without penalizing resource heavy systems.  I
>> would be happy to review patches from anyone who wishes to undertake it.


> FWIW, the approach of simply limiting maximum bucket size based on item size
> seems to work rather well too, as my testing with zfs+uma shows.
> I will also try to add code to completely bypass the per-cpu cache for "really
> huge" items.


I don't like this because even with very large buffers you can still have 
high enough turnover to require per-cpu caching.  Kip specifically added 
UMA support to address this issue in zfs.  If you have allocations which 
don't require per-cpu caching and are very large why even use UMA?


One thing that would be nice if we are frequently using page size 
allocations is to eliminate the requirement for a slab header for each 
page.  It seems unnecessary for any zone where the items per slab is 1 but 
it would require careful modification to support properly.


Thanks,
Jeff



> --
> Andriy Gapon




Re: zfs + uma

2010-09-19 Thread Robert N. M. Watson

On 19 Sep 2010, at 09:21, Andriy Gapon wrote:

>> I believe the combination of these approaches would significantly solve the
>> problem and should be relatively little new code.  It should also preserve the
>> adaptable nature of the system without penalizing resource heavy systems.  I
>> would be happy to review patches from anyone who wishes to undertake it.
> 
> FWIW, the approach of simply limiting maximum bucket size based on item size
> seems to work rather well too, as my testing with zfs+uma shows.
> I will also try to add code to completely bypass the per-cpu cache for "really
> huge" items.

This is basically what malloc(9) does already: for small items, it allocates 
from a series of fixed-size buckets (which could probably use tuning), but 
maintains its own stats with respect to the types it maps into the buckets. 
This is why there's double-counting between vmstat -z and vmstat -m, since the 
former shows the buckets used to allocate the latter.

For large items, malloc(9) goes through UMA, but it's basically a pass-through 
to VM, which directly provides pages. This means that for small malloc types, 
you get per-CPU caches, and for large malloc types, you don't.

malloc(9) doesn't require fixed-size allocations, but also can't provide the 
ctor/dtor partial tear-down caching, nor different effective working sets of 
memory for different types.

UMA should really only be used directly for memory types where tight packing, 
per-CPU caching, and possibly partial tear-down, have benefits. mbufs are a 
great example, because we allocate tons and tons of them continuously in 
operation. More stable types allocated in smaller quantities make very little 
sense, since we waste lots of memory overhead in allocating buckets that won't 
be used, etc.
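
As a userland illustration of that split (not the actual kern_malloc.c code),
requests up to an assumed cutoff would map onto fixed size classes backed by
cached zones, while larger requests would go straight to the page allocator
with no per-CPU caching; the cutoff and the class list below are made-up
example values:

#include <stddef.h>
#include <stdio.h>

#define SMALL_MAX	65536		/* assumed cutoff between the two paths */

static const size_t size_classes[] = {
	16, 32, 64, 128, 256, 512, 1024, 2048,
	4096, 8192, 16384, 32768, 65536,
};

/* Map a small request onto the smallest size class that fits it. */
static size_t
pick_size_class(size_t size)
{
	size_t i;

	for (i = 0; i < sizeof(size_classes) / sizeof(size_classes[0]); i++)
		if (size <= size_classes[i])
			return (size_classes[i]);
	return (0);			/* not reached for size <= SMALL_MAX */
}

int
main(void)
{
	size_t requests[] = { 40, 3000, 60000, 200000 };
	size_t i, sz;

	for (i = 0; i < sizeof(requests) / sizeof(requests[0]); i++) {
		sz = requests[i];
		if (sz <= SMALL_MAX)
			printf("%7zu bytes -> cached zone of %zu-byte items\n",
			    sz, pick_size_class(sz));
		else
			printf("%7zu bytes -> direct page allocation, no per-CPU cache\n",
			    sz);
	}
	return (0);
}

The rounding up to a size class is also where the double-counting mentioned
above comes from: vmstat -m accounts a request against its malloc type, while
vmstat -z accounts the same memory against the size-class zone.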

Robert


Re: zfs + uma

2010-09-19 Thread Robert N. M. Watson

On 19 Sep 2010, at 09:42, Andriy Gapon wrote:

> on 19/09/2010 11:27 Jeff Roberson said the following:
>> I don't like this because even with very large buffers you can still have high
>> enough turnover to require per-cpu caching.  Kip specifically added UMA support
>> to address this issue in zfs.  If you have allocations which don't require
>> per-cpu caching and are very large why even use UMA?
> 
> Good point.
> Right now I am running with 4 items/bucket limit for items larger than 32KB.

If allocation turnover is low, I'd think that malloc(9) would do better here.
How many allocs/frees per second are there in peak operation?

Robert


Re: ar(1) format_decimal failure is fatal?

2010-09-19 Thread Joerg Sonnenberger
On Sat, Sep 18, 2010 at 12:01:04AM -0400, Benjamin Kaduk wrote:
> GNU binutils has recently (well, March 2009) added a -D
> ("deterministic") argument to ar(1) which sets the timestamp, uid,
> and gid to zero, and the mode to 644.
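
For illustration, a small sketch of what such a deterministic member header
could look like when filled into struct ar_hdr from <ar.h>, with the date,
uid, and gid pinned to zero and the mode to 0644; the helper names are
invented for the example and this is not GNU ar's actual code:

#include <ar.h>
#include <stdio.h>
#include <string.h>

/* Format a value into a fixed-width, space-padded ar header field. */
static void
put_field(char *dst, size_t dstlen, const char *fmt, long val)
{
	char tmp[32];
	size_t n;

	n = (size_t)snprintf(tmp, sizeof(tmp), fmt, val);
	memcpy(dst, tmp, n < dstlen ? n : dstlen);
}

static void
fill_deterministic_header(struct ar_hdr *hdr, const char *name, long size)
{
	size_t namelen = strlen(name);

	memset(hdr, ' ', sizeof(*hdr));		/* header fields are space-padded */
	memcpy(hdr->ar_name, name,
	    namelen < sizeof(hdr->ar_name) ? namelen : sizeof(hdr->ar_name));
	put_field(hdr->ar_date, sizeof(hdr->ar_date), "%ld", 0L);	/* timestamp */
	put_field(hdr->ar_uid,  sizeof(hdr->ar_uid),  "%ld", 0L);
	put_field(hdr->ar_gid,  sizeof(hdr->ar_gid),  "%ld", 0L);
	put_field(hdr->ar_mode, sizeof(hdr->ar_mode), "%lo", 0644L);
	put_field(hdr->ar_size, sizeof(hdr->ar_size), "%ld", size);
	memcpy(hdr->ar_fmag, ARFMAG, sizeof(hdr->ar_fmag));
}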

That argument was added based on discussions on NetBSD about doing
bit-identical release builds. It was made optional for the sake of possible
users of the data, though we are not really aware of anyone actually using it.
The ar(1) support in make basically goes back to a time when replacing
the content was a major speed-up for incremental builds, and it is pretty
much useless nowadays. Similarly for the timestamp: it doesn't tell that much
about the content either.

I don't think the backend should do silent truncation; that would be
very bad. It might be necessary to have a flag for backends to allow it,
though.

Joerg


Re: [PATCH] Add -lssp_nonshared to GCC's LIB_SPEC unconditionally

2010-09-19 Thread Kostik Belousov
On Sun, Sep 19, 2010 at 10:14:06AM +0200, Jeremie Le Hen wrote:
> Hi Alexander,
> 
> On Thu, Aug 05, 2010 at 09:14:46PM +0200, Jeremie Le Hen wrote:
> > On Tue, Aug 03, 2010 at 11:46:51AM -0400, Alexander Kabaev wrote:
> > >
> > > I have no objection, but think we should cave in and investigate the
> > > possibility of using linker script wrapping libc.so in FreeBSD-9.0:
> > > 
> > > Below is Linux' counterpart:
> > > 
> > > /* GNU ld script
> > >Use the shared library, but some functions are only in
> > >the static library, so try that secondarily.  */
> > > OUTPUT_FORMAT(elf32-i386)
> > > GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a  AS_NEEDED
> > > ( /lib/ld-linux.so.2 ) )
> > 
> > Ok.  For now can you commit the proposed modification.  I'll try to make
> > a patch with your proposal.
> 
> The attached patch does two things: It modifies bsd.lib.mk to support ld
> scripts for shared libraries and adds such a script to replace the
> /usr/lib/libc.so symlink to /lib/libc.so.X.
> 
> Basically, SHLIB_LDSCRIPT is defined in lib/libc/Makefile and points to
> the file containing the script itself:
> GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a )
> 
> During make install, @@SHLIB@@ will be replaced by the real path of the
> shared library.

You did not include the $FreeBSD$ tag in the libc.so script. I think it would
be useful to have.

Could you please comment on why the script is not installed during the
world build stage? My question is, would buildworld use the script
for linkage?

