Re: zfs + uma
on 19/09/2010 01:16 Jeff Roberson said the following:
> Not specifically in reaction to Robert's comment but I would like to add my
> thoughts to this notion of resource balancing in buckets.  I really prefer
> not to do any specific per-zone tuning except in extreme cases.  This is
> because quite often the decisions we make don't apply to some class of
> machines or workloads.  I would instead prefer to keep the algorithm
> adaptable.

Agree.

> I like the idea of weighting the bucket decisions by the size of the item.
> Obviously this has some flaws with compound objects but in the general case
> it is good.  We should consider increasing the cost of bucket expansion
> based on the size of the item.  Right now buckets are expanded fairly
> readily.
>
> We could also consider decreasing the default bucket size for a zone based
> on vm pressure and use.  Right now there is no downward pressure on bucket
> size, only upward based on trips to the slab layer.
>
> Additionally we could make a last ditch flush mechanism that runs on each
> cpu in turn and flushes some or all of the buckets in per-cpu caches.
> Presently that is not done due to synchronization issues.  It can't be done
> from a central place.  It could be done with a callout mechanism or a for
> loop that binds to each core in succession.

I like all three of the above approaches.  The last one is a bit hard to
implement, the first two seem easier.

> I believe the combination of these approaches would significantly solve the
> problem and should be relatively little new code.  It should also preserve
> the adaptable nature of the system without penalizing resource heavy
> systems.  I would be happy to review patches from anyone who wishes to
> undertake it.

FWIW, the approach of simply limiting maximum bucket size based on item size
seems to work rather well too, as my testing with zfs+uma shows.
I will also try to add code to completely bypass the per-cpu cache for
"really huge" items.

-- 
Andriy Gapon
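A minimal sketch of the second of Jeff's ideas quoted above — charging larger
items a higher cost for bucket expansion.  uz_count and uz_size are real
fields of struct uma_zone in this era's uma_core.c, and BUCKET_MAX is its
existing cap; the uz_expand_misses counter and the per-1KB cost scaling are
purely hypothetical illustration, not the actual implementation:

#include <sys/param.h>
#include <vm/uma.h>
#include <vm/uma_int.h>		/* struct uma_zone internals */

static void
zone_consider_bucket_expand(uma_zone_t zone)
{
	int cost;

	/*
	 * Larger items must take more trips to the slab layer before
	 * the bucket size grows: one "miss credit" per 1KB of item size.
	 */
	cost = MAX(1, (int)(zone->uz_size / 1024));
	if (++zone->uz_expand_misses < cost)	/* hypothetical field */
		return;
	zone->uz_expand_misses = 0;
	if (zone->uz_count < BUCKET_MAX)	/* the existing upward step */
		zone->uz_count++;
}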
Re: [PATCH] Add -lssp_nonshared to GCC's LIB_SPEC unconditionally
Hi Alexander,

On Thu, Aug 05, 2010 at 09:14:46PM +0200, Jeremie Le Hen wrote:
> On Tue, Aug 03, 2010 at 11:46:51AM -0400, Alexander Kabaev wrote:
> >
> > I have no objection, but think we should cave in and investigate the
> > possibility of using a linker script wrapping libc.so in FreeBSD-9.0:
> >
> > Below is Linux' counterpart:
> >
> > /* GNU ld script
> >    Use the shared library, but some functions are only in
> >    the static library, so try that secondarily. */
> > OUTPUT_FORMAT(elf32-i386)
> > GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED
> > ( /lib/ld-linux.so.2 ) )
>
> Ok.  For now, can you commit the proposed modification?  I'll try to
> make a patch with your proposal.

The attached patch does two things: it modifies bsd.lib.mk to support ld
scripts for shared libraries, and it adds such a script to replace the
/usr/lib/libc.so symlink pointing to /lib/libc.so.X.

Basically, SHLIB_LDSCRIPT is defined in lib/libc/Makefile and points to
the file containing the script itself:

    GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a )

During make install, @@SHLIB@@ will be replaced by the real path of the
shared library.

Thanks.
Regards,
-- 
Jeremie Le Hen

Humans are born free and equal.  But some are more equal than others.
						Coluche

diff -urNp src.orig/Makefile.inc1 src/Makefile.inc1
--- src.orig/Makefile.inc1	2010-07-15 13:21:25.0 +
+++ src/Makefile.inc1	2010-08-19 17:27:30.0 +
@@ -256,6 +256,7 @@ WMAKEENV=	${CROSSENV} \
 		_SHLIBDIRPREFIX=${WORLDTMP} \
 		VERSION="${VERSION}" \
 		INSTALL="sh ${.CURDIR}/tools/install.sh" \
+		NO_LDSCRIPT_INSTALL=1 \
 		PATH=${TMPPATH}
 .if ${MK_CDDL} == "no"
 WMAKEENV+=	NO_CTF=1
diff -urNp src.orig/lib/libc/Makefile src/lib/libc/Makefile
--- src.orig/lib/libc/Makefile	2010-08-01 12:35:01.0 +
+++ src/lib/libc/Makefile	2010-08-11 17:36:15.0 +
@@ -20,6 +20,7 @@ CFLAGS+=-DNLS
 CLEANFILES+=tags
 INSTALL_PIC_ARCHIVE=
 PRECIOUSLIB=
+SHLIB_LDSCRIPT=libc.ldscript
 
 #
 # Only link with static libgcc.a (no libgcc_eh.a).
diff -urNp src.orig/lib/libc/libc.ldscript src/lib/libc/libc.ldscript
--- src.orig/lib/libc/libc.ldscript	1970-01-01 00:00:00.0 +
+++ src/lib/libc/libc.ldscript	2010-08-09 11:12:13.0 +
@@ -0,0 +1 @@
+GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a )
diff -urNp src.orig/share/mk/bsd.lib.mk src/share/mk/bsd.lib.mk
--- src.orig/share/mk/bsd.lib.mk	2010-07-30 15:25:57.0 +
+++ src/share/mk/bsd.lib.mk	2010-08-22 13:00:15.0 +
@@ -216,6 +216,14 @@ ${SHLIB_NAME}: ${SOBJS}
 	@[ -z "${CTFMERGE}" -o -n "${NO_CTF}" ] || \
 	    (${ECHO} ${CTFMERGE} ${CTFFLAGS} -o ${.TARGET} ${SOBJS} && \
 	    ${CTFMERGE} ${CTFFLAGS} -o ${.TARGET} ${SOBJS})
+
+.if defined(SHLIB_LINK) && defined(SHLIB_LDSCRIPT) && !empty(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT})
+_LIBS+=	lib${LIB}.ld
+
+lib${LIB}.ld: ${.CURDIR}/${SHLIB_LDSCRIPT}
+	sed 's,@@SHLIB@@,${SHLIBDIR}/${SHLIB_NAME},g' \
+	    ${.CURDIR}/${SHLIB_LDSCRIPT} > lib${LIB}.ld
+.endif
 .endif
 
 .if defined(INSTALL_PIC_ARCHIVE) && defined(LIB) && !empty(LIB) && ${MK_TOOLCHAIN} != "no"
@@ -293,9 +301,17 @@ _libinstall:
 	    ${_INSTALLFLAGS} ${_SHLINSTALLFLAGS} \
 	    ${SHLIB_NAME} ${DESTDIR}${SHLIBDIR}
 .if defined(SHLIB_LINK)
+.if defined(SHLIB_LDSCRIPT) && !empty(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT}) && empty(NO_LDSCRIPT_INSTALL)
+	@echo "DEBUG: install lib${LIB}.ld to ${DESTDIR}${LIBDIR}/${SHLIB_LINK}"
+	${INSTALL} -S -C -o ${LIBOWN} -g ${LIBGRP} -m ${LIBMODE} \
+	    ${_INSTALLFLAGS} lib${LIB}.ld ${DESTDIR}${LIBDIR}
+	ln -fs lib${LIB}.ld ${DESTDIR}${LIBDIR}/${SHLIB_LINK}
+.else
 .if ${SHLIBDIR} == ${LIBDIR}
+	@echo "DEBUG: symlink (1) ${DESTDIR}${LIBDIR}/${SHLIB_LINK} to ${SHLIB_NAME}"
 	ln -fs ${SHLIB_NAME} ${DESTDIR}${LIBDIR}/${SHLIB_LINK}
 .else
+	@echo "DEBUG: symlink (2) ${DESTDIR}${LIBDIR}/${SHLIB_LINK} to ${_SHLIBDIRPREFIX}${SHLIBDIR}/${SHLIB_NAME}"
 	ln -fs ${_SHLIBDIRPREFIX}${SHLIBDIR}/${SHLIB_NAME} \
 	    ${DESTDIR}${LIBDIR}/${SHLIB_LINK}
 .if exists(${DESTDIR}${LIBDIR}/${SHLIB_NAME})
@@ -303,8 +319,9 @@ _libinstall:
 	    rm -f ${DESTDIR}${LIBDIR}/${SHLIB_NAME}
 .endif
 .endif
-.endif
-.endif
+.endif # SHLIB_LDSCRIPT
+.endif # SHLIB_LINK
+.endif # SHLIB_NAME
 .if defined(INSTALL_PIC_ARCHIVE) && defined(LIB) && !empty(LIB) && ${MK_TOOLCHAIN} != "no"
 	${INSTALL} -o ${LIBOWN} -g ${LIBGRP} -m ${LIBMODE} \
 	    ${_INSTALLFLAGS} lib${LIB}_pic.a ${DESTDIR}${LIBDIR}
@@ -372,6 +389,9 @@ clean:
 .endif
 .if defined(SHLIB_NAME)
 .if defined(SHLIB_LINK)
+.if defined(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT})
+	rm -f lib${LIB}.ld
+.endif
 	rm -f ${SHLIB_LINK}
 .endif
 .if defined(LIB) && !empty(LIB)
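To make the sed substitution above concrete: assuming libc's shared object
is installed as, say, /lib/libc.so.7 (the version number here is only
illustrative), the generated libc.ld — to which /usr/lib/libc.so becomes a
symlink — would contain just:

    GROUP ( /lib/libc.so.7 /usr/lib/libssp_nonshared.a )

so any static link against -lc pulls in libssp_nonshared.a automatically
while still linking the shared libc, which is the point of the thread.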
Re: zfs + uma
on 19/09/2010 11:27 Jeff Roberson said the following:
> I don't like this because even with very large buffers you can still have
> high enough turnover to require per-cpu caching.  Kip specifically added
> UMA support to address this issue in zfs.  If you have allocations which
> don't require per-cpu caching and are very large why even use UMA?

Good point.
Right now I am running with a 4 items/bucket limit for items larger than
32KB.

-- 
Andriy Gapon
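As a sketch, the limit Andriy mentions could be expressed as a small helper
consulted wherever a zone's bucket size is chosen.  The 32KB/4-item cutoff
is the one quoted above, BUCKET_MAX is UMA's existing maximum, and the
actual integration point is left out:

static int
zone_bucket_cap(size_t item_size)
{
	/* The cutoff quoted in this thread: huge items, tiny buckets. */
	if (item_size > 32 * 1024)
		return (4);
	return (BUCKET_MAX);	/* existing default cap for everything else */
}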
Re: zfs + uma
On Sun, 19 Sep 2010, Andriy Gapon wrote:

> on 19/09/2010 01:16 Jeff Roberson said the following:
>> Not specifically in reaction to Robert's comment but I would like to add
>> my thoughts to this notion of resource balancing in buckets.  I really
>> prefer not to do any specific per-zone tuning except in extreme cases.
>> This is because quite often the decisions we make don't apply to some
>> class of machines or workloads.  I would instead prefer to keep the
>> algorithm adaptable.
>
> Agree.
>
>> I like the idea of weighting the bucket decisions by the size of the
>> item.  Obviously this has some flaws with compound objects but in the
>> general case it is good.  We should consider increasing the cost of
>> bucket expansion based on the size of the item.  Right now buckets are
>> expanded fairly readily.
>>
>> We could also consider decreasing the default bucket size for a zone
>> based on vm pressure and use.  Right now there is no downward pressure
>> on bucket size, only upward based on trips to the slab layer.
>>
>> Additionally we could make a last ditch flush mechanism that runs on
>> each cpu in turn and flushes some or all of the buckets in per-cpu
>> caches.  Presently that is not done due to synchronization issues.  It
>> can't be done from a central place.  It could be done with a callout
>> mechanism or a for loop that binds to each core in succession.
>
> I like all three of the above approaches.  The last one is a bit hard to
> implement, the first two seem easier.

All the last one requires is a loop calling sched_bind() on each available
cpu.

>> I believe the combination of these approaches would significantly solve
>> the problem and should be relatively little new code.  It should also
>> preserve the adaptable nature of the system without penalizing resource
>> heavy systems.  I would be happy to review patches from anyone who
>> wishes to undertake it.
>
> FWIW, the approach of simply limiting maximum bucket size based on item
> size seems to work rather well too, as my testing with zfs+uma shows.
> I will also try to add code to completely bypass the per-cpu cache for
> "really huge" items.

I don't like this because even with very large buffers you can still have
high enough turnover to require per-cpu caching.  Kip specifically added
UMA support to address this issue in zfs.  If you have allocations which
don't require per-cpu caching and are very large why even use UMA?

One thing that would be nice if we are frequently using page size
allocations is to eliminate the requirement for a slab header for each
page.  It seems unnecessary for any zone where the items per slab is 1,
but it would require careful modification to support properly.

Thanks,
Jeff

> --
> Andriy Gapon
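A sketch of the sched_bind() loop Jeff describes, assuming a hypothetical
cache_drain_cpu() helper that flushes the calling CPU's buckets back to the
zone (the tricky synchronization the thread alludes to would live inside
that helper):

#include <sys/param.h>
#include <sys/proc.h>
#include <sys/sched.h>
#include <sys/smp.h>
#include <vm/uma.h>

static void
uma_drain_percpu_caches(uma_zone_t zone)
{
	struct thread *td = curthread;
	int cpu;

	for (cpu = 0; cpu <= mp_maxid; cpu++) {
		if (CPU_ABSENT(cpu))
			continue;
		thread_lock(td);
		sched_bind(td, cpu);	/* migrate to the target cpu */
		thread_unlock(td);
		cache_drain_cpu(zone);	/* hypothetical: runs on that cpu */
	}
	thread_lock(td);
	sched_unbind(td);
	thread_unlock(td);
}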
Re: zfs + uma
On 19 Sep 2010, at 09:21, Andriy Gapon wrote:

>> I believe the combination of these approaches would significantly solve
>> the problem and should be relatively little new code.  It should also
>> preserve the adaptable nature of the system without penalizing resource
>> heavy systems.  I would be happy to review patches from anyone who
>> wishes to undertake it.
>
> FWIW, the approach of simply limiting maximum bucket size based on item
> size seems to work rather well too, as my testing with zfs+uma shows.
> I will also try to add code to completely bypass the per-cpu cache for
> "really huge" items.

This is basically what malloc(9) does already: for small items, it allocates
from a series of fixed-size buckets (which could probably use tuning), but
maintains its own stats with respect to the types it maps into the buckets.
This is why there's double-counting between vmstat -z and vmstat -m, since
the former shows the buckets used to allocate the latter.  For large items,
malloc(9) goes through UMA, but it's basically a pass-through to VM, which
directly provides pages.

This means that for small malloc types, you get per-CPU caches, and for
large malloc types, you don't.  malloc(9) doesn't require fixed-size
allocations, but also can't provide the ctor/dtor partial tear-down caching,
nor different effective working sets of memory for different types.

UMA should really only be used directly for memory types where tight
packing, per-CPU caching, and possibly partial tear-down have benefits.
mbufs are a great example, because we allocate tons and tons of them
continuously in operation.  More stable types allocated in smaller
quantities make very little sense, since we waste lots of memory overhead
in allocating buckets that won't be used, etc.

Robert
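A sketch of the distinction Robert draws, with made-up names: the small,
high-turnover object gets a dedicated UMA zone (per-CPU caching, tight
packing), while the large, infrequent buffer goes through malloc(9):

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <vm/uma.h>

static MALLOC_DEFINE(M_EXAMPLE, "example", "example large buffers");

struct item {
	int	i_refs;		/* small, frequently allocated object */
};

static uma_zone_t item_zone;

static void
example_allocs(void)
{
	struct item *ip;
	void *buf;

	/* High-turnover, fixed-size objects: a dedicated UMA zone. */
	item_zone = uma_zcreate("items", sizeof(struct item),
	    NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
	ip = uma_zalloc(item_zone, M_WAITOK);

	/* Large, infrequent allocation: malloc(9) passes through to VM. */
	buf = malloc(64 * 1024, M_EXAMPLE, M_WAITOK);

	free(buf, M_EXAMPLE);
	uma_zfree(item_zone, ip);
	uma_zdestroy(item_zone);
}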
Re: zfs + uma
On 19 Sep 2010, at 09:42, Andriy Gapon wrote:

> on 19/09/2010 11:27 Jeff Roberson said the following:
>> I don't like this because even with very large buffers you can still have
>> high enough turnover to require per-cpu caching.  Kip specifically added
>> UMA support to address this issue in zfs.  If you have allocations which
>> don't require per-cpu caching and are very large why even use UMA?
>
> Good point.
> Right now I am running with a 4 items/bucket limit for items larger than
> 32KB.

If allocation turnover is low, I'd think that malloc(9) would do better
here.  How many allocs/frees per second are there in peak operation?

Robert
Re: ar(1) format_decimal failure is fatal?
On Sat, Sep 18, 2010 at 12:01:04AM -0400, Benjamin Kaduk wrote:
> GNU binutils has recently (well, March 2009) added a -D
> ("deterministic") argument to ar(1) which sets the timestamp, uid,
> and gid to zero, and the mode to 644.

That argument was added based on discussions on NetBSD about doing
bit-identical release builds.  It was made optional for the sake of possible
users of the data, though we are not really aware of anyone actually using
it.  The ar(1) support in make basically goes back to a time when replacing
the content was a major speed-up for incremental builds; it is pretty much
useless nowadays.  The same goes for the timestamp: it doesn't tell that
much about the content either.

I don't think the backend should do silent truncation; that would be very
bad.  It might be necessary to have a flag that allows backends to do it,
though.

Joerg
Re: [PATCH] Add -lssp_nonshared to GCC's LIB_SPEC unconditionally
On Sun, Sep 19, 2010 at 10:14:06AM +0200, Jeremie Le Hen wrote:
> Hi Alexander,
>
> On Thu, Aug 05, 2010 at 09:14:46PM +0200, Jeremie Le Hen wrote:
> > On Tue, Aug 03, 2010 at 11:46:51AM -0400, Alexander Kabaev wrote:
> > >
> > > I have no objection, but think we should cave in and investigate the
> > > possibility of using a linker script wrapping libc.so in FreeBSD-9.0:
> > >
> > > Below is Linux' counterpart:
> > >
> > > /* GNU ld script
> > >    Use the shared library, but some functions are only in
> > >    the static library, so try that secondarily. */
> > > OUTPUT_FORMAT(elf32-i386)
> > > GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED
> > > ( /lib/ld-linux.so.2 ) )
> >
> > Ok.  For now, can you commit the proposed modification?  I'll try to
> > make a patch with your proposal.
>
> The attached patch does two things: it modifies bsd.lib.mk to support ld
> scripts for shared libraries, and it adds such a script to replace the
> /usr/lib/libc.so symlink pointing to /lib/libc.so.X.
>
> Basically, SHLIB_LDSCRIPT is defined in lib/libc/Makefile and points to
> the file containing the script itself:
>
>     GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a )
>
> During make install, @@SHLIB@@ will be replaced by the real path of the
> shared library.

You did not include a $FreeBSD$ tag in the libc.so script; I think it would
be useful to have one.

Could you please comment on why the script is not installed during the
world build stage?  My question is: would buildworld use the script for
linking?