date:20121110

Re: Memory reserves or lack thereof

2012-11-10 Thread Konstantin Belousov

On Fri, Nov 09, 2012 at 07:10:04PM +, Sears, Steven wrote:
> I have a memory subsystem design question that I'm hoping someone can answer.
> 
> I've been looking at a machine that is completely out of memory, as in
> 
>  v_free_count = 0, 
>  v_cache_count = 0, 
> 
> I wondered how a machine could completely run out of memory like this, 
> especially after finding a lack of interrupt storms or other pathologies that 
> would tend to overcommit memory. So I started investigating.
> 
> Most allocators come down to vm_page_alloc(), which has this guard:
> 
>   if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
>           page_req = VM_ALLOC_SYSTEM;
>   };
> 
>   if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
>       (page_req == VM_ALLOC_SYSTEM &&
>       cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
>       (page_req == VM_ALLOC_INTERRUPT &&
>       cnt.v_free_count + cnt.v_cache_count > 0)) {
> 
> The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate every 
> last page.
> 
> From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare, 
> perhaps only used from interrupt threads. Not so, see kmem_malloc() or 
> uma_small_alloc() which both contain this mapping:
> 
>   if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
>           pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
>   else
>           pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
> 
> Note that M_USE_RESERVE has been deprecated and is used in just a handful of 
> places. Also note that lots of code paths come through these routines.
> 
> What this means is essentially _any_ allocation using M_NOWAIT will bypass 
> whatever reserves have been held back and will take every last page available.
> 
> There is no documentation stating M_NOWAIT has this side effect of 
> essentially being privileged, so any innocuous piece of code that can't block 
> will use it. And of course M_NOWAIT is literally used all over.
> 
> It looks to me like the design goal of the BSD allocators is on recovery; it 
> will give all pages away knowing it can recover.
> 
> Am I missing anything? I would have expected some small number of pages to be 
> held in reserve just in case. And I didn't expect M_NOWAIT to be a sort of 
> back door for grabbing memory.
> 

Your analysis is right, there is nothing to add or correct.
This is the reason to strongly prefer M_WAITOK.
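
As an illustration of the analysis above, here is a minimal user-space sketch in C of the two quoted fragments taken together. It is not kernel code: the enum, the F_* flag bits, and the function names req_from_flags() and may_allocate() are hypothetical stand-ins, and the counters are passed in as plain arguments rather than read from cnt.

```c
#include <stdbool.h>

/* Request classes named after the quoted constants; values are illustrative. */
enum page_req { REQ_NORMAL, REQ_SYSTEM, REQ_INTERRUPT };

/* Stand-ins for M_NOWAIT and M_USE_RESERVE; bit values are illustrative. */
#define	F_NOWAIT	0x1
#define	F_USE_RESERVE	0x2

/* Mirrors the mapping quoted from kmem_malloc() and uma_small_alloc(). */
static enum page_req
req_from_flags(int flags)
{
	if ((flags & (F_NOWAIT | F_USE_RESERVE)) == F_NOWAIT)
		return (REQ_INTERRUPT);
	return (REQ_SYSTEM);
}

/*
 * Mirrors the guard quoted from vm_page_alloc().
 * avail stands for v_free_count + v_cache_count.
 */
static bool
may_allocate(enum page_req req, unsigned avail, unsigned free_reserved,
    unsigned interrupt_free_min)
{
	return (avail > free_reserved ||
	    (req == REQ_SYSTEM && avail > interrupt_free_min) ||
	    (req == REQ_INTERRUPT && avail > 0));
}
```

With one page left, a reserve of 100, and an interrupt minimum of 2, may_allocate(req_from_flags(F_NOWAIT), 1, 100, 2) is true while may_allocate(req_from_flags(0), 1, 100, 2) is false, which is the behavior described: the non-sleeping request is the one that gets the last page.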



Re: watchdogd, jemalloc, and mlockall

2012-11-10 Thread Ian Lepore

On Sat, 2012-11-03 at 12:50 -0600, Ian Lepore wrote:
> On Sat, 2012-11-03 at 20:41 +0200, Konstantin Belousov wrote:
> > On Sat, Nov 03, 2012 at 12:38:39PM -0600, Ian Lepore wrote:
> > > In an attempt to un-hijack the thread about memory usage increase
> > > between 6.4 and 9.x, I'm starting a new thread here related to my recent
> > > discovery that watchdogd uses a lot more memory since it began using
> > > mlockall(2).
> > > 
> > > I tried statically linking watchdogd and it made a small difference in
> > > RSS, presumably because it doesn't wire down all of libc and libm.
> > > 
> > >  VSZ   RSS
> > > 10236 10164  Dynamic
> > >  8624  8636  Static
> > > 
> > > Those numbers are from ps -u on an arm platform.  I just updated the PR
> > > (bin/173332) with some procstat -v output comparing with/without
> > > mlockall().
> > > 
> > > It appears that the bulk of the new RSS bloat comes from jemalloc
> > > allocating vmspace in 8MB chunks.  With mlockall(MCL_FUTURE) in effect
> > > that leads to wiring 8MB to satisfy what probably amounts to a few
> > > hundred bytes of malloc'd memory.
> > > 
> > > It would probably also be a good idea to remove the floating point from
> > > watchdogd to avoid wiring all of libm.  The floating point is used just
> > > to turn the timeout-in-seconds into a power-of-two-nanoseconds value.
> > > There's probably a reasonably efficient way to do that without calling
> > > log(), considering that it only happens once at program startup.
> > 
> > No, I propose to add a switch to turn on/off the mlockall() call.
> > I have no opinion on the default value of the suggested switch.
> 
> In a patch I submitted along with the PR, I added code to query the
> vm.swap_enabled sysctl and only call mlockall() when swapping is
> enabled.  
> 
> Nobody yet has said anything about what seems to me to be the real
> problem here:  jemalloc grabs 8MB at a time even if you only need to
> malloc a few bytes, and there appears to be no way to control that
> behavior.  Or maybe there's a knob in there that didn't jump out at me
> on a quick glance through the header files.
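
On the suggestion quoted above of computing the power-of-two-nanoseconds value without log(), one possible integer-only approach is sketched below. This is not the patch that went into watchdogd; the helper name timeout_to_pow2ns() is hypothetical, and rounding up to the next power of two is an assumption about the desired behavior.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest n such that 2^n nanoseconds
 * is at least the requested timeout in seconds.  Assumes the timeout is
 * small enough that seconds * 10^9 fits in 64 bits.
 */
static int
timeout_to_pow2ns(uint64_t seconds)
{
	uint64_t ns = seconds * UINT64_C(1000000000);
	int n = 0;

	/* Cap at 63 so the shift below never overflows. */
	while (n < 63 && (UINT64_C(1) << n) < ns)
		n++;
	return (n);
}
```

For example, timeout_to_pow2ns(1) returns 30, since 2^30 ns is about 1.07 seconds. The loop runs at most 63 times, which is negligible for something done once at startup.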

I finally found some time to pursue this further.  A small correction to
what I said earlier: it appears that jemalloc allocates chunks of 4MB at
a time, not 8, but it also appears that it allocates at least 2 chunks
so the net effect is an 8MB default minimum allocation.

I played with the jemalloc tuning option lg_chunk and with static versus
dynamic linking, and came up with the numbers below, which were
generated by ps -u on an ARM-based system with 64MB running -current
from a couple weeks ago, but with the recent patch to watchdogd to
eliminate the need for libm.  I used "lg_chunk:14" (16K chunks), the
smallest value it would allow on this platform.  For comparison I also
include the numbers from a FreeBSD 8.2 ARM system (which would be
dynamic linked and untuned, and also without any mlockall() calls).

 Link     malloc    %MEM    VSZ   RSS
---------------------------------------
dynamic   untuned   15.3  10040  9996
static    untuned   13.2   8624  8636
dynamic   tuned      2.8   1880  1836
static    tuned      0.8    480   492

[ freebsd 8.2 ]      1.1   1752   748

So it appears that using jemalloc's tuning in a daemon that uses
mlockall(2) is a big win, especially if the daemon doesn't do much
memory allocation (watchdogd allocates 2 things, 4k and 1280 bytes; if
you use -e it also strdup()s the command string).  It also seems that
providing a build-time knob to control static linking would be valuable
on platforms that are very memory limited and can't benefit from having
all of libc wired.

I haven't attached a patch because there appears to be no good way to
actually achieve this in a platform-agnostic way.  The jemalloc code
enforces the lower range of the lg_chunk tuning value to be tied to the
page size of the platform, and it rejects out of range values without
changing the tuning.  The code that works on an ARM with 4K page size,

const char *malloc_conf = "lg_chunk:14";

would fail on a system that had bigger pages.  The tuning must be
specified with a compile-time constant like that, because it has to be
tuned before the first allocation, which apparently happens before
main() is entered.  It would be nice if jemalloc would clip the tuning
to the lowest legal value instead of rejecting it, especially since the
lowest legal value is calculated based not only on page size but on the
value of other configurable values.
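
To make the difference between rejecting and clipping concrete, here is a small C sketch. It is not jemalloc code: the function names are hypothetical, and the legal range is passed in as parameters because, as noted above, the real lower bound depends on the page size and on other configurable values.

```c
/*
 * Behavior as described above: an out-of-range request is rejected and
 * the current setting is left unchanged.
 */
static unsigned
lg_chunk_reject(unsigned current, unsigned requested, unsigned min_legal,
    unsigned max_legal)
{
	if (requested < min_legal || requested > max_legal)
		return (current);
	return (requested);
}

/* Suggested alternative: clip the request to the nearest legal value. */
static unsigned
lg_chunk_clip(unsigned requested, unsigned min_legal, unsigned max_legal)
{
	if (requested < min_legal)
		return (min_legal);
	if (requested > max_legal)
		return (max_legal);
	return (requested);
}
```

Taking a 4MB default (lg_chunk 22) and supposing, purely for illustration, a platform whose lowest legal value is 16, a request of 14 leaves the setting at 22 under the reject policy but yields 16 under the clip policy. With clipping, the same compile-time string could be shipped on every platform and each would end up with its own smallest legal chunk size.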

There's another potential solution, but it strikes me as rather
inelegant... jemalloc can also be tuned with the MALLOC_CONF env var.
With the right rc-fu we could provide something like a watchdogd_memtune
variable that you could set and watchdogd would be invoked with
MALLOC_CONF set to that in the environment.  It still couldn't be set to
a default value that was good for all platforms.  It would also get
passed through environme

Re: FreeBSD on RaspberryPi

2012-11-10 Thread Sami Halabi

?

On Sat, Nov 10, 2012 at 12:32 AM, Sami Halabi wrote:

> Hi,
> are there any plans to do images that support the hdmi output ? what about
> analog sound vs the hdmi?
> can the images be taken to the gui stage , ie xbmc or any other multimedia
> screen ?
>
> Sami
>
>
> On Fri, Nov 9, 2012 at 10:42 AM, Stefan Esser wrote:
>
>> Am 09.11.2012 05:44, schrieb Tim Kientzle:
>> >> On Wed, Nov 7, 2012 at 6:01 PM, Tim Kientzle wrote:
>> >> WARNING:  This is still highly experimental and by no
>> >> means ready for "production use", ...
>> >>
>> >> To boot FreeBSD on your RaspberryPi, you'll need:
>> >>   1) A RaspberryPi.
>> >>   2) A serial cable similar to this one: www.adafruit.com/products/954
>> >
>> >
>> > On Nov 8, 2012, at 9:13 AM, Sami Halabi wrote:
>> >>
>> >> why the console cable is needed ?
>> >>
>> > As far as I can tell, the code in FreeBSD-CURRENT
>> > does not yet support the video out.  So you need
>> > a serial console cable to interact with it.
>>
>> All it takes to get the framebuffer working is that the hash chars are
>> removed. I.e. the following works:
>>
>> device          sc
>> device          kbdmux
>> options         SC_DFLT_FONT    # compile font in
>> makeoptions     SC_DFLT_FONT=cp437
>>
>> > You might be able to interact via SSH but
>> > that requires a little bit more setup (a root
>> > password needs to be set and you need to
>> > edit /etc/ssh/sshd_config to allow root logins).
>>
>> I used SSH, and the framebuffer helps to see how far the boot process
>> has come. It takes about 60 seconds to generate the SSH host keys, for
>> example.
>>
>> [The following points are not specific to the R-PI, and I'm sure you
>> know them, but I list them for others that may want to use their R-PI
>> without serial console.]
>>
>> In order to use SSH I modified sshd_config to accept direct root logins.
>> The root password must be set (best method: "vipw -d /mnt/etc", else
>> you must remember to invoke "pwd_mkdb -d /mnt/etc" when you are done).
>>
>> The host name and IP address should be set in rc.conf (or assigned via
>> DHCP).
>>
>> If you do not want to enable direct root login, then a non-privileged
>> account in group wheel is required to be able to "su" to root.
>>
>> That's all I remember ...
>>
>> I used the build script from "http://kernelnomicon.org/?p=164" with
>> one slight modification (tar -x ... --no-same-owner ...).
>>
>> My R-PI kernel contains MSDOSFS and NFS client support to allow it
>> to mount its boot partition and NFS exported /usr/src, /usr/obj,
>> /usr/ports and /usr/work (where I build ports). Most of them are
>> R/O mounts. I have not tried to build world on the R-PI (cross
>> building is so much faster ...). But ports can be built, if a swap
>> partition is available (e.g. on SD card or via NFS - I did not try
>> to mount a USB stick, but that might be another option).
>>
>> Regards, STefan
>> ___
>> freebsd-hackers@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
>>
>
>
>
> --
> Sami Halabi
> Information Systems Engineer
> NMS Projects Expert
> FreeBSD SysAdmin Expert
>
>


-- 
Sami Halabi
Information Systems Engineer
NMS Projects Expert
FreeBSD SysAdmin Expert
