On Thu, Dec 13, 2018 at 09:50:13PM -0500, Nick Holland wrote:

> On 12/11/18 08:09, Otto Moerbeek wrote:
> > On Mon, Dec 10, 2018 at 11:44:47AM +0100, Otto Moerbeek wrote:
> > 
> >> On Mon, Dec 10, 2018 at 08:30:10AM +0100, Otto Moerbeek wrote:
> >> 
> >> > Hi,
> >> > 
> >> > the bootloader uses a very simple allocator for dynamic memory. It
> >> > maintains a list of free allocations. If it needs a block, it searches
> >> > the freelist and returns the smallest allocation that fits.
> >> > 
> >> > Allocation patterns like this (starting with an empty freelist)
> >> > 
> >> > alloc(big)
> >> > free(big)
> >> > alloc(small)
> >> > 
> >> > will assign a big block for the small allocation, wasting most
> >> > memory. The allocator does not split up this block. After this, a new
> >> > big allocation will grow the heap with the big amount. This diff
> >> > changes the strategy by not re-using a block from the free list if
> >> > half the space or more would be wasted. Instead, it grows the heap by
> >> > the requested amount.
> >> > 
> >> > This makes it possible for me to boot using a root fs with a large
> >> > blocksize. There have been several reports of large roots not working
> >> > (the bootloader allocates memory based on the blocksize of the file
> >> > system, and by default larger filesystems use larger blocks).
> >> > 
> >> > How to test
> >> > ===========
> >> > 
> >> > Apply diff and do a full build including building release. After that,
> >> > either upgrade using your newly built cd64.iso, bsd.rd or other
> >> > mechanism or do a full install. Test that you can boot afterwards.
> >> > 
> >> > This needs to be tested on various platforms, both with small and big
> >> > (> 600G) root filesystems.  Yes, this is tedious, but we want large
> >> > coverage of different cases.
> >> > 
> >> >  -Otto
> >> 
> >> As it turns out by my own testing, on amd64 root filesystems using 32k
> >> blocks now work fine, but 64k fs blocks still hit a ceiling. This
> >> corresponds to > 512G disks if you use the defaults.
> >> 
> >>    -Otto
> >> 
> > 
> > New diff that also works on root filesystems > 500G. It avoids using a
> > large bounce buffer by reading large buffers in a loop instead of in one go.
> > 
> >     -Otto
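
To make the loop idea concrete, here is a minimal standalone sketch.
It is not the biosdev.c code: chunked_read(), read_fn and demo_read()
are hypothetical names used only for illustration, and SECT_SIZE
assumes 512-byte sectors (DEV_BSIZE in the diff). Only MAXSECTS and
its value of 32 come from the diff.

```c
#include <stddef.h>
#include <string.h>

#define SECT_SIZE 512	/* assumption: 512-byte sectors (DEV_BSIZE) */
#define MAXSECTS  32	/* per-call limit, same value as in the diff */

/* Hypothetical low-level reader: returns 0 on success, else an error. */
typedef int (*read_fn)(unsigned int off, int nsect, void *buf);

/*
 * Read nsect sectors starting at sector off into buf, never asking the
 * low-level reader for more than MAXSECTS sectors per call.  Stops at
 * the first error and returns it.
 */
int
chunked_read(read_fn rd, unsigned int off, int nsect, void *buf)
{
	char *p = buf;
	int n, ret = 0;

	while (ret == 0 && nsect > 0) {
		n = nsect >= MAXSECTS ? MAXSECTS : nsect;
		ret = rd(off, n, p);
		off += n;
		nsect -= n;
		p += (size_t)n * SECT_SIZE;
	}
	return ret;
}

/* Demo stub standing in for the BIOS call: fills buf, records requests. */
static int demo_calls, demo_max;

int
demo_read(unsigned int off, int nsect, void *buf)
{
	(void)off;
	memset(buf, 0xAA, (size_t)nsect * SECT_SIZE);
	demo_calls++;
	if (nsect > demo_max)
		demo_max = nsect;
	return 0;
}
```

With these assumptions a 64k filesystem block is 128 sectors, so
chunked_read(demo_read, 0, 128, buf) issues four calls of 32 sectors
each, and no single request needs more than 32 * 512 = 16k of
bounce buffer.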
> 
> You are my hero.
> Seems it is possible to hose a system by making a 32k block size on a
> system with a root file system of only 500MB.  I really don't know how I
> did this, much less why, but it's been causing me reboot problems for
> over a year now.

Default block size is 16k. Filesystems > 128G get a block size of 32k
and > 512G get a 64k block size. Fragments remain at 8 per block.
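
As a rough illustration of that rule, the sketch below just encodes
the numbers above. It is not the newfs or disklabel code:
default_bsize() and default_fsize() are hypothetical helpers, and
treating "G" as 2^30 bytes is an assumption.

```c
#include <stdint.h>

/* Assumption: "G" in the thresholds means 2^30 bytes. */
#define GIB (1024ULL * 1024ULL * 1024ULL)

/* Hypothetical helper: default block size for a filesystem of fs_bytes. */
unsigned int
default_bsize(uint64_t fs_bytes)
{
	if (fs_bytes > 512 * GIB)
		return 65536;		/* > 512G: 64k blocks */
	if (fs_bytes > 128 * GIB)
		return 32768;		/* > 128G: 32k blocks */
	return 16384;			/* everything else: 16k blocks */
}

/* Hypothetical helper: fragment size, 8 fragments per block. */
unsigned int
default_fsize(uint64_t fs_bytes)
{
	return default_bsize(fs_bytes) / 8;
}
```

For a 502M root this gives bsize 16384 and fsize 2048, while the
label quoted below shows 4096/32768 for the a partition, so that
value must have come from somewhere other than the size-based
default.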

I don't know what you did with your 512M fs to make it get a 32k
blocksize. Did it have a large partition earlier and then you reduced
its size? In that case the bs will remain the same, even if you
newfs, since newfs reads the bs from the label if not given on the
command line.

The i386 and amd64 bootloaders are very tight on mem, they live in a
640k world. Some space is used for the stack and the code itself also
needs to fit in. Only a few 100k of heap remain. In various places
blocks of file system block size are allocated so you'll hit the limit
pretty quickly. A mem leak in the cpu firmware microcode loading
function (already fixed by jsing) also didn't help.
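
With so little heap, the free-list reuse rule from the alloc.c part
of the diff matters. Here is a minimal standalone sketch of that
rule. It is not the libsa alloc() code: pick_free_block() and the
plain array of free block sizes are hypothetical, only the final
test mirrors the changed line in the diff.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the free block to reuse for
 * a request of `size` bytes, or -1 if the caller should grow the heap.
 * Best fit is the smallest free block that is large enough.
 */
int
pick_free_block(const unsigned int *free_sizes, size_t nfree,
    unsigned int size)
{
	unsigned int bestsize = 0xffffffff;
	int best = -1;
	size_t i;

	for (i = 0; i < nfree; i++) {
		if (free_sizes[i] >= size && free_sizes[i] < bestsize) {
			bestsize = free_sizes[i];
			best = (int)i;
		}
	}

	/*
	 * Same condition as the diff: fail if nothing fits, or if the
	 * best fit would waste half the block or more.
	 */
	if (bestsize == 0xffffffff || bestsize >= size * 2)
		return -1;
	return best;
}
```

With a single free block of 65536 bytes, a request for 512 bytes
returns -1, so the heap grows by the small amount and the big block
stays available. A request for 40000 bytes reuses the block, since
less than half of it would be wasted.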

Anyway, thanks for confirming my tests, even if by accident ;-)

        -Otto

> 
> Upgraded to today's snap, problem solved.
> 
> Nick.
> 
> /home/nick $ dmesg|head  
> OpenBSD 6.4-current (GENERIC.MP) #510: Thu Dec 13 06:20:42 MST 2018
>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> 
> > p m
> OpenBSD area: 64-2000397735; size: 976756.7M; free: 859532.4M
> #                size           offset  fstype [fsize bsize   cpg]
>   a:           502.0M               64  4.2BSD   4096 32768     1 # / # wtf?
>   b:         20473.5M       1048578560    swap                    # 
>   c:        976762.3M                0  unused                    
>   d:         10244.6M       1090508288  4.2BSD   2048 16384     1 # /usr
>   e:          4094.7M       1111489152  4.2BSD   2048 16384     1 # /tmp
>   f:         10236.7M       1119875072  4.2BSD   2048 16384     1 # /var
>   g:         20473.5M       1161804704  4.2BSD   2048 16384     1 # /repo
>   h:         10236.7M       1140839904  4.2BSD   2048 16384     1 # /home
>   i:             7.8M       1203734368  4.2BSD   2048 16384     1 # 
>   j:         40954.8M       1203750432  4.2BSD   2048 16384     1 
> 
> 
> > Index: arch/amd64/stand/libsa/biosdev.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/stand/libsa/biosdev.c,v
> > retrieving revision 1.32
> > diff -u -p -r1.32 biosdev.c
> > --- arch/amd64/stand/libsa/biosdev.c        10 Aug 2018 16:41:35 -0000      1.32
> > +++ arch/amd64/stand/libsa/biosdev.c        11 Dec 2018 13:00:02 -0000
> > @@ -340,11 +340,26 @@ biosd_io(int rw, bios_diskinfo_t *bd, u_
> >     return error;
> >  }
> >  
> > +#define MAXSECTS 32
> > +
> >  int
> >  biosd_diskio(int rw, struct diskinfo *dip, u_int off, int nsect, void *buf)
> >  {
> > -   return biosd_io(rw, &dip->bios_info, off, nsect, buf);
> > +   int i, n, ret;
> > +
> > +   /*
> > +    * Avoid doing too large reads, the bounce buffer used by biosd_io()
> > +    * might run us out-of-mem.
> > +    */
> > +   for (i = 0, ret = 0; ret == 0 && nsect > 0;
> > +       i += MAXSECTS, nsect -= MAXSECTS) {
> > +           n = nsect >= MAXSECTS ? MAXSECTS : nsect;
> > +           ret = biosd_io(rw, &dip->bios_info, off + i, n,
> > +               buf + i * DEV_BSIZE);
> > +   }
> > +   return ret;
> >  }
> > +
> >  /*
> >   * Try to read the bsd label on the given BIOS device.
> >   */
> > @@ -715,7 +730,6 @@ biosstrategy(void *devdata, int rw, dadd
> >      size_t *rsize)
> >  {
> >     struct diskinfo *dip = (struct diskinfo *)devdata;
> > -   bios_diskinfo_t *bd = &dip->bios_info;
> >     u_int8_t error = 0;
> >     size_t nsect;
> >  
> > @@ -732,7 +746,7 @@ biosstrategy(void *devdata, int rw, dadd
> >     if (blk < 0)
> >             error = EINVAL;
> >     else
> > -           error = biosd_io(rw, bd, blk, nsect, buf);
> > +           error = biosd_diskio(rw, dip, blk, nsect, buf);
> >  
> >  #ifdef BIOS_DEBUG
> >     if (debug) {
> > Index: arch/i386/stand/libsa/biosdev.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/i386/stand/libsa/biosdev.c,v
> > retrieving revision 1.98
> > diff -u -p -r1.98 biosdev.c
> > --- arch/i386/stand/libsa/biosdev.c 6 Sep 2018 11:50:54 -0000       1.98
> > +++ arch/i386/stand/libsa/biosdev.c 11 Dec 2018 13:00:02 -0000
> > @@ -341,11 +341,26 @@ biosd_io(int rw, bios_diskinfo_t *bd, u_
> >     return error;
> >  }
> >  
> > +#define MAXSECTS 32
> > +
> >  int
> >  biosd_diskio(int rw, struct diskinfo *dip, u_int off, int nsect, void *buf)
> >  {
> > -   return biosd_io(rw, &dip->bios_info, off, nsect, buf);
> > +   int i, n, ret;
> > +
> > +   /*
> > +    * Avoid doing too large reads, the bounce buffer used by biosd_io()
> > +    * might run us out-of-mem.
> > +    */
> > +   for (i = 0, ret = 0; ret == 0 && nsect > 0;
> > +       i += MAXSECTS, nsect -= MAXSECTS) {
> > +           n = nsect >= MAXSECTS ? MAXSECTS : nsect;
> > +           ret = biosd_io(rw, &dip->bios_info, off + i, n,
> > +               buf + i * DEV_BSIZE);
> > +   }
> > +   return ret;
> >  }
> > +
> >  /*
> >   * Try to read the bsd label on the given BIOS device.
> >   */
> > @@ -716,7 +731,6 @@ biosstrategy(void *devdata, int rw, dadd
> >      size_t *rsize)
> >  {
> >     struct diskinfo *dip = (struct diskinfo *)devdata;
> > -   bios_diskinfo_t *bd = &dip->bios_info;
> >     u_int8_t error = 0;
> >     size_t nsect;
> >  
> > @@ -733,7 +747,7 @@ biosstrategy(void *devdata, int rw, dadd
> >     if (blk < 0)
> >             error = EINVAL;
> >     else
> > -           error = biosd_io(rw, bd, blk, nsect, buf);
> > +           error = biosd_diskio(rw, dip, blk, nsect, buf);
> >  
> >  #ifdef BIOS_DEBUG
> >     if (debug) {
> > Index: lib/libsa/alloc.c
> > ===================================================================
> > RCS file: /cvs/src/sys/lib/libsa/alloc.c,v
> > retrieving revision 1.12
> > diff -u -p -r1.12 alloc.c
> > --- lib/libsa/alloc.c       14 Mar 2016 23:08:06 -0000      1.12
> > +++ lib/libsa/alloc.c       11 Dec 2018 13:00:03 -0000
> > @@ -169,7 +169,7 @@ alloc(unsigned int size)
> >     }
> >  
> >     /* no match in freelist if bestsize unchanged */
> > -   failed = (bestsize == 0xffffffff);
> > +   failed = (bestsize == 0xffffffff || bestsize >= size * 2);
> >  #endif
> >  
> >     if (failed) { /* nothing found */
> > 
> 
