On Thu, Dec 15, 2016 at 03:56:56PM +0200, Konstantin Belousov wrote:

> > > Possibly, the dmesg of the boot (with late_console=0) with this and only
> > > this patch applied against stock HEAD.  This might be long.
> > 
> > Do you need all (262144?) lines?
> > 
> > Testing system
> > memory........................................................................................................................
> > pb 0x2040000000
> > pb 0x2040001000
> > pb 0x2040002000
> > pb 0x2040003000
> > pb 0x2040004000
> > pb 0x2040005000
> > pb 0x2040006000
> > [...]
> > pb 0x207ffff000
> > 
> > > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> > > index 682307f5fe4..072c8d76acf 100644
> > > --- a/sys/amd64/amd64/machdep.c
> > > +++ b/sys/amd64/amd64/machdep.c
> > > @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> > >                    */
> > >                   *(int *)ptr = tmp;
> > >  
> > > +if (page_bad) printf("pb 0x%lx\n", pa);
> > >  skip_memtest:
> > >                   /*
> > >                    * Adjust array of valid/good pages.
> > 
> > PS: memtest86 hung at test 128-130G (the server has 128G installed).
> Well, the physical memory is 128G, but it is not mapped contiguously into
> the address space accessible to the processors.  E.g. in the SMAPs you
> posted above, there are several holes (type 2) used for the PCIe config
> window, PCI BARs, APICs, and other I/O register pages.  Intel chipsets
> allow remapping the RAM hidden by the I/O pages, which is probably not
> done correctly by the BIOS.
> 
> The SMAP clearly reports segment 0x100000000-0x2080000000 as populated
> by RAM, this is 4G-130G.  The very primitive memory test in the kernel
> rejects all pages starting at 129G.  A possibly important detail is that
> the kernel memory test only touches the first 4 bytes of each page.  So if
> the BIOS erroneously mapped any I/O registers into that range, the memory
> test might luckily avoid touching anything critical, while still noting
> that the page does not behave as RAM.
> 
> Update the BIOS, and if the issue persists, contact Supermicro.  This
> interesting detail adds even more evidence that the BIOS is problematic.

Updating the BIOS did not solve this.
_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
