On 02/02/16 07:40, Li, Liang Z wrote:
> Hi David,
>
> We found that dom0 crashes when booting on an HSW-EX server; the dom0 kernel
> version is v4.4. By debugging I found that your patch
> 'x86/xen: discard RAM regions above the maximum reservation', whose
> commit ID is f5775e0b6116b7e2425ccf535243b21,
> caused the regression. The debug message is listed below:
> ===============================================================
>  (XEN) mm.c:884:d0v14 pg_owner 0 l1e_owner 0, but real_pg_owner -1
>  (XEN) mm.c:955:d0v14 Error getting mfn 1080000 (pfn ffffffffffffffff) from L1
>  (XEN) mm.c:1269:d0v14 Failure in alloc_l1_table: entry 0
>  (XEN) mm.c:2175:d0v14 Error while validating mfn 188d903 (pfn 17a7cc) for type
>  (XEN) mm.c:3101:d0v14 Error -16 while pinning mfn 188d903
>  [   33.768792] ------------[ cut here ]------------
> WARNING: CPU: 14 PID: 1 at arch/x86/xen/multicalls.c:129 xen_mc_
>  [   33.783809] Modules linked in:
>  [   33.787304] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.4.0 #1
>  [   33.793991] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS 
>  [   33.805624]  0000000000000081 ffff88017d2537c8 ffffffff812ff954 000000000000
>  [   33.813961]  0000000000000000 0000000000000081 0000000000000000 ffff88017d25
>  [   33.822300]  ffffffff810ca120 ffffffff81cb7f00 ffff8801879ca280 000000000000
>  [   33.830639] Call Trace:
>  [   33.833457]  [<ffffffff812ff954>] dump_stack+0x48/0x64
>  [   33.839277]  [<ffffffff810ca120>] warn_slowpath_common+0x90/0xd0
>  [   33.846058]  [<ffffffff810ca175>] warn_slowpath_null+0x15/0x20
>  [   33.852659]  [<ffffffff81060133>] xen_mc_flush+0x1c3/0x1d0
>  [   33.858858]  [<ffffffff8106449f>] xen_alloc_pte+0x20f/0x300
>  [   33.865158]  [<ffffffff810beef5>] ? update_page_count+0x45/0x60
>  [   33.871855]  [<ffffffff817a1194>] ? phys_pte_init+0x170/0x183
>  [   33.878345]  [<ffffffff817a148d>] phys_pmd_init+0x2e6/0x389
>  [   33.884649]  [<ffffffff817a17dd>] phys_pud_init+0x2ad/0x3dc
>  [   33.890954]  [<ffffffff817a290d>] kernel_physical_mapping_init+0xec/0x211
>  [   33.898613]  [<ffffffff8179df8d>] init_memory_mapping+0x17d/0x2f0
>  [   33.905496]  [<ffffffff81104f11>] ? __raw_callee_save___pv_queued_spin_unloc
>  [   33.914516]  [<ffffffff813643f7>] ? acpi_os_signal_semaphore+0x2e/0x32
>  [   33.921889]  [<ffffffff810ba7b8>] arch_add_memory+0x48/0xf0
>  [   33.928186]  [<ffffffff8179eb80>] add_memory_resource+0x80/0x110
>  [   33.934967]  [<ffffffff8179ec8d>] add_memory+0x7d/0xc0
>  [   33.940787]  [<ffffffff81399538>] acpi_memory_device_add+0x14f/0x237
>  [   33.947963]  [<ffffffff81369a6d>] acpi_bus_attach+0xcb/0x166
>  [   33.954359]  [<ffffffff81369acd>] acpi_bus_attach+0x12b/0x166
>  [   33.960854]  [<ffffffff81369acd>] acpi_bus_attach+0x12b/0x166
>  [   33.967350]  [<ffffffff81369acd>] acpi_bus_attach+0x12b/0x166
>  [   33.973848]  [<ffffffff8136aff1>] acpi_bus_scan+0x5b/0x66
>  [   33.979962]  [<ffffffff81d31e04>] ? acpi_early_init+0xeb/0xeb
>  [   33.986450]  [<ffffffff81d32187>] acpi_scan_init+0x7d/0x1c4
>  [   33.992755]  [<ffffffff81d31e04>] ? acpi_early_init+0xeb/0xeb
>  [   33.999248]  [<ffffffff81d31e04>] ? acpi_early_init+0xeb/0xeb
>  [   34.005747]  [<ffffffff81d3204a>] acpi_init+0x246/0x282
>  [   34.011659]  [<ffffffff81d31e04>] ? acpi_early_init+0xeb/0xeb
>  [   34.018156]  [<ffffffff810020b1>] do_one_initcall+0x81/0x1e0
>  [   34.024557]  [<ffffffff81cf5c06>] kernel_init_freeable+0x19d/0x238
>  [   34.031542]  [<ffffffff81cf5ca1>] ? kernel_init_freeable+0x238/0x238
>  [   34.038711]  [<ffffffff8179d490>] ? rest_init+0x80/0x80
>  [   34.044626]  [<ffffffff8179d499>] kernel_init+0x9/0xe0
>  [   34.050450]  [<ffffffff817aa3cf>] ret_from_fork+0x3f/0x70
>  [   34.056552]  [<ffffffff8179d490>] ? rest_init+0x80/0x80
>  [   34.062475] ---[ end trace 854dae1bef359299 ]---
> ============================================================================================
>
> You can get more information in 'error_log.txt'.
>
> Any idea?
> I don't know the original intention of this patch, so simply sending a revert
> patch to fix the issue is not a good choice.
> Maybe you have a better solution.
>
> Liang
>
>
> error_log.txt
>
>
> (XEN) Bad console= option '8n1'

8n1 should be part of com1= or com2=, rather than console=

>  Xen 4.7-unstable
> (XEN) Xen version 4.7-unstable (build@) (gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)) debug=y Thu Jan 21 23:21:32 EST 2016
> (XEN) Latest ChangeSet: Tue Jan 19 17:47:19 2016 +0000 git:1949868-dirty
> (XEN) Console output is synchronous.
> (XEN) Bootloader: GNU GRUB 0.97
> (XEN) Command line: dom0_mem=4096M loglvl=all guest_loglvl=all unrestricted_guest=1 msi=1 console=com1,115200,8n1 sync_console hap_1gb=1 conring_size=128M iommu=on,intpost psr=cmt psr=cat psr=cdp

This is very hard to read with the VT escape characters still present.
However, you probably meant dom0_mem=4096M,max:4096M; otherwise dom0 gets all
the remaining RAM.
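
For reference, a sketch of how the relevant options might look with both
remarks applied. It is untested on this machine and assumes the serial port
runs at 115200 baud, as in the original line; the remaining options are left
out for brevity.

```
dom0_mem=4096M,max:4096M loglvl=all guest_loglvl=all com1=115200,8n1 console=com1 sync_console
```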

Having said that, giving dom0 all the RAM should work, and...

>  [   33.656695] ACPI: NR_CPUS/possible_cpus limit of 64 reached.  Processor 99/0
>  [   33.665648] ACPI: Unable to map lapic to logical cpu number
>  (XEN) mm.c:884:d0v14 pg_owner 0 l1e_owner 0, but real_pg_owner -1
>  (XEN) mm.c:955:d0v14 Error getting mfn 1080000 (pfn ffffffffffffffff) from L1 e
>  (XEN) mm.c:1269:d0v14 Failure in alloc_l1_table: entry 0
>  (XEN) mm.c:2175:d0v14 Error while validating mfn 188d903 (pfn 17a7cc) for type
>  (XEN) mm.c:3101:d0v14 Error -16 while pinning mfn 188d903

This is a -EBUSY.  Is there anything magic about mfn 188d903?  It just
looks like plain RAM in the E820 table.
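
As a quick sanity check, here is a small sketch that decodes the error number
and turns the two MFNs from the log into machine addresses, so they can be
compared against the E820 ranges. It assumes 4KiB pages and that Xen's error
numbers match the Linux errno values (they do for EBUSY); the helper names are
made up for this example.

```python
import errno

PAGE_SHIFT = 12  # assumption: 4KiB pages


def decode_error(code):
    """Map a negative error code from the Xen log to its errno name."""
    return errno.errorcode.get(abs(code), "unknown")


def mfn_to_maddr(mfn):
    """Machine address of the first byte of the given machine frame."""
    return mfn << PAGE_SHIFT


print(decode_error(-16))
for mfn in (0x1080000, 0x188d903):
    print(f"mfn {mfn:x} -> {mfn_to_maddr(mfn):#x}")
```

mfn 1080000 comes out at 0x1080000000, which is exactly the 66GiB boundary.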

Have you got dom0 configured to use linear p2m mode?  Without it, dom0
can only have a maximum of 512GB of RAM.
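
The 512GB figure follows from the layout of the non-linear p2m. A rough sketch
of the arithmetic, assuming a three-level tree built from 4KiB pages of 8-byte
entries (this is the usual description of the 64-bit PV layout, not something
checked against the v4.4 source here):

```python
PAGE_SIZE = 4096   # bytes per page (assumption: 4KiB pages)
ENTRY_SIZE = 8     # bytes per p2m entry on 64-bit
LEVELS = 3         # assumption: top, mid and leaf level

entries_per_page = PAGE_SIZE // ENTRY_SIZE   # entries that fit in one page
max_pfns = entries_per_page ** LEVELS        # PFNs the tree can describe
max_bytes = max_pfns * PAGE_SIZE             # RAM covered by those PFNs
max_gib = max_bytes // 2**30

print(entries_per_page, max_pfns, max_gib)
```

Under those assumptions each level fans out 512 ways, giving 2^27 PFNs and
512GiB of addressable RAM.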

~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
