Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeul...@suse.com>
> Sent: April 26, 2022 22:42
> To: Wei Chen <wei.c...@arm.com>
> Cc: nd <n...@arm.com>; Andrew Cooper <andrew.coop...@citrix.com>; Roger Pau
> Monné <roger....@citrix.com>; Wei Liu <w...@xen.org>; George Dunlap
> <george.dun...@citrix.com>; Julien Grall <jul...@xen.org>; Stefano
> Stabellini <sstabell...@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v2 05/10] xen/x86: Use ASSERT instead of
> VIRTUAL_BUG_ON for phys_to_nid
> 
> On 26.04.2022 12:59, Wei Chen wrote:
> > On 2022/4/26 17:02, Jan Beulich wrote:
> >> On 18.04.2022 11:07, Wei Chen wrote:
> >>> VIRTUAL_BUG_ON is an empty macro used in phys_to_nid. This
> >>> results in two lines of error-checking code in phys_to_nid
> >>> that is not actually working and causing two compilation
> >>> errors:
> >>> 1. error: "MAX_NUMNODES" undeclared (first use in this function).
> >>>     This is because in the common header file, "MAX_NUMNODES" is
> >>>     defined after the common header file includes the ARCH header
> >>>     file, where phys_to_nid has attempted to use "MAX_NUMNODES".
> >>>     This error was resolved when we moved the definition of
> >>>     "MAX_NUMNODES" to x86 ARCH header file. And we reserve the
> >>>     "MAX_NUMNODES" definition in common header file through a
> >>>     conditional compilation for some architectures that don't
> >>>     need to define "MAX_NUMNODES" in their ARCH header files.
> >>
> >> No, that's setting up a trap for someone else to fall into, especially
> >> with the #ifdef around the original definition. Afaict all you need to
> >> do is to move that #define ahead of the #include in xen/numa.h. Unlike
> >> functions, #define-s can reference not-yet-defined identifiers.
> >>
> >
> > I had tried it before. MAX_NUMNODES depends on NODES_SHIFT. But
> > NODES_SHIFT depends on the definition status in asm/numa.h. If I move
> > MAX_NUMNODES to before asm/numa.h, then I have to move NODES_SHIFT as
> > well. But this will break the original design. NODES_SHIFT in xen/numa.h
> > will always be defined before asm/numa.h. This will be a duplicated
> > definition error.
> 
> I'm afraid I don't follow. MAX_NUMNODES depends on NODES_SHIFT only as
> soon as some code actually uses MAX_NUMNODES. It does not require
> NODES_SHIFT to be defined up front. Of course with the current layout
> (phys_to_nid() living in an inline function in asm/numa.h) things won't
> build. But wasn't the plan to move phys_to_nid() to xen/numa.h as well?
>

Yes, I will drop this patch from part#1, and move it to part#2. This
patch will follow when we move phys_to_nid() to xen/numa.h.
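
A minimal sketch of the #define ordering point above, not the actual Xen
headers: the value 6 for NODES_SHIFT and the helper nid_in_range() are
assumptions made up for illustration. It shows that MAX_NUMNODES can be
defined before NODES_SHIFT exists, as long as nothing expands MAX_NUMNODES
until both are defined.

```c
#include <assert.h>

/*
 * MAX_NUMNODES is defined first and refers to NODES_SHIFT, which does not
 * exist yet. This is fine: a macro body is only expanded at the point of use.
 */
#define MAX_NUMNODES (1U << NODES_SHIFT)

/*
 * Stand-in for what an arch header (asm/numa.h) would provide later.
 * The value 6 is an assumption for illustration only.
 */
#define NODES_SHIFT 6

/*
 * Hypothetical helper: by the time this function body is compiled, both
 * macros are defined, so the expansion succeeds. Had this function been
 * placed before the NODES_SHIFT definition, the build would fail.
 */
static inline int nid_in_range(unsigned int nid)
{
    return nid < MAX_NUMNODES;
}
```

This is also why an inline phys_to_nid() living in asm/numa.h cannot use
MAX_NUMNODES today: its body is compiled before the common header has
defined the macro.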

Thanks,
Wei Chen

> Otherwise I'd recommend to introduce a new header, say numa-defs.h,
> holding (for now) just NODES_SHIFT. Then you'd include asm/numa-defs.h
> first and asm/numa.h only after having defined MAX_NUMNODES. But
> splitting the header should only be a last resort if things can't be
> made work another way.
> 
> > How about I move MAX_NUMNODES to arm and x86 asm/numa.h in this patch
> > at the same time? Because in one of following patches, MAX_NUMNODES and
> > phys_to_nid will be moved to xen/numa.h at the same time?
> >
> >>> 2. error: wrong type argument to unary exclamation mark.
> >>>     This is because, the error-checking code contains !node_data[nid].
> >>>     But node_data is a data structure variable, it's not a pointer.
> >>>
> >>> So, in this patch, we use ASSERT instead of VIRTUAL_BUG_ON to
> >>> enable the two lines of error-checking code. And fix the left
> >>> compilation errors by replacing !node_data[nid] to
> >>> !node_data[nid].node_spanned_pages.
> >>>
> >>> Because when node_spanned_pages is 0, this node has no memory,
> >>> numa_scan_node will print warning message for such kind of nodes:
> >>> "Firmware Bug or mis-configured hardware?".
> >>
> >> This warning is bogus - nodes can have only processors. Therefore I'd
> >> like to ask that you don't use it for justification. And indeed you
> >
> > Yes, you're right, a node can have only CPUs! I will remove it.
> >
> >> don't need to: phys_to_nid() is about translating an address. The
> >> input address can't be valid if it maps to a node with no memory.
> >>
> >
> > Can I understand your comment:
> > Any input address is invalid, when node_spanned_pages is zero, because
> > this node has no memory?
> 
> It's getting close, but it's not exactly equivalent I think. A node
> with 0 bytes of memory might (at least in theory) have an entry in
> memnodemap[]. But finding a node ID for that address would still

I have done a quick check in populate_memnodemap:
        spdx = paddr_to_pdx(nodes[i].start);
        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
        if ( spdx >= epdx )
            continue;

It seems that if a node has no memory (start == end), this function
will not populate a memnodemap entry for that node.
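
A small standalone sketch of that check, under stated assumptions:
paddr_to_pdx() is modelled here as a plain shift by a 4 KiB page size
(the real Xen helper is not reproduced), struct node and
node_is_skipped() are hypothetical names, and the node start is
page-aligned and nonzero.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

typedef uint64_t paddr_t;

/* Hypothetical stand-in for the entries of the nodes[] array. */
struct node {
    paddr_t start;
    paddr_t end;
};

/* Assumption: modelled as a plain page-frame shift for this sketch. */
static unsigned long paddr_to_pdx(paddr_t pa)
{
    return (unsigned long)(pa >> PAGE_SHIFT);
}

/* Returns 1 if the loop above would 'continue' past this node. */
static int node_is_skipped(const struct node *n)
{
    unsigned long spdx = paddr_to_pdx(n->start);
    unsigned long epdx = paddr_to_pdx(n->end - 1) + 1;

    return spdx >= epdx;
}
```

With start == end == 0x100000, both spdx and epdx come out as 0x100, so
the node is skipped; with end == 0x200000, epdx is 0x200 and the node is
populated.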

> not mean that at least one byte of memory at that address is present
> on the given node, because the node covers 0 bytes.
> 

And back to this patch, can I just drop the unnecessary justification
from the commit message?

And for the bogus warning message, can I downgrade it to an INFO-level
message in the part#2 series, keeping only:
printk(KERN_INFO "SRAT: Node %u has no memory!\n", i);
and removing the "BIOS Bug or mis-configured hardware?\n" part?

Thanks,
Wei Chen

> Jan
