Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeul...@suse.com>
> Sent: 29 September 2022 20:14
> To: Wei Chen <wei.c...@arm.com>
> Cc: nd <n...@arm.com>; Andrew Cooper <andrew.coop...@citrix.com>; Roger Pau
> Monné <roger....@citrix.com>; Wei Liu <w...@xen.org>; George Dunlap
> <george.dun...@citrix.com>; Julien Grall <jul...@xen.org>; Stefano
> Stabellini <sstabell...@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v5 2/6] xen/x86: move generically usable NUMA code
> from x86 to common
> 
> On 29.09.2022 09:43, Wei Chen wrote:
> > On 2022/9/27 16:19, Jan Beulich wrote:
> >> On 20.09.2022 11:12, Wei Chen wrote:
> >>> +        nodes_used++;
> >>> +        if ( epdx > memtop )
> >>> +            memtop = epdx;
> >>> +    }
> >>> +
> >>> +    if ( nodes_used <= 1 )
> >>> +        i = BITS_PER_LONG - 1;
> >>
> >> Is this actually going to be correct for all architectures? Aiui
> >> Arm64 has only up to 48 physical address bits, but what about an
> >> architecture allowing the use of all 64 bits? I think at the very
> >> least we want BUILD_BUG_ON(PADDR_BITS >= BITS_PER_LONG) here.
> >>
> >
> > Ok, I will add the BUILD_BUG_ON() above. And I also have a question: why
> > can't we use PADDR_BITS here directly?
> 
> Well, if you used PADDR_BITS, then you would use it without subtracting
> 1, and you'd be in trouble again when PADDR_BITS == BITS_PER_LONG. What
> may be possible to do instead of BUILD_BUG_ON() is
> 
>     if ( nodes_used <= 1 )
>         i = min(PADDR_BITS, BITS_PER_LONG - 1);
> 

This one seems better; I will follow it.
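Just to confirm my understanding, the hunk would then become something like
the following (only a sketch; if min() objects to mismatched argument types
I would switch to min_t(unsigned int, ...) instead):

    if ( nodes_used <= 1 )
        i = min(PADDR_BITS, BITS_PER_LONG - 1);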

> >>> +    else
> >>> +        i = find_first_bit(&bitfield, sizeof(unsigned long) * 8);
> >>> +
> >>> +    memnodemapsize = (memtop >> i) + 1;
> >>
> >> Again perhaps the subject of a separate patch: Isn't there an off-by-1
> >> mistake here? memtop is the maximum of all epdx-es, which are
> >> calculated to be the first PDX following the region. Hence I'd expect
> >>
> >>      memnodemapsize = ((memtop - 1) >> i) + 1;
> >>
> >> here. I guess I'll make patches for both issues, which you may then
> >> need to re-base over.
> >>
> >
> > Thanks, I will wait for your patches.
> 
> Already sent out yesterday.
> 

Ok.
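For reference, I expect the relevant line to end up as below after rebasing
(just a sketch; I will of course take whatever your patch actually does):

    /* memtop is one past the highest PDX covered, hence the subtraction. */
    memnodemapsize = ((memtop - 1) >> i) + 1;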

> >>> +static void cf_check dump_numa(unsigned char key)
> >>> +{
> >>> +    s_time_t now = NOW();
> >>> +    unsigned int i, j, n;
> >>> +    struct domain *d;
> >>> +    const struct page_info *page;
> >>> +    unsigned int page_num_node[MAX_NUMNODES];
> >>> +    const struct vnuma_info *vnuma;
> >>> +
> >>> +    printk("'%c' pressed -> dumping numa info (now = %"PRI_stime")\n",
> key,
> >>> +           now);
> >>> +
> >>> +    for_each_online_node ( i )
> >>> +    {
> >>> +        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
> >>> +
> >>> +        printk("NODE%u start->%lu size->%lu free->%lu\n",
> >>> +               i, node_start_pfn(i), node_spanned_pages(i),
> >>> +               avail_node_heap_pages(i));
> >>> +        /* Sanity check phys_to_nid() */
> >>> +        if ( phys_to_nid(pa) != i )
> >>> +            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
> >>> +                   pa, phys_to_nid(pa), i);
> >>> +    }
> >>> +
> >>> +    j = cpumask_first(&cpu_online_map);
> >>> +    n = 0;
> >>> +    for_each_online_cpu ( i )
> >>> +    {
> >>> +        if ( i != j + n || cpu_to_node[j] != cpu_to_node[i] )
> >>> +        {
> >>> +            if ( n > 1 )
> >>> +                printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
> >>> +            else
> >>> +                printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
> >>> +            j = i;
> >>> +            n = 1;
> >>> +        }
> >>> +        else
> >>> +            ++n;
> >>> +    }
> >>> +    if ( n > 1 )
> >>> +        printk("CPU%u...%u -> NODE%d\n", j, j + n - 1,
> cpu_to_node[j]);
> >>> +    else
> >>> +        printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
> >>> +
> >>> +    rcu_read_lock(&domlist_read_lock);
> >>> +
> >>> +    printk("Memory location of each domain:\n");
> >>> +    for_each_domain ( d )
> >>> +    {
> >>> +        process_pending_softirqs();
> >>> +
> >>> +        printk("Domain %u (total: %u):\n", d->domain_id,
> domain_tot_pages(d));
> >>> +
> >>> +        for_each_online_node ( i )
> >>> +            page_num_node[i] = 0;
> >>
> >> I'd be inclined to suggest to use memset() here, but I won't insist
> >> on you doing this "on the fly". Along with this would likely go the
> >> request to limit the scope of page_num_node[] (and then perhaps also
> >> vnuma and page).
> >>
> >
> > memset for page_num_node makes sense, I will do it before
> > for_each_domain ( d ).
> 
> That won't be right - array elements need clearing on every iteration.
> Plus ...
> 

Oh, yes.

> > About limit the scope, did you mean, we should move:
> >
> > "const struct page_info *page;
> > unsigned int page_num_node[MAX_NUMNODES];
> > const struct vnuma_info *vnuma;"
> >
> > to the block of for_each_domain ( d )?
> 
> ... this limiting of scope (yes to your question) would also conflict
> with the movement you suggest. It is actually (among other things)
> such a mistaken movement which the more narrow scope is intended to
> prevent.
> 
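Understood. So I will keep the clearing inside the loop and narrow the scope
at the same time, roughly like this (a rough sketch, trimmed to the relevant
part):

    for_each_domain ( d )
    {
        const struct page_info *page;
        unsigned int page_num_node[MAX_NUMNODES];
        const struct vnuma_info *vnuma;

        process_pending_softirqs();

        printk("Domain %u (total: %u):\n", d->domain_id, domain_tot_pages(d));

        /* Clear the per-node page counters afresh for every domain. */
        memset(page_num_node, 0, sizeof(page_num_node));
        ...
    }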

Thanks,
Wei Chen

> Jan
