On Tue, Jan 18, 2005 at 12:30:32PM +1100, Rusty Russell wrote:
> On Tue, 2005-01-18 at 00:06 +0530, Ravikiran G Thirumalai wrote:
> > ...
> > The allocator can be easily modified to use __per_cpu_offset[] table
> > at a later stage by:
> > 1. Allocating ALIGN(__per_cpu_end - __per_cpu_start, PAGE_SIZE) for the
> >    static percpu areas and populating __per_cpu_offset[] offset table
> > 2. Making PCPU_BLKSIZE same as the static per cpu area size above
> > 3. Serving dynamic percpu requests from modules etc from blocks by
> >    returning ret -= __per_cpu_offset[0] from a percpu block.  This way
> >    modules need not have a limit on static percpu areas.
> 
> Unfortunately ia64 breaks (3).  They have pinned TLB entries covering
> 64k, which they put the static per-cpu data into.  This is used for
> local_inc, etc, and David Mosberger loved that trick (this is why my
> version allocated from that first reserved block for modules' static
> per-cpu vars).

Hmmm... then if we change (1) to allocate PERCPU_ENOUGH_ROOM, will the math
work out?  We will still have a limit on static per-cpu areas in
modules, but alloc_percpu can use the same __per_cpu_offset[] table.
Will this work?

But what I am concerned about is arches like x86_64, which currently
do not maintain the relation:
__per_cpu_offset[n] = __per_cpu_offset[0] + static_percpu_size * n  ---> (A)
Correct me if I am wrong, but both our methods for alloc_percpu to use
per_cpu_offset depend on the static per-cpu areas being virtually
contiguous (with relation (A) above being maintained).
If arches cannot sew up node local pages to form a virtually contiguous
block, maybe because setup_per_cpu_areas happens early during boot, 
then we will have a problem.
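
To make relation (A) concrete, here is a minimal user-space sketch of the
check it implies.  This is not kernel code: percpu_offsets_uniform() is a
hypothetical helper, and the offset table and stride are passed in as plain
arguments rather than read from __per_cpu_offset[].

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): returns 1 if the table
 * obeys relation (A), i.e. offset[n] == offset[0] + stride * n for
 * every cpu, and 0 otherwise.
 */
static int percpu_offsets_uniform(const unsigned long *offset, size_t ncpus,
				  unsigned long stride)
{
	size_t n;

	assert(offset != NULL);
	for (n = 0; n < ncpus; n++)
		if (offset[n] != offset[0] + stride * n)
			return 0;
	return 1;
}
```

An arch that places each cpu's area in separately allocated node-local
pages would generally fail this check, which is the case the rest of this
mail is worried about.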

So a common solution could be:
Declare a dynamic percpu offset table 'alloc_percpu_offset' or
something like that, make it a static per-cpu variable.  Then, the 
blocks can be of any uniform size, we just have to fill
alloc_percpu_offset and use
        (RELOC_HIDE(ptr, per_cpu(alloc_percpu_offset, cpu)))
to get to the cpu local versions.  I think dereference speeds can be
fast with this method too since we use __per_cpu_offset[] indirectly.
Of course this is not needed if all arches can do node local allocation
and maintain relation (A) for static per-cpu areas.
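
A rough user-space model of the alloc_percpu_offset idea, as a sketch only.
Everything prefixed sim_ or SIM_ is made up for illustration: the table is
an ordinary array rather than a real per-cpu variable, calloc() stands in
for node-local block allocation, RELOC_HIDE() is reduced to integer pointer
arithmetic, and error paths do not free earlier blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_NR_CPUS 4	/* assumed CPU count, for illustration only */

/*
 * Stand-in for the proposed per-cpu variable 'alloc_percpu_offset':
 * one entry per cpu, relative to cpu 0's block.
 */
static uintptr_t sim_alloc_percpu_offset[SIM_NR_CPUS];
static char *sim_block[SIM_NR_CPUS];
static size_t sim_blksize, sim_used;

/*
 * Allocate one block per cpu.  The blocks need not be contiguous,
 * so relation (A) is not required here.
 */
static int sim_percpu_init(size_t blksize)
{
	int cpu;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		sim_block[cpu] = calloc(1, blksize);
		if (!sim_block[cpu])
			return -1;
		sim_alloc_percpu_offset[cpu] =
			(uintptr_t)sim_block[cpu] - (uintptr_t)sim_block[0];
	}
	sim_blksize = blksize;
	sim_used = 0;
	return 0;
}

/* Trivial bump allocator: hands out the cpu 0 address as the cookie. */
static void *sim_alloc_percpu(size_t size)
{
	void *ret;

	assert(sim_block[0] != NULL);
	size = (size + 7) & ~(size_t)7;	/* keep objects 8-byte aligned */
	if (size > sim_blksize - sim_used)
		return NULL;
	ret = sim_block[0] + sim_used;
	sim_used += size;
	return ret;
}

/* Simplified analogue of RELOC_HIDE(ptr, per_cpu(alloc_percpu_offset, cpu)). */
static void *sim_per_cpu_ptr(void *ptr, int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	return (void *)((uintptr_t)ptr + sim_alloc_percpu_offset[cpu]);
}
```

After sim_percpu_init(), a pointer returned by sim_alloc_percpu() can be
passed to sim_per_cpu_ptr() with any cpu number to reach that cpu's copy,
whatever addresses the individual blocks ended up at.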

Thanks,
Kiran
