On 6/22/20 11:02 PM, Greg Kroah-Hartman wrote:

First off, this is not my platform, and not my problem, so it's funny
you ask me :)

Weeeelll, not your platform perhaps but MAINTAINERS does list you first and 
Tejun second as maintainers for kernfs.  So in that sense, any patches would 
need to go thru you.  So, your opinions do matter.

Anyway, as I have said before, my first guesses would be:
        - increase the granularity size of the "memory chunks", reducing
          the number of devices you create.

This would mean finding every utility that relies on this behavior.  That may 
be possible, although not easy, for distro or platform software, but it's hard 
to guess what user-related utilities may have been created by other consumers 
of those distros or that platform.  In any case, removing an interface without 
warning is a hanging offense in many Linux circles.

        - delay creating the devices until way after booting, or do it
          on a totally different path/thread/workqueue/whatever to
          prevent delay at booting

This has been considered, but it again requires a full list of utilities relying on this 
interface and determining which of them may want to run before the devices are 
"loaded" at boot time.  It may be few, or even zero, but it would be a much 
more disruptive change in the boot process than what we are suggesting.

And then there's always:
        - don't create them at all, only do so if userspace asks
          you to.

If they are done in parallel on demand, you'll see the same problem (load 
average of 1000+, contention in the same spot.)  You obviously won't hold up 
the boot, of course, but your utility and anything else running on the machine 
will take an unexpected pause ... for somewhere between 30 and 90 minutes.  
Seems equally unfriendly.

A variant of this, which does have a positive effect, is to observe that coldplug during 
initramfs does seem to load up the memory device tree without incident.  We do a second 
coldplug after we switch roots, and this is the one that runs into timer issues.  I have 
asked "those that should know" why there is a second coldplug.  I can guess, but I 
would prefer to know rather than fall back on the 'see who screams' approach.  If that 
second coldplug is unnecessary for the kernfs memory interfaces to work correctly, then 
skipping it is an alternate, and perhaps even better, solution.  (It wouldn't change the 
fact that kernfs was not built for speed, and the problem would remain below the surface 
to trip up someone else.)

However, nobody I've found can say that is safe, and I'm not fond of the 'see 
who screams' test solution.

You all have the userspace tools/users for this interface and know it
best to know what will work for them.  If you don't, then hey, let's
just delete the whole thing and see who screams :)

I guess I'm puzzled by why everyone seems offended by suggesting we change a 
mutex to a rw semaphore.  In a vacuum, sure, but we have before and after 
numbers.  Wouldn't the same cavalier logic apply?  Why not change it and see 
who screams?

I haven't heard any criticism of the patch itself - I'm hearing criticism of 
the problem.  This problem is not specific to memory devices.  As we get larger 
systems, we'll see it elsewhere.  We do already see a mild form of this when 
fibre finds 1000-2000 fibre disks and goes to add them in parallel.  Small 
memory chunks introduce the problem at a level two orders of magnitude bigger, 
but eventually other devices will be subject to it too.  Why not address this 
now?

'Doctor, it hurts when I do this'
'Then don't do that'

Funny as a joke.  Less funny as a review comment.

Rick
