On 07/31/2015 04:18 PM, Jason Gunthorpe wrote:
> On Fri, Jul 31, 2015 at 01:41:39PM -0400, Doug Ledford wrote:
>> Please be more specific here.  What are you objecting to?  Are you
>> objecting to a flush_workqueue from a release() context?  Because I
>> don't see anything in the kref documentation or the kref
>> implementation that prevents or prohibits this.  Or are you
>> referring to the fact that they aren't unregistering their event
>> handler until well after the parent device is unregistered?
>
> I'm talking about the basic invariant we have for our removal process:
> ib_unregister_device must fence all calls into the driver, so the
> driver can proceed with the low level remove on the assumption that
> nothing will call into it after unregister returns.
Agreed.

> Clients create this fence by calling things like flush_workqueue and
> ib_unregister_event_handler.

Agreed.

> If the fence is moved out of ib_unregister_device and into release
> then the whole model breaks.

Agreed.

> I've explained this all before on the v1 of these series..
>
>> That's obviously wrong, but it's also easy to fix (the obvious fix
>> is that they should be calling ib_cache_cleanup_one from the top of
>> ib_unregister_device versus waiting for a kref put).
>
> Well, no, that is not the obvious fix, that puts things back the way
> they were in v6.  As I explained the kfrees need to be in the release
> function,

Disagree.

> the async fencing needs to remain in the remove_one or
> equivalent.

Agree.

So, to my point of disagreement above.  Except in circumstances with a
damn good reason, I expect code to be symmetrical.  If you allocate
some caches in ib_cache_init_one(), I expect you to kfree those caches
in ib_cache_cleanup_one().  Further, if you call ib_cache_init_one() as
one of the last items before returning from ib_register_device(), then
I expect you to call ib_cache_cleanup_one() as one of the first items
in ib_unregister_device().  However, the fencing doesn't belong in
ib_cache_cleanup_one(); it belongs at the head of
ib_unregister_device(), and ib_cache_cleanup_one() should only be
called after the fence is complete.  And in this respect, removing the
device from the various clients is actually part of the fencing.
The order of ops in ib_register_device is:

	ib_device_register_sysfs
	ib_cache_init_one
	for client_list {
		add_client_context
		client->add
	}

Consequently, the order of ops in ib_unregister_device should be:

	for client_list
		mark coming down
	for client_list
		remove(client)
	for client_list
		kfree(client)
	ib_cache_cleanup_one   <- missing
	ib_device_unregister_sysfs

In many respects, I expect the ib_unregister_device() call to mirror
the error unwind found in the register call, with the modifications
needed for dealing with a device that was actually live.  It's
certainly possible in certain circumstances to veer from this
paradigm, but I expect a good reason for doing so.  A person who isn't
intimately RDMA aware can come in and fix a locking bug when the code
is like it is above.  If you start playing subtle tricks with the
locking and ordering of removal, you start to trip up anyone who
doesn't have first-hand knowledge of those tricks and why they were
implemented.  That negatively impacts the long term support of the
code, so the reason should be sufficient to warrant the drawback, and
the subtle trick should be documented in the code so someone coming
around later knows what's going on.

To a certain extent, your comments in the v6 thread about the cache
being expected to work beyond the life of the device, and therefore
being part of the core stack, are somewhat reasonable, but the life of
the cache on a per device basis is no different than the life of the
client registration.  The above flow would remove the cache from the
client list, but will still in effect treat it just like a client.
The only difference (and one that I think is correct) is that by
taking it out of the client_list and those loops, you are able to make
sure it's the first client that comes up and the last one that goes
down, and that it continues to work for as long as it needs to (after
all, once all of the clients are unregistered and removed for that
device, and we are in the ib_unregister_device routine, the device is
going away and no more calls into the cache will arrive for it).

>> Regardless though, the reason I'm taking this (and a number of
>> other things) into my tree is that waiting for things to be
>> absolutely perfect on a submission is an interminable waiting game.
>
> So, I'm confused by this..
>
> I haven't been waiting, I've looked at every release of every one of
> these three big core change patch sets.

And found actual problems.  Some of lesser and some of greater
importance.  Some worth resubmitting the entire patchset, and some
that I would be perfectly OK with an incremental patch later on.  A
problem does not always mean resubmit; it can mean fix before the
merge window with another patch (or during the merge window, depending
on the situation).

> What I don't understand is your timing on these.. No 'hey, could we
> finish these reviews' or anything, just out of the blue, v7 merged!
> It was just posted yesterday!

That's because v6 was mostly done and the v6 -> v7 changes were minor.
I would have taken v6 and then accepted incremental patches had it
still applied.  These branches are from my internal tree and really
are WIP branches.  I've pushed those WIP branches to github.  As I
stated when I set my two different repos up, my github repo is my WIP
repo.  It's for people to be able to see where we are, pull it if they
wish, and preferably to test against ahead of time.  I *want* people
to be able to see where I'm going.  But that doesn't mean it is fixed
in stone.
For things like the fix to the v7 version of this patch, I will accept
an incremental patch, but it need not have a complete changelog,
because if it just fixes one of the patches in this submission, I'm OK
with squashing it down into the original patch in order to keep the
patch count down and bisectability up.  But that can only be done
prior to the merge window.  After the merge window opens and the code
heads to Linus, fixes must be discrete patches.  So you can actually
break the merge/testing window down into two distinct phases: pre-pull
(where we can fix broken patches by squashing a fix in, and thereby
preserve bisection) and post-pull (which is what we are all used to).
This merge window is riskier than most because of the patches queued
up to go in.  By pulling all of this together and publishing my git
repo early, I'm inviting other interested parties to join me in the
early testing process, where we can fix problems and still preserve
bisection.  That's why I'm pulling things together and posting my git
repo ahead of the merge window.

To speak more directly to the timing issue, there are only about three
weeks left before the merge window opens, and all of next week I'm
going to be working on other things.  So I wanted to get something out
that people can look at and test with before I have to redirect for a
week, and that will still leave me two weeks to integrate fixes prior
to the merge window opening.

>> Not to mention that some of these fixes are quick and easy to do,
>> and I'd rather quit waiting 24+ hours for a respin turnaround when
>> I could just hop in and make the fix myself in 15 minutes and test
>> it immediately.
>
> It would have been a lot less work for me to just fix the various
> issues than to explain them to the authors.  I'm hoping that
> investment means the next series around will be good at V1...
>
>>> Looking at your for-rebase branch..
>>> please make sure you get all the attributions this cycle :|
>>
>> Yet *another* reason why these v6, v7, v8 patchsets are a huge time
>> drain :-/.
>
> Sagi's recent patch set is missing several attributions too..
>
> Jason

-- 
Doug Ledford <dledf...@redhat.com>
              GPG KeyID: 0E572FDD
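[The pre-pull squash workflow described earlier in the mail can be sketched with git's autosquash support.  The repository, file, and subject lines below are invented for illustration; only the mechanism matters.]

```shell
set -e
git init -q demo && cd demo
git config user.name  "Example Committer"   # local identity so the demo commits anywhere
git config user.email "committer@example.com"

echo "v7" > cache.c
git add cache.c
git commit -qm "IB/core: rework per-device cache"          # the patch in the WIP branch

echo "v7+fix" > cache.c
git commit -qam "fixup! IB/core: rework per-device cache"  # the incremental fix

# Pre-pull: fold the fix into the patch it amends, keeping the patch
# count down and bisectability up.
GIT_SEQUENCE_EDITOR=: git rebase -qi --autosquash --root

git log --oneline   # a single "IB/core: rework per-device cache" commit remains
```

After the pull goes to Linus, history is immutable and the same fix would instead land as its own discrete commit.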