Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held

Vladimir Sementsov-Ogievskiy Thu, 23 Jun 2022 04:12:20 -0700

On 6/23/22 12:08, Emanuele Giuseppe Esposito wrote:



Am 22/06/2022 um 20:38 schrieb Vladimir Sementsov-Ogievskiy:

On 6/22/22 17:26, Emanuele Giuseppe Esposito wrote:



Am 21/06/2022 um 19:26 schrieb Vladimir Sementsov-Ogievskiy:

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:

With the*nop*  job_lock/unlock placed, rename the static
functions that are always under job_mutex, adding "_locked" suffix.

List of functions that get this suffix:
job_txn_ref           job_txn_del_job
job_txn_apply           job_state_transition
job_should_pause       job_event_cancelled
job_event_completed       job_event_pending
job_event_ready           job_event_idle
job_do_yield           job_timer_not_pending
job_do_dismiss           job_conclude
job_update_rc           job_commit
job_abort           job_clean
job_finalize_single       job_cancel_async
job_completed_txn_abort       job_prepare
job_needs_finalize       job_do_finalize
job_transition_to_pending  job_completed_txn_success
job_completed           job_cancel_err
job_force_cancel_err

Note that "locked" refers to the*nop*  job_lock/unlock, and not
real_job_lock/unlock.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito<eespo...@redhat.com>



Hmm. Maybe it was already discussed.. But for me it seems, that it would
be simpler to review previous patches, that fix job_ API users to use
locking properly, if this renaming go earlier.

Anyway, in this series, we can't update everything at once. So patch to
patch, we make the code more and more correct. (yes I remember that
lock() is a noop, but I should review thinking that it real, otherwise,
how to review?)

So, I'm saying about formal correctness of using lock() unlock()
function in connection with introduced _locked prifixes and in
connection with how it should finally work.

You do:

05. introduce some _locked functions, that just duplicates, and
job_pause_point_locked() is formally inconsistent, as I said.

06. Update a lot of places, to give them their final form (but not
final, as some functions will be renamed to _locked, some not, hard to
imagine)

07,08,09. Update some more, and even more places. very hard to track
formal correctness of using locks

10-...: rename APIs.


What do you think about the following:

1. Introduce noop lock, and some internal _locked() versions, and keep
formal consistency inside job.c, considering all public interfaces as
unlocked:

   at this point:
    - everything correct inside job.c
    - no public interfaces with _locked prefix
    - all public interfaces take mutex internally
    - no external user take mutex by hand

We can rename all internal static functions at this step too.

2. Introduce some public _locked APIs, that we'll use in next patches

3. Now start fixing external users in several patches:
      - protect by mutex direct use of job fields
    - make wider locks and move to _locked APIs inside them where needed


In this scenario, every updated unit becomes formally correct after
update, and after all steps everything is formally correct, and we can
move to turning-on the mutex.


I don't understand your logic also here, sorry :(

I assume you want to keep patch 1-4, then the problem is assing job_lock
and renaming functions in _locked.
So I would say the problem is in patch 5-6-10-11-12-13. All the others
should be self contained.

I understand patch 5 is a little hard to follow.

Now, I am not sure what you propose here but it seems that the end goal
is to just have the same result, but with additional intermediate steps
that are just "do this just because in the next patch will be useful".
I think the problem is that we are going to miss the "why we need the
lock" logic in the patches if we do so.

The logic I tried to convey in this order is the following:
- job.h: add _locked duplicates for job API functions called with and
without job_mutex
     Just create duplicates of functions

- jobs: protect jobs with job_lock/unlock
     QMP and monitor functions call APIs that assume lock is taken,
     drivers must take explicitly the lock

- jobs: rename static functions called with job_mutex held
- job.h: rename job API functions called with job_mutex held
- block_job: rename block_job functions called with job_mutex held
     *given* that some functions are always under lock, transform
     them in _locked. Requires the job_lock/unlock patch

- job.h: define unlocked functions
     Comments on the public functions that are not _locked


@Kevin, since you also had some feedbacks on the patch ordering, do you
agree with this ordering or you have some other ideas?

Following your suggestion, we could move patches 10-11-12-13 before
patch 6 "jobs: protect jobs with job_lock/unlock".

(Apologies for changing my mind, but being the second complain I am
starting to reconsider reordering the patches).


In two words, what I mean: let's keep the following invariant from patch
to patch:

1. Function that has _locked() prefix is always called with lock held
2. Function that has _locked() prefix never calls functions that take
lock by themselves so that would dead-lock
3. Function that is documented as "called with lock not held" is never
called with lock held

That what I mean by "formal correctness": yes, we know that lock is
noop, but still let's keep code logic to correspond function naming and
comments that we add.


Ok I get what you mean, but then we have useless changes for public
functions that eventually will only be _locked() like job_next_locked:

The function is always called in a loop, so it is pointless to take the
lock inside. Therefore the patch would be "incorrect" on its own anyways.

Then, we would have a patch where we add the lock guard inside, and
another one where we remove it and rename to _locked and take the lock
outside. Seems unnecessary to me.


For me it looks a bit simpler than you describe. And anyway keeping the 
correctness from patch to patch worth the complexity. I'll give an argument.

First what is the best practices? Best practices is when every patch is good 
and absolutely correct. So that you can apply any number of patches from the 
beginning of the series (01-NN), commit them to master and this will break 
neither compilation, nor tests, nor readability, nothing. This makes the review 
process iterable: if I'm OK with patches 01-03, I give them r-b and don't think 
about them. I don't have to keep in mind any tricky things. And I can review 04 
several days later not rereading 01-03 (or at least I can consider applied 
01-03 as a good correct base state). This way I'm sure, that if I reviewed all 
patches one-by-one, each one is correct, then the whole thing is correct.

A lot harder to review when we have only collective correctness: the whole 
series being applied make a correct thing, but we can't say it about 
intermediate states. In your series we can't be absolutely correct with each 
patch, as we have to switch from aio-context lock to mutex in one patch, that's 
why mutex is added as noop. That's a reasonable and (seems) unsolvable 
drawback. That's a thing I have to keep in mind during the whole review. But 
I'd prefer not add more such things, like comments and _locked suffixes that 
don't correspond to the code.

With the invariant that I propose, the following logic works:

If
   1. we keep the invariant from patch to patch
   AND
   2. at the end we have updated all users of the internal and external APIs, 
not missed some file or function
Then everything is correct at the end.

Without the invariant I can't prove that everything is correct at the end, as 
it is hard to follow the degree of correctness from patch to patch. In your way 
the only invariant that we have from patch to patch, is that mutex is noop, so 
all changes do nothing, and therefore they are correct. This way I can give an 
r-b to all such patches not thinking about details, they are noop. But when I 
finally have to review the patch that turns on the mutex, I'll have to recheck 
all internal and external API users, which is equivalent to review all the 
changes merged into one patch.



Consider the case with job_next. The most correct way to update it IMHO:

1. Add lock inside job_next() and add job_next_locked() - in one patch with 
other similar changes of job.c and job.h.

At this moment we have job_next() calls in a loop, which is not good (we want 
larger critical section), but that doesn't break the invariant I proposed above.

2. Update the loop: add larger critical section and switch to job_next_locked().

What is good here: we don't  need to unite updates of external API users into 
one patch, we can update file-by-file or subsystem-by-sybsystem.

3. Delete unused job_next() API


--
Best regards,
Vladimir

Re: [PATCH v7 10/18] jobs: rename static functions called with job_mutex held

Reply via email to