On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:

[...]

> In the hybrid mode:
> 
> - Specific livepatches can be marked as "non-replaceable" to ensure they
>   remain active and unaffected during replacements.
> 
> - Other livepatches can be marked as "replaceable," allowing targeted
>   replacements of only those patches.
> 
> This selective approach would reduce unnecessary transitions, lower the
> risk of temporary patch loss, and mitigate performance issues during
> livepatch replacement.
> 
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
>               klp_for_each_object(old_patch, old_obj) {
>                       int err;
>  
> +                     if (!old_patch->replaceable)
> +                             continue;

This is one example where things might get very complicated.

The same function might be livepatched by more livepatches, see
ops->func_stack. For example, let's have funcA and three livepatches:
a
  + lp1:
        .replace = false,
        .non-replace = true,
        .func = {
                        .old_name = "funcA",
                        .new_func = lp1_funcA,
                }, { }

  + lp2:
        .replace = false,
        .non-replace = false,
        .func = {
                        .old_name = "funcA",
                        .new_func = lp2_funcA,
                },{
                        .old_name = "funcB",
                        .new_func = lp2_funcB,
                }, { }


  + lp3:
        .replace = true,
        .non-replace = false,
        .func = {
                        .old_name = "funcB",
                        .new_func = lp3_funcB,
                }, { }


Now, apply lp1:

      + funcA() gets redirected to lp1_funcA()

Then, apply lp2

      + funcA() gets redirected to lp2_funcA()

Finally, apply lp3:

      + The proposed code would add "nop()" for
        funcA() because it exists in lp2 and does not exist in lp3.

      + funcA() will get redirected to the original code
        because of the nop() during transition

      + nop() will get removed in klp_complete_transition() and
        funcA() will get suddenly redirected to lp1_funcA()
        because it will still be on ops->func_stack even
        after the "nop" and lp2_funcA() gets removed.

           => The entire system will start using another funcA
              implementation at some random time

           => this would violate the consistency model


The proper solution might be tricky:

1. We would need to detect this situation and do _not_ add
   the "nop" for lp3 and funcA().

2. We would need a more complicate code for handling the task states.

   klp_update_patch_state() sets task->patch_state using
   the global "klp_target_state". But in the above example,
   when enabling lp3:

    + funcA would need to get transitioned _backward_:
         KLP_TRANSITION_PATCHED -> KLP_TRANSITION_UNPATCHED
      , so that it goes on ops->func_stack:
         lp2_funcA -> lp1->funcA

   while:

    + funcA would need to get transitioned forward:
         KLP_TRANSITION_UNPATCHED -> KLP_TRANSITION_PATCHED
      , so that it goes on ops->func_stack:
         lp2_funcB -> lp3->funcB


=> the hybrid mode would complicate the life for both livepatch
   creators/maintainers and kernel code developers/maintainers.

   I am afraid that this complexity is not acceptable if there are
   better solutions for the original problem.

>                       err = klp_add_object_nops(patch, old_obj);
>                       if (err)
>                               return err;

I am sorry but I am quite strongly against this approach!

Best Regards,
Petr

Reply via email to