Re: [PATCH v5 0/3] livepatch: introduce atomic replace
On 30.01.2018 17:03, Petr Mladek wrote:
On Fri 2018-01-26 14:29:36, Evgenii Shatokhin wrote:
On 26.01.2018 13:23, Petr Mladek wrote:
On Fri 2018-01-19 16:10:42, Jason Baron wrote:
On 01/19/2018 02:20 PM, Evgenii Shatokhin wrote:
On 12.01.2018 22:55, Jason Baron wrote:

There is one more thing that might need attention here. In my experiments with this patch series, I saw that unpatch callbacks are not called for the older binary patch (the one being replaced).

So I think the pre_unpatch() can be called for any prior livepatch modules from __klp_enable_patch(). Perhaps in reverse order of loading (if there is more than one), and *before* the pre_patch() for the livepatch module being loaded. Then, if it successfully patches in klp_complete_transition(), the post_unpatch() can be called for any prior livepatch modules as well. I think again it makes sense to call the post_unpatch() for prior modules *before* the post_patch() for the current livepatch module.

So, we are talking about a lot of rather non-trivial code. IMHO, it might be easier to run just the callbacks from the new patch. In reality, the author should always know what it might be replacing and what needs to be done. In other words, it might be much easier to handle all situations in a single script in the new patch. The alternative would be doing crazy hacks to prevent the older scripts from destroying what we would like to keep. We would need to keep in mind interactions between the scripts and the order in which they are called.

Or do you plan to use cumulative patches to simply get rid of any other "random" livepatches with something completely different? In this case, it might be much safer to disable the older patches the normal way.

In my experience, it was quite convenient sometimes to just "replace all binary patches the user currently has loaded with this single one". No matter what these original binary patches did and where they came from.

To be honest, I would feel better if the livepatch framework were safer. It should prevent breaking the system by a patch that atomically replaces other random patches that rely on callbacks.

Well, combining random livepatches from random vendors is asking for trouble. It might easily fail when two patches add different changes to the same function.

I wonder if we should introduce some tags, keys, vendors. I mean, to define an identification or dependencies that would say that some patches are compatible or incompatible. We could allow livepatching one function by two livepatches only if they are from the same vendor. But then the order still might be important. Also, I am not sure if this condition is safe enough.

On the other hand, we could handle callbacks like the shadow variables. Every shadow variable has an ID. We have an API to add/read/update/remove them. We might allow having more callbacks with different IDs. Then we could run callbacks only for IDs that are newly added or removed. Sigh, this would be a very complex implementation as well. But it might make these features easier to use and maintain.

Alternatively, I thought about having two modes. One is a stack of "random" patches. The second is a cumulative mode. IMHO, the combination of the two modes makes things very complex. It might be much easier if we allow loading a patch with the replace flag enabled only on top of another patch with this flag enabled.

Another problematic situation is when you need to actually downgrade a cumulative patch. Should be rare, but...

Good point. Well, especially the callbacks should be rare.

Yes, we will probably use them for the most important fixes only, and only if there are no other suitable options. The patches with callbacks could be more difficult to maintain anyway.

I would like to hear from people that have some experience or plans with using callbacks and cumulative patches. I wonder how they plan to technically reuse a feature in the cumulative patches. IMHO, it should be very rare to add/remove callbacks. Then it might be safe to downgrade a cumulative patch if the callbacks are exactly the same.

Well, I think we will disable the old patches explicitly in these cases, before loading the new one. Maybe fragile but easier to maintain.

I am afraid that we will not be able to support all use cases and keep the code sane.

That is OK. I agree with you that the current behaviour of the 'replace' operation w.r.t. patch callbacks should stay as is. Making kernel code more complex than it should be is definitely a bad thing.

I cannot say much about generic policies for cumulative/non-cumulative patches here. In our particular case (Virtuozzo ReadyKernel), the cumulative patches are easily recognizable by their names, they have a 'cumulative' substring there. The patch version is also included in the name and is easily obtainable. So I think, I'll update our userspace tools (based on kpatch) so that th
Re: [PATCH v5 0/3] livepatch: introduce atomic replace
On 30.01.2018 22:24, Joe Lawrence wrote:
On 01/30/2018 01:19 PM, Jason Baron wrote:
[ ... snip ... ]

Our main interest in 'atomic replace' is simply to make sure that cumulative patches work as expected in that they 'replace' any prior patches. We have an interest primarily in being able to apply patches from the stable trees via livepatch. I think the current situation with respect to 'replace' and callbacks is fine for us as well, but could change based on more experience with livepatch.

Well, the callback implementation was based somewhat on theoretical usage... it was inspired by the kpatch macros that you talk about below, in which we had a few specific use-cases. Converting (un)patch notifiers to the livepatch model presented additional callback locations, and as such we ended up with pre-patch, post-patch, pre-unpatch and post-unpatch callbacks. Hopefully we'll get a better idea of their worth as we gain experience using them.

At this point in time I would suggest keeping it as simple and safe as possible. :)

As an aside, I was just wondering how you are making use of the callbacks using the tool you mentioned (that is based on kpatch)? I see in the upstream kpatch that there are hooks such as 'KPATCH_LOAD_HOOK(fn)' and 'KPATCH_UNLOAD_HOOK(fn)'. However, these are specific to 'kpatch' (as opposed to livepatch), and I do not see such macros for the recently introduced livepatch callbacks. It seems it would be easy enough to add similar hooks for the livepatch callbacks. I was thinking of writing such a patch, but was wondering if there was an existing solution?

I've got a work-in-progress PR for this very feature: https://github.com/dynup/kpatch/pull/780

Yes, exactly, I use that. It is an important piece of the puzzle.

Evgenii has already provided some helpful feedback. Another set of eyes/testing would be appreciated!

Regards,

-- Joe
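For reference, on the livepatch side such hooks map onto the per-object struct klp_callbacks. Below is a minimal sketch of the wiring, assuming the 4.15-era API; the hook bodies, the patched symbol name and the messages are made up for illustration:

	#include <linux/kernel.h>
	#include <linux/module.h>
	#include <linux/livepatch.h>

	/* Hypothetical replacement function; stands in for an actual fix. */
	static int livepatch_demo_fn(void)
	{
		return 0;
	}

	/* Roughly what a KPATCH_LOAD_HOOK-style function would become. */
	static int demo_pre_patch(struct klp_object *obj)
	{
		pr_info("pre-patch for %s\n", obj->name ? obj->name : "vmlinux");
		return 0;	/* a non-zero return aborts the transition */
	}

	static void demo_post_unpatch(struct klp_object *obj)
	{
		pr_info("post-unpatch for %s\n", obj->name ? obj->name : "vmlinux");
	}

	static struct klp_func funcs[] = {
		{
			.old_name = "demo_fn",		/* made-up symbol name */
			.new_func = livepatch_demo_fn,
		}, { }
	};

	static struct klp_object objs[] = {
		{
			.name = NULL,			/* NULL means vmlinux */
			.funcs = funcs,
			.callbacks = {
				.pre_patch	= demo_pre_patch,
				.post_unpatch	= demo_post_unpatch,
				/* .post_patch and .pre_unpatch also exist */
			},
		}, { }
	};

A macro layer like the one in the PR above would presumably reduce this boilerplate to just declaring the hook functions.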
Re: [PATCH v6 6/6] livepatch: Add atomic replace
On 01.02.2018 00:55, Josh Poimboeuf wrote:
On Fri, Jan 26, 2018 at 01:33:04PM +0300, Evgenii Shatokhin wrote:

+ The callbacks from the replaced patches are not called. It would be pretty hard to define a reasonable semantic and implement it. At least, it surely simplifies error handling, if these callbacks are not called.

Anyway, I guess, this restriction should be mentioned explicitly in the docs. I think this is not obvious for the patch developers (esp. those familiar with RPM spec files and such ;-) ).

What concerns me is that downgrading of the cumulative patches with callbacks becomes much more difficult this way. I mean, suppose a user has v1 of a cumulative patch installed. Then a newer version, v2, is released. They install it and find that it is buggy (very unfortunate but might still happen). Now they cannot atomically replace v2 back with v1, because the callbacks from v1 cannot clean up after v2. They will need to unload v2 explicitly and then load v1 back, which is more fragile. The loading failures are much more unlikely with livepatch than with the old kpatch, but they are still possible. I have no good solution to this though.

I think the solution is to build a v3, which is basically identical to v1, except it also has callbacks for cleaning up after v2, if necessary. It should also be smart enough to deal with the case that v2 was not installed beforehand.

I thought about this, but such patches would be very difficult to maintain. It seems better to explicitly unload v2 and then load v1 back in such cases. In addition, releasing v3 takes some time (build + appropriate QA procedures) while the users may want to do something about the faulty patch "right here, right now". This is what the "downgrade" option is for.

Anyway, livepatch is much more likely to apply the patches successfully than the old kpatch core. So it should be OK to simply unload v2 and then load v1 in such (hopefully rare) scenarios. As I said, the current behaviour is acceptable, especially when it is documented. The implementation of livepatch is already complex enough.

Regards,
Evgenii
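For what it's worth, Josh's v3 idea can be made robust against "v2 may or may not have run" by probing for state that v2 left behind. A sketch, assuming v2 recorded its state in a shadow variable and using the 4.15-era shadow API (later kernels added a destructor argument to klp_shadow_free()); the ID and the NULL "global" key are invented for illustration:

	#include <linux/livepatch.h>

	/* Hypothetical ID under which v2 registered that its setup code ran. */
	#define PATCH_V2_STATE_ID	0x20180201

	static void v3_post_patch(struct klp_object *obj)
	{
		void *v2_state;

		/*
		 * klp_shadow_get() returns NULL when no shadow variable exists
		 * for the given (obj, id) pair, i.e. when v2 was never loaded.
		 * Shadow variables are keyed by an arbitrary pointer, so NULL
		 * can serve as a patch-wide "global" key.
		 */
		v2_state = klp_shadow_get(NULL, PATCH_V2_STATE_ID);
		if (!v2_state)
			return;		/* v2 never ran; nothing to clean up */

		/* ... undo whatever v2's callbacks set up, using v2_state ... */

		klp_shadow_free(NULL, PATCH_V2_STATE_ID);
	}

Evgenii's maintainability objection still stands, of course: every such release has to carry knowledge of the previous one's side effects.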
Re: [PATCH v5 0/3] livepatch: introduce atomic replace
On 12.01.2018 22:55, Jason Baron wrote:

Hi,

While using livepatch, I found that when doing cumulative patches, if a patched function is completely reverted by a subsequent patch (back to its original state), livepatch does not revert the function to its original state. Specifically, if patch A introduces a change to function 1, and patch B reverts the change to function 1 and introduces changes to, say, functions 2 and 3 as well, the change that patch A introduced to function 1 is still present. This could be addressed by first completely removing patch A (disable and then rmmod) and then inserting patch B (insmod and enable), but this leaves an unpatched window. In discussing this issue with Josh on the kpatch mailing list, he mentioned that we could get 'atomic replace working properly', and that is the direction of this patchset: https://www.redhat.com/archives/kpatch/2017-June/msg5.html

Thanks,

-Jason

Thanks a lot! Atomic replace is really crucial when using cumulative patches.

There is one more thing that might need attention here. In my experiments with this patch series, I saw that unpatch callbacks are not called for the older binary patch (the one being replaced).

That is, I have prepared 2 binary patches, each has all 4 patch/unpatch callbacks. When I load the first patch, its pre-patch and post-patch callbacks are called as expected. Then I replace it with the second patch. Replacement is successful, and the pre-patch and post-patch callbacks are called for the second patch. However, pre-unpatch and post-unpatch callbacks do not run for the first one. This makes it more difficult to clean up what its pre/post-patch callbacks have done.

It would be nice if pre-/post-unpatch callbacks were called for the first patch, perhaps before/after the patch is actually disabled during replacement. I cannot see right now, though, which way is best to implement that.

v4->v5:
-re-base onto remove-immediate branch (removing immediate dependencies)
-replaced modules can be re-enabled by doing rmmod and then insmod

v3->v4:
-add static patch, objects, funcs to linked lists to simplify iterator
-break-out pure function movement as patch 2/3

v2->v3:
-refactor how the dynamic nops are calculated (Petr Mladek)
-move the creation of dynamic nops to enable/disable paths
-add klp_replaced_patches list to indicate patches that can be re-enabled
-dropped 'replaced' field
-renamed dynamic fields in klp_func, object and patch
-moved iterator implementation to kernel/livepatch/core.c
-'inherit' nop immediate flag
-update kobject_put free'ing logic (Petr Mladek)

v1->v2:
-removed the func_iter and obj_iter (Petr Mladek)
-initializing kobject structure for no_op functions using: klp_init_object() and klp_init_func()
-added a 'replace' field to klp_patch, similar to the immediate field
-a 'replace' patch now disables all previous patches
-tried to shorten klp_init_patch_no_ops()...
-Simplified logic klp_complete_transition (Petr Mladek)

Jason Baron (3):
  livepatch: use lists to manage patches, objects and functions
  livepatch: shuffle core.c function order
  livepatch: add atomic replace

 include/linux/livepatch.h     |  25 +-
 kernel/livepatch/core.c       | 626 ++
 kernel/livepatch/core.h       |   6 +
 kernel/livepatch/patch.c      |  22 +-
 kernel/livepatch/patch.h      |   4 +-
 kernel/livepatch/transition.c |  49 +++-
 6 files changed, 537 insertions(+), 195 deletions(-)

Regards,
Evgenii
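For readers who have not built one: a patch module that opts into this behavior differs from an ordinary livepatch only by the new flag added in this series. A minimal sketch, assuming the 4.15-era registration API (klp_register_patch()/klp_enable_patch()) and a made-up patched symbol:

	#include <linux/kernel.h>
	#include <linux/module.h>
	#include <linux/livepatch.h>

	/* Hypothetical cumulative fix: carries the current version of
	 * every function the patch series has ever had to change. */
	static int livepatch_function1(void)
	{
		return 0;
	}

	static struct klp_func funcs[] = {
		{
			.old_name = "function1",	/* made-up symbol name */
			.new_func = livepatch_function1,
		}, { }
	};

	static struct klp_object objs[] = {
		{
			.name = NULL,			/* NULL means vmlinux */
			.funcs = funcs,
		}, { }
	};

	static struct klp_patch patch = {
		.mod = THIS_MODULE,
		.objs = objs,
		/* The flag introduced by this series: when the patch is
		 * enabled, all previously loaded patches are disabled and
		 * no-longer-patched functions revert via dynamic nops. */
		.replace = true,
	};

	static int livepatch_init(void)
	{
		int ret;

		ret = klp_register_patch(&patch);
		if (ret)
			return ret;

		ret = klp_enable_patch(&patch);
		if (ret) {
			WARN_ON(klp_unregister_patch(&patch));
			return ret;
		}
		return 0;
	}

	static void livepatch_exit(void)
	{
		WARN_ON(klp_unregister_patch(&patch));
	}

	module_init(livepatch_init);
	module_exit(livepatch_exit);
	MODULE_LICENSE("GPL");
	MODULE_INFO(livepatch, "Y");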
Re: [PATCH v6 6/6] livepatch: Add atomic replace
On 25.01.2018 19:02, Petr Mladek wrote:

From: Jason Baron

Sometimes we would like to revert a particular fix. Currently, this is not easy because we want to keep all other fixes active, and we could revert only the last applied patch.

One solution would be to apply a new patch that implemented all the reverted functions like in the original code. It would work as expected but there would be unnecessary redirections. In addition, it would also require knowing which functions need to be reverted at build time.

Another problem is when there are many patches that touch the same functions. There might be dependencies between patches that are not enforced on the kernel side. Also it might be pretty hard to actually prepare the patch and ensure compatibility with the other patches.

A better solution would be to create a cumulative patch and say that it replaces all older ones.

This patch adds a new "replace" flag to struct klp_patch. When it is enabled, a set of 'nop' klp_func will be dynamically created for all functions that are already being patched but that will no longer be modified by the new patch. They are temporarily used as a new target during the patch transition.

Several simplifications are used:

+ The nops' structures are generated already when the patch is registered. All registered patches are taken into account, even the disabled ones. As a result, there might be more nops than are really needed when the patch is enabled and some disabled patches were removed before. But we are on the safe side and it simplifies the implementation. Especially, we could reuse the existing init() functions. Also freeing is easier because the same nops are created and removed only once.

An alternative solution would be to create the nops when the patch is enabled. But then it would be complicated to reuse the init() functions and error paths. It would increase the risk of errors because of late kobject initialization. It would need tricky waiting for freed kobjects when finalizing a reverted enable transaction.

+ The replaced patches are removed from the stack and can no longer be enabled directly. Otherwise, we would need to implement a more complex logic of handling the stack of patches. It might be hard to come up with a reasonable semantic. A fallback is to remove (rmmod) the replaced patches and register (insmod) them again.

+ Nops are handled like normal function patches. It reduces changes in the existing code. It would be possible to copy internal values when they are allocated and take shortcuts in init() functions. It would be possible to use the fact that old_func and new_func point to the same function and not init new_func and new_size at all. It would be possible to detect nop funcs in the ftrace handler and just leave. But all these would just complicate the code and maintenance.

+ The callbacks from the replaced patches are not called. It would be pretty hard to define a reasonable semantic and implement it. At least, it surely simplifies error handling, if these callbacks are not called.

Anyway, I guess, this restriction should be mentioned explicitly in the docs. I think this is not obvious for the patch developers (esp. those familiar with RPM spec files and such ;-) ).

What concerns me is that downgrading of the cumulative patches with callbacks becomes much more difficult this way. I mean, suppose a user has v1 of a cumulative patch installed. Then a newer version, v2, is released. They install it and find that it is buggy (very unfortunate but might still happen). Now they cannot atomically replace v2 back with v1, because the callbacks from v1 cannot clean up after v2. They will need to unload v2 explicitly and then load v1 back, which is more fragile. The loading failures are much more unlikely with livepatch than with the old kpatch, but they are still possible. I have no good solution to this though.

It might even be counter-productive. The new patch is cumulative. It is supposed to include most of the changes from older patches. In most cases, it will not want to call the pre_unpatch()/post_unpatch() callbacks from the replaced patches. It would disable/break things for no good reason. Also, it should be easier to handle various scenarios in a single script in the new patch than to think about interactions caused by running many scripts from older patches. Not to mention that the old scripts would not even expect to be called in this situation.

Signed-off-by: Jason Baron
[pmla...@suse.com: Split, reuse existing code, simplified]
Signed-off-by: Petr Mladek
Cc: Josh Poimboeuf
Cc: Jessica Yu
Cc: Jiri Kosina
Cc: Miroslav Benes
Cc: Petr Mladek
---
 include/linux/livepatch.h |   3 +
 kernel/livepatch/core.c   | 162 +-
 kernel/livepatch/core.h
Re: [PATCH v5 0/3] livepatch: introduce atomic replace
On 26.01.2018 13:23, Petr Mladek wrote:
On Fri 2018-01-19 16:10:42, Jason Baron wrote:
On 01/19/2018 02:20 PM, Evgenii Shatokhin wrote:
On 12.01.2018 22:55, Jason Baron wrote:

There is one more thing that might need attention here. In my experiments with this patch series, I saw that unpatch callbacks are not called for the older binary patch (the one being replaced).

So I think the pre_unpatch() can be called for any prior livepatch modules from __klp_enable_patch(). Perhaps in reverse order of loading (if there is more than one), and *before* the pre_patch() for the livepatch module being loaded. Then, if it successfully patches in klp_complete_transition(), the post_unpatch() can be called for any prior livepatch modules as well. I think again it makes sense to call the post_unpatch() for prior modules *before* the post_patch() for the current livepatch module.

I played with this when working on v6, and I am not sure if it is worth it. The main reason is that we are talking about cumulative patches. They are supposed to preserve most of the existing changes and just remove and/or add a few changes. The older patches might or might not expect to be replaced this way.

If we decided to run callbacks from the replaced patches, then it would make sense to run the one from the new patch first. It is because we might need to do some hacks to preserve the already existing changes. We might need something like this for __klp_enable_patch():

static int klp_run_pre_patch_callbacks(struct klp_patch *patch)
{
	struct klp_patch *old_patch;
	struct klp_object *old_obj;
	int ret;

	list_for_each_entry_reverse(old_patch, &klp_patches, list) {
		if (!old_patch->enabled && old_patch != patch)
			continue;

		klp_for_each_object(old_patch, old_obj) {
			if (!klp_is_object_loaded(old_obj))
				continue;

			if (old_patch == patch) {
				/* pre_patch from the new patch */
				ret = klp_pre_patch_callback(old_obj);
				if (ret)
					return ret;
				if (!patch->replace)
					return 0;
			} else {
				/* pre_unpatch from the replaced patches */
				klp_pre_unpatch_callback(old_obj);
			}
		}
	}

	return 0;
}

This was quite hairy. An alternative would be:

static void
klp_run_pre_unpatch_callbacks_when_replacing(struct klp_patch *patch)
{
	struct klp_patch *old_patch;
	struct klp_object *old_obj;

	if (WARN_ON(!patch->replace))
		return;

	list_for_each_entry_reverse(old_patch, &klp_patches, list) {
		if (!old_patch->enabled || old_patch == patch)
			continue;

		klp_for_each_object(old_patch, old_obj) {
			if (!klp_is_object_loaded(old_obj))
				continue;
			klp_pre_unpatch_callback(old_obj);
		}
	}
}

static int klp_run_pre_patch_callbacks(struct klp_patch *patch)
{
	struct klp_object *obj;
	int ret;

	klp_for_each_object(patch, obj) {
		if (!klp_is_object_loaded(obj))
			continue;
		ret = klp_pre_patch_callback(obj);
		if (ret)
			return ret;
	}

	if (patch->replace)
		klp_run_pre_unpatch_callbacks_when_replacing(patch);

	return 0;
}

The 2nd variant is easier to read, but it is a lot of code. And this is only what we would need for __klp_enable_patch(). We would also need a solution for:

	klp_cancel_transition();
	klp_try_transition(); (2 variants, for patching and unpatching)
	klp_module_coming();
	klp_module_going();

So, we are talking about a lot of rather non-trivial code. IMHO, it might be easier to run just the callbacks from the new patch. In reality, the author should always know what it might be replacing and what needs to be done. In other words, it might be much easier to handle all situations in a single script in the new patch. The alternative would be doing crazy hacks to prevent the older scripts from destroying what we would like to keep. We would need to keep in mind interactions between the scripts and the order in which they are called.

Or do you plan to use cumulative patches to simply get rid of any other "random" livepatches with something completely different? In this case, it might be much safer to disable the older patches the normal way.

In my experience, it was quite convenient sometimes to just "replace all binary patches the user currently has loaded with this single one". No matter what these original binary patches did and where they came from.
Re: [PATCH v10 05/10] livepatch: Support separate list for replaced patches.
On 20.03.2018 15:25, Petr Mladek wrote:
On Mon 2018-03-19 16:43:24, Josh Poimboeuf wrote:
On Mon, Mar 19, 2018 at 04:02:07PM +0100, Petr Mladek wrote:

Can someone remind me why we're permanently disabling replaced patches? I seem to remember being involved in that decision, but at least with this latest version of the patches, it seems like it would be simpler to just let 'replace' patches be rolled back to the previous state when they're unpatched. Then we don't need two lists of patches, the nops can become more permanent, the replaced patches remain "enabled" but inert, and the unpatching behavior is less surprising to the user, more like a normal rollback.

Yes, keeping the patches might make some things easier. But it might also bring some problems and it would make the feature less useful. The following arguments come to my mind:

1. The feature should help to keep the system in a consistent and well defined state. It should not depend on what patches were installed before.

But the nops already accomplish that. If they didn't, then this patch set has a major problem.

The nops are enough to keep the functionality, but they might harm the performance. Livepatching is about preventing bugs without reboot. I could easily imagine that ftrace on a hot path might cause performance problems on some workloads. And I would like to have a way out in this case.

Anyway, I am reworking the patchset so that it implements your approach first. The possibility to remove nops and replaced livepatches is provided by a follow-up patch. This might help to discuss whether the changes are worth it or not.

2. The feature should allow unpatching some functions while keeping the others patched. The ftrace handler might cause some unwanted slowdown or other problems. The performance might get restored only when we remove the nops once they are no longer necessary.

I'd say simplicity and maintainability of the code is more important than an (imagined) performance issue. The nops should be pretty fast anyway. Not to mention that my proposal would make the behavior less surprising and more user friendly (reverting a 'replace' patch restores it to its previous state).

If the "disable" way works as expected, see below. Also, it is less surprising only if people understand the stack of patches. If they are familiar only with replace patches, then it is normal for them that the patches get replaced. It is then like a package version update.

3. The handling of callbacks is already problematic. We run only the ones from the last patch to make things easier. We would need to come up with something more complicated if we want to support rollback to "random" patches on the stack. And support for random patches is fundamental, at least from my point of view.

Can you elaborate on what you mean by random patches and why it would require something more complicated from the callbacks?

Let's say that we will use atomic replace for cumulative patches. Then every new patch knows what the earlier patches did. It just does not know which of them were already installed. Therefore it needs to detect what callbacks were already called. The callbacks usually create or change something, so there should be something to check. Therefore the way forward should be rather straightforward.

The way back is more problematic. The callbacks in the new cumulative patch would need to store information about the previous state and be able to restore it when the patch gets disabled. It might more or less double the callbacks code and the testing scenarios.

Along those lines, I'd also propose that we constrain our existing patch stacking even further. Right now we allow a new patch to be registered on top of a disabled patch, though we don't allow the new patch to be enabled until the previous patch gets enabled. I'd propose we no longer allow that condition. We should instead enforce that all existing patches are *enabled* before allowing a new patch to be registered on top. That way the patch stacking is even more sane, and there are fewer "unusual" conditions to worry about. We have enough of those already. Each additional bit of flexibility has a maintenance cost, and this one isn't worth it IMO.

Again, this might make some things easier but it might also bring problems. For example, we would need to solve the situation when the last patch is disabled and cannot be removed because the transition was forced. This might be more common after removing the immediate feature.

I would stop worrying about forced patches so much :-)

I have already seen blocked transitions several times. It is true that it was with kGraft. But we just do not have enough real-life experience with the upstream livepatch code.

Forced patches already come with a disclaimer, and we can't bend over backwards for them. In such a rare case, the admin can just re-enable the forced patch before loading the 'replace' patch. Also
Re: [PATCH v10 05/10] livepatch: Support separate list for replaced patches.
On 10.04.2018 16:21, Miroslav Benes wrote:

I think you're missing my point. This isn't about your patch set, per se. It's really about our existing code. Today, our patch stack doesn't follow real stack semantics, because patches in the middle might be disabled. I see that as a problem.

No, please read it again. I wasn't talking about replaced patches.

I was confused by the wording "in the middle". It suggested that there might have been enabled patches at the top and the bottom of the stack and some disabled patches in between at the same time (or vice versa). This was not true.

Do I understand it correctly that you do not like that the patches on the stack might be in two states (disabled/enabled)? This might be translated as: you do not like the state when the patch is registered but disabled.

I wonder if the problem is in the "stack" abstraction. Would it help if we called it a "sorted list" or whatever is more suitable?

Another possibility would be to get rid of the enable/disable states. I mean that the patch would be automatically enabled during registration and removed during unregistration. Or we could rename the operations to add/remove or anything else. In fact, this is how it worked in kGraft.

I've already wondered a couple of times why we had separate enable/disable. If there is someone who knows, remind me, please. I wouldn't be against a simplification here. On the other hand, it is kind of nice to keep the registration and enablement separate. It is more flexible if someone needs it. Anyway, we should solve it together with the stacking. It is tightly connected.

AFAIK, the enabled/disabled state made more sense for immediate patches that could not be unloaded. We still need to keep the patches when the transition is forced. The question is if we need to keep the sysfs entries for loaded but unused patches.

If 'replace' were the only mode, then we wouldn't even need a patch stack, because it wouldn't really matter much whether the previous patch is enabled or disabled. I think this is in agreement with the point you're making. But we still support non-replace patches. My feeling is that we should either do a true stack, or no stack at all. The in-between thing is going to be confusing, not only for us, but for patch authors and end users.

I see it like two different modes. We either have a stack of patches that depend on each other. But if they depend on each other, they can use 'replace' and a stack isn't needed.

Yes, but see below.

And if they *don't* depend on each other, then the stack is overly restrictive, for no good reason. Either way, why do we still need a stack?

Good question. I suggest to agree on some terms first:

+ Independent patches make unrelated changes. Any new function must not rely on changes done by any other patch.

+ Dependent patches mean that a later patch depends on changes done by an earlier patch. For example, a new function might use a function added by an earlier patch.

+ Each cumulative patch includes all necessary changes. I would say that it is self-contained and independent. Except that they should be able to continue using changes made by earlier patches (shadow variables, callbacks).

Then we could say:

+ The stack helps to enforce dependencies between dependent patches. But some external solution is also needed to force loading the patches in the right order.

+ The "replace" feature is useful for cumulative patches. It allows removing existing changes and unused stuff. But cumulative patches might be done even _without_ the atomic replace.

+ Cumulative patches _with_ the "replace" flag might, in theory, be installed in random order because they handle no-longer-patched functions. Well, some incompatibility might be caused by shadow variables and callbacks; therefore it still might make sense to install them in the right order. Cumulative patches _without_ the "replace" flag must be installed in the right order because they do not handle no-longer-patched functions.

Anyway, for cumulative patches, what matters is the external ordering (module loading), not the stack.

Back to your question: the stack is needed for dependent non-cumulative patches. The cumulative patches with the "replace" flag seem to be the safest and most flexible solution. The question is whether we want to force all users to use this mode, or whether we have standalone replace patches and allow a smooth transfer from one to another.

Anyway, for us, the removal of replaced patches is much more important. We could probably live without the possibility to replace disabled patches.

I think replacing disabled patches is ok, *if* we get rid of the illusion of a stack. The current stack isn't really a stack, it's just awkward.

I personally do not have problems with it. As I said, I see this a
Re: [PATCH v10 00/10] livepatch: Atomic replace feature
Hi,

On 07.03.2018 11:20, Petr Mladek wrote:

The atomic replace allows creating cumulative patches. They are useful when you maintain many livepatches and want to remove one that is lower on the stack. In addition, it is very useful when more patches touch the same function and there are dependencies between them.

Could you tell me what is the current status of this series? Looks like I lost track of it. Is there anything else that should be done here before this series could be merged?

Changes against v9:
+ Fixed check of valid NOPs for already loaded objects, regression introduced in v9 [Joe, Mirek]
+ Allow to replace even disabled patches [Evgenii]

Changes against v8:
+ Fixed handling of statically defined struct klp_object with empty array of functions [Joe, Mirek]
+ Removed redundant func->new_func assignment for NOPs [Mirek]
+ Improved some wording [Mirek]

Changes against v7:
+ Fixed handling of NOPs for not-yet-loaded modules
+ Made klp_replaced_patches list static [Mirek]
+ Made klp_free_object() public later [Mirek]
+ Fixed several reported typos [Mirek, Joe]
+ Updated documentation according to the feedback [Joe]
+ Added some Acks [Mirek]

Changes against v6:
+ used list_move when disabling replaced patches [Jason]
+ renamed KLP_FUNC_ORIGINAL -> KLP_FUNC_STATIC [Mirek]
+ used klp_is_func_type() in klp_unpatch_object() [Mirek]
+ moved static definition of klp_get_or_add_object() [Mirek]
+ updated comment about synchronization in forced mode [Mirek]
+ added user documentation
+ fixed several typos

Jason Baron (5):
  livepatch: Use lists to manage patches, objects and functions
  livepatch: Initial support for dynamic structures
  livepatch: Allow to unpatch only functions of the given type
  livepatch: Support separate list for replaced patches.
  livepatch: Add atomic replace

Petr Mladek (5):
  livepatch: Free only structures with initialized kobject
  livepatch: Correctly handle atomic replace for not yet loaded modules
  livepatch: Improve dynamic struct klp_object detection and manipulation
  livepatch: Allow to replace even disabled patches
  livepatch: Atomic replace and cumulative patches documentation

 Documentation/livepatch/cumulative-patches.txt |  83 +
 include/linux/livepatch.h                      |  65 +++-
 kernel/livepatch/core.c                        | 422 ++---
 kernel/livepatch/core.h                        |   4 +
 kernel/livepatch/patch.c                       |  31 +-
 kernel/livepatch/patch.h                       |   4 +-
 kernel/livepatch/transition.c                  |  41 ++-
 7 files changed, 598 insertions(+), 52 deletions(-)
 create mode 100644 Documentation/livepatch/cumulative-patches.txt

Regards,
Evgenii
Re: [PATCH v10 00/10] livepatch: Atomic replace feature
On 17.08.2018 17:53, Petr Mladek wrote:
On Fri 2018-08-17 13:17:27, Evgenii Shatokhin wrote:

Hi,

On 07.03.2018 11:20, Petr Mladek wrote:

The atomic replace allows creating cumulative patches. They are useful when you maintain many livepatches and want to remove one that is lower on the stack. In addition, it is very useful when more patches touch the same function and there are dependencies between them.

Could you tell me what is the current status of this series? Looks like I lost track of it. Is there anything else that should be done here before this series could be merged?

I have the rework suggested by Josh almost ready. I still need to port the new tests and send it. Unfortunately, I had to put it temporarily on hold because of other important tasks. I am about to leave for vacation (one week off). I will try to find some cycles, finish and send it once I am back.

That would be great! Thanks.

Best Regards,
Petr
Re: [DMARC Error] Re: [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
Hi,

On 07.03.2024 03:17, Puranjay Mohan wrote:

Hi Alex,

On Wed, Mar 6, 2024 at 9:35 PM Alexandre Ghiti wrote:

Hi Puranjay,

On 06/03/2024 17:59, Puranjay Mohan wrote:

This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V. This allows each ftrace callsite to provide an ftrace_ops to the common ftrace trampoline, allowing each callsite to invoke distinct tracer functions without the need to fall back to list processing or to allocate custom trampolines for each callsite. This significantly speeds up cases where multiple distinct trace functions are used and callsites are mostly traced by a single tracer.

The idea and most of the implementation are taken from ARM64's implementation of the same feature. The idea is to place a pointer to the ftrace_ops as a literal at a fixed offset from the function entry point, which can be recovered by the common ftrace trampoline.

We use -fpatchable-function-entry to reserve 8 bytes above the function entry by emitting two 4-byte or four 2-byte nops depending on the presence of CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer to the associated ftrace_ops for that callsite. Functions are aligned to 8 bytes to make sure that the accesses to this literal are atomic.

This approach allows for directly invoking ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or part of a module), without going via ftrace_ops_list_func.

I've benchmarked this with the ftrace_ops sample module on QEMU; with the next version, I will provide benchmarks on real hardware:

Without this patch:

+------------------------+------------------+----------------------------+
|   Number of tracers    | Total time (ns)  |   Per-call average time    |
|------------------------+------------------+----------------------------|
| Relevant | Irrelevant  |   100000 calls   | Total (ns) | Overhead (ns) |
|----------+-------------+------------------+------------+---------------|
|        0 |           0 |         15615700 |        156 |             - |
|        0 |           1 |         15917600 |        159 |             - |
|        0 |           2 |         15668000 |        156 |             - |
|        0 |          10 |         14971500 |        149 |             - |
|        0 |         100 |         15417600 |        154 |             - |
|        0 |         200 |         15387000 |        153 |             - |
|----------+-------------+------------------+------------+---------------|
|        1 |           0 |        119906800 |       1199 |          1043 |
|        1 |           1 |        137428600 |       1374 |          1218 |
|        1 |           2 |        159562400 |       1374 |          1218 |
|        1 |          10 |        302099900 |       3020 |          2864 |
|        1 |         100 |       2008785500 |      20087 |         19931 |
|        1 |         200 |       3965221900 |      39652 |         39496 |
|----------+-------------+------------------+------------+---------------|
|        1 |           0 |        119166700 |       1191 |          1035 |
|        2 |           0 |            15700 |       1579 |          1423 |
|       10 |           0 |        425370100 |       4253 |          4097 |
|      100 |           0 |       3595252100 |      35952 |         35796 |
|      200 |           0 |       7023485700 |      70234 |         70078 |
+----------+-------------+------------------+------------+---------------+

Note: per-call overhead is estimated relative to the baseline case with 0 relevant tracers and 0 irrelevant tracers.

With this patch:

+------------------------+------------------+----------------------------+
|   Number of tracers    | Total time (ns)  |   Per-call average time    |
|------------------------+------------------+----------------------------|
| Relevant | Irrelevant  |   100000 calls   | Total (ns) | Overhead (ns) |
|----------+-------------+------------------+------------+---------------|
|        0 |           0 |         15254600 |        152 |             - |
|        0 |           1 |         16136700 |        161 |             - |
|        0 |           2 |         15329500 |        153 |             - |
|        0 |          10 |         15148800 |        151 |             - |
|        0 |         100 |         15746900 |        157 |             - |
|        0 |         200 |         15737400 |        157 |             - |
|----------+-------------+------------------+------------+---------------|
|        1 |           0 |         47909000 |        479 |           327 |
|        1 |           1 |         48297400 |        482 |           330 |
|        1 |           2 |         47314100 |        473 |           321 |
|        1 |          10 |         47844900 |        478 |           326 |
|        1 |         100 |         46591900 |        465 |           313 |
|        1 |         200 |         47178900 |        471 |           319 |
|----------+-------------+------------------+------------+---------------|
|        1 |           0 |         46715800 |        467
Re: [PATCH] perf-probe: dso: Add symbols in .text.* subsections to text symbol map in kernel modules
On 23.02.2021 10:37, Masami Hiramatsu wrote:

The kernel modules have .text.* subsections such as .text.unlikely. Since dso__process_kernel_symbol() only identifies the symbols in the ".text" section as the text symbols and inserts them into the default dso in the map, the symbols in such subsections can not be found by map__find_symbol().

This adds the symbols in those subsections to the default dso in the map so that map__find_symbol() can find them. This solves the perf-probe issue of probing an online module.

Without this fix, probing on a symbol in .text.unlikely fails.

# perf probe -m nf_conntrack nf_l4proto_log_invalid
Probe point 'nf_l4proto_log_invalid' not found.
  Error: Failed to add events.

With this fix, it works because map__find_symbol() can find the symbol correctly.

# perf probe -m nf_conntrack nf_l4proto_log_invalid
Added new event:
  probe:nf_l4proto_log_invalid (on nf_l4proto_log_invalid in nf_conntrack)

You can now use it in all perf tools, such as:

	perf record -e probe:nf_l4proto_log_invalid -aR sleep 1

Reported-by: Evgenii Shatokhin
Signed-off-by: Masami Hiramatsu

Thanks for the fix! It looks like it helps, at least with nf_conntrack in kernel 5.11:

-----
# ./perf probe -v -m nf_conntrack nf_ct_resolve_clash
probe-definition(0): nf_ct_resolve_clash
symbol:nf_ct_resolve_clash file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Failed to get build-id from nf_conntrack.
Cache open error: -1
Open Debuginfo file: /lib/modules/5.11.0-test01/kernel/net/netfilter/nf_conntrack.ko
Try to find probe point from debuginfo.
Matched function: nf_ct_resolve_clash [33616]
Probe point found: nf_ct_resolve_clash+0
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//kprobe_events write=1
Opening /sys/kernel/tracing//README write=0
Writing event: p:probe/nf_ct_resolve_clash nf_conntrack:nf_ct_resolve_clash+0
Added new event:
  probe:nf_ct_resolve_clash (on nf_ct_resolve_clash in nf_conntrack)

You can now use it in all perf tools, such as:

	perf record -e probe:nf_ct_resolve_clash -aR sleep 1
-----

I guess the patch is suitable for stable kernel branches as well.

Without the patch, the workaround you suggested earlier (using the full path to nf_conntrack.ko) works too.

---
 tools/perf/util/symbol-elf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 6dff843fd883..0c1113236913 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -985,7 +985,9 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 	if (strcmp(section_name, (curr_dso->short_name + dso->short_name_len)) == 0)
 		return 0;

-	if (strcmp(section_name, ".text") == 0) {
+	/* .text and .text.* are included in the text dso */
+	if (strncmp(section_name, ".text", 5) == 0 &&
+	    (section_name[5] == '\0' || section_name[5] == '.')) {
 		/*
 		 * The initial kernel mapping is based on
 		 * kallsyms and identity maps. Overwrite it to .

Regards,
Evgenii
Re: [PATCH] perf-probe: dso: Add symbols in .text.* subsections to text symbol map in kernel modules
On 23.02.2021 23:11, Arnaldo Carvalho de Melo wrote:
On Tue, Feb 23, 2021 at 06:02:58PM +0300, Evgenii Shatokhin wrote:
On 23.02.2021 10:37, Masami Hiramatsu wrote:
[ ... snip ... ]

Thanks for the fix! It looks like it helps, at least with nf_conntrack in kernel 5.11:

So I'm taking this as you providing a:

Tested-by: Evgenii Shatokhin

ok?

Sure, thanks!

- Arnaldo

[ ... snip ... ]
'perf probe' and symbols from .text.*
Hi,

It seems 'perf probe' can only see functions from the .text section in kernel modules, but not from .text.unlikely or other .text.* sections.

For example, with kernel 5.11 and nf_conntrack.ko with debug info, 'perf probe' succeeds for nf_conntrack_attach() from .text and fails for nf_ct_resolve_clash() from .text.unlikely:

# perf probe -v -m nf_conntrack nf_ct_resolve_clash
probe-definition(0): nf_ct_resolve_clash
symbol:nf_ct_resolve_clash file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Failed to get build-id from nf_conntrack.
Cache open error: -1
Open Debuginfo file: /lib/modules/5.11.0-test01/kernel/net/netfilter/nf_conntrack.ko
Try to find probe point from debuginfo.
Matched function: nf_ct_resolve_clash [33616]
Probe point found: nf_ct_resolve_clash+0
Found 1 probe_trace_events.
Post processing failed or all events are skipped. (-2)
Probe point 'nf_ct_resolve_clash' not found.
  Error: Failed to add events. Reason: No such file or directory (Code: -2)

# perf probe -v -m nf_conntrack nf_conntrack_attach
probe-definition(0): nf_conntrack_attach
symbol:nf_conntrack_attach file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Failed to get build-id from nf_conntrack.
Cache open error: -1
Open Debuginfo file: /lib/modules/5.11.0-test01/kernel/net/netfilter/nf_conntrack.ko
Try to find probe point from debuginfo.
Matched function: nf_conntrack_attach [2c8c3]
Probe point found: nf_conntrack_attach+0
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//kprobe_events write=1
Opening /sys/kernel/tracing//README write=0
Writing event: p:probe/nf_conntrack_attach nf_conntrack:nf_conntrack_attach+0
Added new event:
  probe:nf_conntrack_attach (on nf_conntrack_attach in nf_conntrack)

Is there a way to allow probing of functions in .text.* sections? Of course, one could place probes using absolute addresses of the functions, but that would be less convenient.

This also affects many livepatch modules where the kernel code can be compiled with -ffunction-sections and each function may end up in a separate section .text.<function_name>. 'perf probe' cannot be used there, except with the absolute addresses.

Moreover, if the FGKASLR patches are merged (https://lwn.net/Articles/832434/) and the kernel is built with FGKASLR enabled, -ffunction-sections will be used too. 'perf probe' will be unable to see the kernel functions then.

Regards,
Evgenii
Re: [PATCH v4 00/10] Function Granular KASLR
Hi,

On Fri, 17 Jul 2020, Kristen Carlson Accardi wrote:

Function Granular Kernel Address Space Layout Randomization (fgkaslr)

This patch set is an implementation of finer grained kernel address space randomization. It rearranges your kernel code at load time on a per-function level granularity, with only around a second added to boot time.

[...]

Modules

Modules are randomized similarly to the rest of the kernel by shuffling the sections at load time prior to moving them into memory. The module must also have been built with the -ffunction-sections compiler option.

It seems a couple more adjustments are needed in the module loader code.

With function granular KASLR, modules will have lots of ELF sections due to -ffunction-sections. On my x86_64 system with kernel 5.8-rc7 with FG KASLR patches, for example, xfs.ko has 4849 ELF sections total, 2428 of these are loaded and shown in /sys/module/xfs/sections/.

There are at least 2 places where high-order memory allocations might happen during module loading. Such allocations may fail if memory is fragmented, while physically contiguous memory areas are not really needed there. I suggest to switch to kvmalloc/kvfree there.

1. kernel/module.c, randomize_text():

	Elf_Shdr **text_list;
	...
	int max_sections = info->hdr->e_shnum;
	...
	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);

The size of the allocated memory area is (8 * total_number_of_sections), if I understand it right, which is 38792 for xfs.ko, a 4th-order allocation.

2. kernel/module.c, mod_sysfs_setup() => add_sect_attrs().

This allocation can be larger than the first one. We found this issue with livepatch modules some time ago (these modules are already built with -ffunction-sections) [1], but, with FG KASLR, it affects all kernel modules. Large ones like xfs.ko, btrfs.ko, etc., could suffer the most from it.

When a module is loaded, sysfs attributes are created for its ELF sections (visible as /sys/module/<module_name>/sections/*) and contain the start addresses of these ELF sections. A single memory chunk is allocated for all of these:

	size[0] = ALIGN(sizeof(*sect_attrs) + nloaded * sizeof(sect_attrs->attrs[0]),
			sizeof(sect_attrs->grp.attrs[0]));
	size[1] = (nloaded + 1) * sizeof(sect_attrs->grp.attrs[0]);
	sect_attrs = kzalloc(size[0] + size[1], GFP_KERNEL);

'nloaded' is the number of loaded ELF sections in the module. For the kernel 5.8-rc7 on my system, the total size is 56 + 72 * nloaded, which is 174872 for xfs.ko, 43 pages, a 6th-order allocation.

I enabled the 'mm_page_alloc' tracepoint with filter 'order > 3' to confirm the issue and, indeed, got these two allocations when modprobe'ing xfs:

/sys/kernel/debug/tracing/trace:

modprobe-1509 <...>: mm_page_alloc: <...> order=4 migratetype=0 gfp_flags=GFP_KERNEL|__GFP_COMP
modprobe-1509 <...>
 => __alloc_pages_nodemask
 => alloc_pages_current
 => kmalloc_order
 => kmalloc_order_trace
 => __kmalloc
 => load_module

modprobe-1509 <...>: mm_page_alloc: <...> order=6 migratetype=0 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
modprobe-1509 <...>
 => __alloc_pages_nodemask
 => alloc_pages_current
 => kmalloc_order
 => kmalloc_order_trace
 => __kmalloc
 => mod_sysfs_setup
 => load_module

I suppose something like this can be used as a workaround:

* for randomize_text():

---
diff --git a/kernel/module.c b/kernel/module.c
index 0f4f4e567a42..a2473db1d0a3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2433,7 +2433,7 @@ static void randomize_text(struct module *mod, struct load_info *info)
 	if (sec == 0)
 		return;

-	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
+	text_list = kvmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
 	if (!text_list)
 		return;

@@ -2466,7 +2466,7 @@ static void randomize_text(struct module *mod, struct load_info *info)
 		shdr->sh_entsize = get_offset(mod, &size, shdr, 0);
 	}

-	kfree(text_list);
+	kvfree(text_list);
 }

 /* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld
---

* for add_sect_attrs():

---
diff --git a/kernel/module.c b/kernel/module.c
index 0f4f4e567a42..a2473db1d0a3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1541,7 +1541,7 @@ static void free_sect_attrs(struct module_sect_attrs *sect_attrs)
 	for (section = 0; section < sect_attrs->nsections; section++)
 		kfree(sect_attrs->attrs[section].battr.attr.name);
hidepid=2 and dumpability
(Sorry, forgot to CC LKML yesterday, resending.)

Hi,

Could you shed some light on the implementation of the 'hidepid' option for procfs in the Linux kernel?

As far as I can see, has_pid_permissions() eventually calls ptrace_may_access(task, PTRACE_MODE_READ). This way, if hidepid=2 is used, ordinary users will see only those of their own processes which are dumpable. For example, the processes that changed credentials or were marked as non-dumpable with prctl() will remain invisible to their owners.

Isn't that an overkill? Or perhaps there is a security risk if a user could read the contents of /proc/<pid> for these processes?

I stumbled upon this while experimenting with hidepid=2 in a Virtuozzo container. If I log in to the container as an ordinary user via SSH, one of the sshd processes (owned by the user) in the container is not visible to that user. I checked in runtime that it is the dumpability check in the kernel that fails in __ptrace_may_access(). The kernel is based on version 3.10.x, but it should not matter much in this case.

Any ideas? Thanks in advance.

Regards,
Evgenii
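As a side note, the effect is easy to reproduce without sshd; a small userspace test program (a sketch, nothing Virtuozzo-specific) can clear the dumpable flag the same way credential-changing daemons effectively do:

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/prctl.h>

	int main(void)
	{
		/* Clear the dumpable flag, as daemons that change
		 * credentials effectively do. */
		if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0) {
			perror("prctl");
			return 1;
		}

		printf("pid %d is now non-dumpable\n", (int)getpid());

		/* Keep the process alive; on a procfs mounted with
		 * hidepid=2, /proc/<pid> should now be inaccessible
		 * even to the owner of the process. */
		pause();
		return 0;
	}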
Kmemleak and debug objects
Hi,

When using Kmemleak on the kernel 3.10.0-229.7.2 x86_64 (RHEL 7 and the like) with CONFIG_DEBUG_OBJECTS=y, I sometimes see Kmemleak report the potential leaks of the following kind:

---
unreferenced object 0x8800270e32d0 (size 40):
  comm "updatedb", pid 14416, jiffies 4297905695 (age 4024.651s)
  hex dump (first 32 bytes):
    c0 2a 0e 27 00 88 ff ff f0 38 5c 84 ff ff ff ff  .*.'.8\.
    01 00 00 00 00 00 00 00 10 2f be 27 00 88 ff ff  ./.'
  backtrace:
    [] kmemleak_alloc+0x4e/0xc0
    [] kmem_cache_alloc+0x125/0x390
    [] __debug_object_init+0x6bd/0xc40
    [] debug_object_init+0x1b/0x20
    [] __init_work+0x19/0x30
    [] ext4_alloc_inode+0x5c6/0x7f0 [ext4]
    [] alloc_inode+0x5b/0x170
    [] iget_locked+0xfb/0x2f0
    [] ext4_iget+0x112/0x46f0 [ext4]
    [] ext4_iget_normal+0x88/0xb0 [ext4]
    [] ext4_lookup+0x29e/0x4e0 [ext4]
    [] lookup_real+0x90/0x120
    [] __lookup_hash+0xe3/0x100
    [] lookup_slow+0xd2/0x280
    [] path_lookupat+0x1ec6/0x4610
    [] filename_lookup+0x4c/0x1f0
---

If I trigger a Kmemleak scan again, such objects are still reported in /sys/kernel/debug/kmemleak.

The object is an instance of 'struct debug_obj' and the first two 8-byte fields of it are the contents of 'struct hlist_node' (next, *pprev). The object should be in a bucket of obj_hash (lib/debugobjects.c), namely the one at 0x845c38f0, as I can see from hlist_node::pprev. I checked the memory: that element of obj_hash did actually contain the pointer to that object after Kmemleak had reported it as a potential leak.

Kmemleak scans the memory areas from .bss, including obj_hash, so it should probably have seen that pointer. Looks like a false positive. Is it a known problem?

Could it be that Kmemleak, say, saw that object after it had been allocated and removed from obj_pool but before it had been added to obj_hash? However, the subsequent scans would have shown in that case that the object is not leaked, I guess. Or, perhaps, something else causes this?

Any ideas?

Regards,
Evgenii
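If this does turn out to be a false positive, the standard annotation API could silence it at the allocation site; a sketch (the helper below is invented, not actual lib/debugobjects.c code, but kmemleak_not_leak() is the generic annotation mechanism):

	#include <linux/slab.h>
	#include <linux/kmemleak.h>

	/*
	 * Sketch: an object known to stay reachable (here, via the
	 * obj_hash table) can be marked so that kmemleak still scans it
	 * for pointers but never reports it as leaked.
	 */
	static void *alloc_tracked_object(struct kmem_cache *cache, gfp_t gfp)
	{
		void *obj = kmem_cache_alloc(cache, gfp);

		if (obj)
			kmemleak_not_leak(obj);	/* suppress the false positive */
		return obj;
	}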
Re: Bug with paravirt ops and livepatches
05.04.2016 16:07, Miroslav Benes пишет: On Mon, 4 Apr 2016, Josh Poimboeuf wrote: So I think this doesn't fix the problem. Dynamic relocations are applied to the "patch module", whereas the above code deals with the initialization order of the "patched module". This distinction originally confused me as well, until Jessica set me straight. Let me try to illustrate the problem with an example. Imagine you have a patch module P which applies a patch to module M. P replaces M's function F with a new function F', which uses paravirt ops. 1) Patch P is loaded before module M. P's new function F' has an instruction which is patched by apply_paravirt(), even though the patch hasn't been applied yet. 2) Module M is loaded. Before applying the patch, livepatch tries to apply a klp_reloc to the instruction in F' which was already patched by apply_paravirt() in step 1. This results in undefined behavior because it tries to patch the original instruction but instead patches the new paravirt instruction. So the above patch makes no difference because the paravirt module loading order doesn't really matter. Hi, we are trying really hard to understand the actual culprit here and as it is quite confusing I have several questions/comments... 1. can you provide dynrela sections of the patch module from https://github.com/dynup/kpatch/issues/580? What is interesting is that kvm_arch_vm_ioctl() function contains rela records only for trivial (== exported) symbols from the first look. The problem should be there only if you want to patch a function which reference some paravirt_ops unexported symbol. For that symbol dynrela should be created. As far as I understand it, create-diff-object does not check if a symbol is exported or not when a patch for a module rather than for vmlinux is being prepared. In such cases, it only checks if the referenced global symbol is defined somewhere in that module and if not, creates a dynrela for it. The code is in kpatch_create_dynamic_rela_sections() from https://github.com/dynup/kpatch/blob/master/kpatch-build/create-diff-object.c. 2. I am almost 100 percent sure that the second problem Chris mentions in github post is something different. If he patches only kvm_arch_vm_ioctl() so that it calls his exported __kvm_arch_vm_ioctl() duplicate there is no problem. Because it is a trivial livepatching case. There are no dynrelas and no alternatives in the patch module. The crash is suspicious and we have a problem somewhere. Chris, can you please provide more info about that? That is how exactly you used kallsyms_lookup_name() and so on... 3. Now I'll try to explain what really happens here... because of 1. and the way the alternatives and relocations are implemented the only problematic case is when one wants to patch a module which introduces its own alternatives. Which is probably the case of KVM. Why? When alternative is used, the call to some pv_*_ops.function is placed somewhere. The address of this location is stored in a separate elf section and when apply_paravirt() is called it takes the address and place the new code there (or it does not, it depends :)). When the alternative is in some module and pv_*_ops is exported, which is the usual case, there is no problem. No dynrela needs to be used when a user wants to patch such function with the alternative. The only problem is when the function uses unexported pv_*_ops (or some other alternative symbol) from its own object file. When the user wants to patch this one there is a need for dynrela. 
So what really happens is that we load a patch module; we do not apply our relocations, but we do apply alternatives to the patch module, which is problematic because there is a reference to something which is not yet resolved (that is, unexported pv_*_ops). When the patched module arrives, we happily resolve relocations, but since we do not apply alternatives again, there is rubbish in the patching code. Is this all correct?

Jessica proposed some novel fixes here: https://github.com/dynup/kpatch/issues/580#issuecomment-183001652

Yes, I think the fix should be the same as the one we have for relocations: postpone the application of alternatives (a rough sketch of the idea follows at the end of this mail). Maybe it is even possible to do it in a similar way. And yes, on one hand it is gonna be ugly; on the other hand, it is gonna be consistent with relocations.

I think the *real* problem here (and one that we've seen before) is that we have a feature which allows you to load a patch to a module before loading the module itself. That really goes against the grain of how module dependencies work. It has already given us several headaches, and it makes the livepatch code a lot more complex. I really think we need to take another hard look at whether it's really worth it. My current feeling is that it's not.

I can only say that maybe about 1/3 of the kgraft patches we currently have are for modules (1/3 is probably not correct, but it is definitely non-trivial).
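To make the postponement idea more concrete, here is a rough sketch of what it could look like. klp_find_arch_section() and the section lookup are assumptions for illustration, modeled on how the dynrelas are already deferred until the patched module loads; apply_paravirt() is the existing routine from arch/x86/kernel/alternative.c:

/*
 * Sketch only: apply the paravirt alternatives for the functions in
 * the patch module that target the given object *after* the dynrelas
 * for that object have been resolved, i.e. when the patched module
 * finally loads.
 */
static int klp_apply_paravirt_sections(struct module *pmod,
				       const char *objname)
{
	struct paravirt_patch_site *start, *end;

	/* Locate the .parainstructions records belonging to the
	   functions that patch 'objname' (hypothetical helper). */
	if (klp_find_arch_section(pmod, objname, &start, &end))
		return 0;	/* nothing to patch for this object */

	/* Safe only now: the dynrelas for this object were applied
	   just before this call, so the pv_*_ops references are valid. */
	apply_paravirt(start, end);
	return 0;
}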
[ANNOUNCE] RaceHound 1.1
RaceHound 1.1 has been released. This is a data race detector for the Linux kernel 3.14 or newer, on x86. It checks the kernel code at runtime and, although it may miss some races, it produces no false alarms. It can be used to confirm the potential races found by other tools, or standalone to "sweep" through a given area of code, checking whether data races happen there. RaceHound relies on Kprobes and hardware breakpoints to detect conflicting memory accesses.

Changes since v1.0

Enhancements:
* The kernel-mode part of RaceHound now reports the events ("breakpoint hit", "race found") to user space via a file in debugfs that can be polled. The following two new scripts use this.
* The examples/events.py Python script simply outputs the current events to stdout.
* The examples/check_apps.py script allows monitoring a number of locations in the code; it adjusts the set of monitored locations depending on how often they are executed, to keep the overhead lower.

Bug fixes:
* lines2insns did not show all instructions in some cases (https://github.com/winnukem/racehound/issues/7).
* A race between removal of a BP and processing of that BP was fixed - yes, RaceHound had races too ;-)
* A few smaller fixes.

Downloads: https://github.com/winnukem/racehound/releases/tag/1.1
Details, build instructions, etc.: https://github.com/winnukem/racehound

Regards,
Evgenii
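Since the events file is pollable, the same thing examples/events.py does can also be done from C. A minimal sketch, assuming the file lives at /sys/kernel/debug/racehound/events (check the project documentation for the actual path):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	/* Assumed location of the RaceHound events file in debugfs. */
	int fd = open("/sys/kernel/debug/racehound/events", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };

		/* Block until the kernel-mode part reports something. */
		if (poll(&pfd, 1, -1) < 0) {
			perror("poll");
			break;
		}
		ssize_t n = read(fd, buf, sizeof(buf) - 1);
		if (n <= 0)
			break;
		buf[n] = '\0';
		fputs(buf, stdout);	/* "breakpoint hit", "race found", ... */
	}
	close(fd);
	return 0;
}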
Re: [PATCH v8 0/8] livepatch: Atomic replace feature
Hi,

Petr, Jason - thanks a lot for working on this series, first of all! And especially for your patience.

On 21.02.2018 16:29, Petr Mladek wrote:

The atomic replace feature allows creating cumulative patches. They are useful when you maintain many livepatches and want to remove one that is lower on the stack. In addition, it is very useful when several patches touch the same function and there are dependencies between them.

I have experimented with this updated patchset a bit. It looks like the replace operation fails if there is a loaded but disabled patch.

Suppose there are two binary patches changing the same kernel functions. Load patch1, then disable it (echo 0 > /sys/kernel/livepatch/patch1/enabled), then try to load patch2 with atomic replace. I get a "Device or resource busy" error at this point. It seems __klp_enable_patch() returns -EBUSY for some reason. I haven't dug deeper into that code yet.

A workaround is simple: just unload all disabled patches before trying to load patch2 with replace. Still, the behavior is quite strange. (A minimal sequence reproducing this is at the end of this mail.)

I have found one bug in v7: we were not able to initialize a NOP struct klp_func when the patched module was not loaded, because func->new_func was NULL. I have fixed it in a separate patch for easier review. Note that the original patch from Jason did not have this problem because func->new_func was always NULL there. But that required NOP-specific handling in other locations as well, namely klp_ftrace_handler() and klp_check_stack_func().

Changes against v7:
+ Fixed handling of NOPs for not-yet-loaded modules
+ Made klp_replaced_patches list static [Mirek]
+ Made klp_free_object() public later [Mirek]
+ Fixed several reported typos [Mirek, Joe]
+ Updated documentation according to the feedback [Joe]
+ Added some Acks [Mirek]

Changes against v6:
+ used list_move when disabling replaced patches [Jason]
+ renamed KLP_FUNC_ORIGINAL -> KLP_FUNC_STATIC [Mirek]
+ used klp_is_func_type() in klp_unpatch_object() [Mirek]
+ moved static definition of klp_get_or_add_object() [Mirek]
+ updated comment about synchronization in forced mode [Mirek]
+ added user documentation
+ fixed several typos

Jason Baron (5):
  livepatch: Use lists to manage patches, objects and functions
  livepatch: Initial support for dynamic structures
  livepatch: Allow to unpatch only functions of the given type
  livepatch: Support separate list for replaced patches.
  livepatch: Add atomic replace

Petr Mladek (3):
  livepatch: Free only structures with initialized kobject
  livepatch: Correctly handle atomic replace for not yet loaded modules
  livepatch: Atomic replace and cumulative patches documentation

 Documentation/livepatch/cumulative-patches.txt |  83 ++
 include/linux/livepatch.h                      |  59 +++-
 kernel/livepatch/core.c                        | 394 ++---
 kernel/livepatch/core.h                        |   4 +
 kernel/livepatch/patch.c                       |  31 +-
 kernel/livepatch/patch.h                       |   4 +-
 kernel/livepatch/transition.c                  |  41 ++-
 7 files changed, 566 insertions(+), 50 deletions(-)
 create mode 100644 Documentation/livepatch/cumulative-patches.txt

Regards,
Evgenii
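For reference, the failure described above can be reproduced with a sequence like the following (patch1/patch2 are placeholder module names, with patch2 built as a cumulative/replace patch; the exact insmod error text may differ):

# insmod patch1.ko
# echo 0 > /sys/kernel/livepatch/patch1/enabled
# insmod patch2.ko
insmod: ERROR: could not insert module patch2.ko: Device or resource busy

Unloading patch1 first (rmmod patch1) avoids the error.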
[Bisected] Suspend-to-disk fails on a 32-bit x86 system since commit 92923ca
Hi,

It seems there is a regression in the kernel that prevents suspend-to-disk from working properly on some x86 machines with 32-bit Linux systems (ROSA Linux, in this case).

With the mainline kernels 4.2 - 4.10, it takes more than 2 minutes from the "systemctl hibernate" command until the system powers off. When I try to resume the system after that, it hangs with a black screen after "Image loading progress 100%" (no ssh access, no reaction to the keyboard, etc.). For comparison, it takes only 10-15 seconds from the "systemctl hibernate" command until the poweroff when I use kernel 4.1, and the system then resumes OK.

Hardware info: https://linux-hardware.org/index.php?probe=e6a06c64c7
Kernel config, just in case: http://pastebin.com/P8vb9qi6

The problem does not show up in a 64-bit system installed on the same machine.

Bisection points at the following commit as the first "bad" one:

commit 92923ca3aacef63c92dc297a75ad0c6dfe4eab37
Author: Nathan Zimmer
Date:   Tue Jun 30 14:56:48 2015 -0700

    mm: meminit: only set page reserved in the memblock region

After reverting it, suspend-to-disk and resume work OK with the mainline kernels 4.2 - 4.10. For the kernels 4.8 - 4.10, I also had to disable CONFIG_RANDOMIZE_BASE, but that is due to an unrelated issue, which I will report separately.

Regards,
Evgenii
32-bit x86 system reboots automatically on resume from hibernate (ASLR issue?)
Hi,

One of my x86 machines with a 32-bit Linux system (ROSA Linux, in this case) automatically reboots when it tries to resume from hibernate. This happens shortly after the "Image loading progress 100%" message is shown on the screen. No traces of the error remain in the system log after the reboot, though.

The problem is present at least in the mainline kernels 4.8 - 4.10. With earlier versions (I tried 4.4, 4.5, etc.), the system resumes OK. The bisection pointed to the following commit as the first "bad" one:

commit 65fe935dd2387a4faf15314c73f5e6d31ef0217e
Author: Kees Cook
Date:   Mon Jun 13 15:10:02 2016 -0700

    x86/KASLR, x86/power: Remove x86 hibernation restrictions

Hardware: https://linux-hardware.org/index.php?probe=e6a06c64c7
Config used to build the kernel at rev. 65fe935: http://pastebin.com/AxEA6ahb

If I understand it correctly, this commit just enabled ASLR by default regardless of whether hibernation support was present. Before this commit, ASLR was disabled on that system because hibernation was supported. To check whether ASLR is really involved here, I rebuilt the kernel with CONFIG_RANDOMIZE_BASE unset - now the system resumes OK from hibernation, with no auto reboots and no other visible problems so far.

The problem does not show up in a 64-bit Linux system installed on the same machine. Only the 32-bit system is affected.

Regards,
Evgenii
Re: 32-bit x86 system reboots automatically on resume from hibernate (ASLR issue?)
On 21.03.2017 23:40, Kees Cook wrote: On Tue, Mar 21, 2017 at 6:54 AM, Evgenii Shatokhin wrote:

Hi,

One of my x86 machines with a 32-bit Linux system (ROSA Linux, in this case) automatically reboots when it tries to resume from hibernate. This happens shortly after the "Image loading progress 100%" message is shown on the screen. No traces of the error remain in the system log after the reboot, though.

The problem is present at least in the mainline kernels 4.8 - 4.10. With earlier versions (I tried 4.4, 4.5, etc.), the system resumes OK. The bisection pointed to the following commit as the first "bad" one:

commit 65fe935dd2387a4faf15314c73f5e6d31ef0217e
Author: Kees Cook
Date:   Mon Jun 13 15:10:02 2016 -0700

    x86/KASLR, x86/power: Remove x86 hibernation restrictions

Hrm, perhaps the 32-bit hibernation code still isn't KASLR-safe. If you boot with nokaslr on the kernel command line, does the problem go away?

Yes. The problem does not show up when I boot the system with 'nokaslr'.

Hardware: https://linux-hardware.org/index.php?probe=e6a06c64c7
Config used to build the kernel at rev. 65fe935: http://pastebin.com/AxEA6ahb

If I understand it correctly, this commit just enabled ASLR by default regardless of whether hibernation support was present. Before this commit, ASLR was disabled on that system because hibernation was supported. To check whether ASLR is really involved here, I rebuilt the kernel with CONFIG_RANDOMIZE_BASE unset - now the system resumes OK from hibernation, with no auto reboots and no other visible problems so far.

The problem does not show up in a 64-bit Linux system installed on the same machine. Only the 32-bit system is affected.

(Why would you want to run 32-bit kernels on a 64-bit system?)

Mostly for testing and debugging. While most ROSA Linux users are OK with the 64-bit version of the distro, some still need its 32-bit variant. They reported (unrelated) problems with hibernation; I was debugging those and stumbled upon this problem as well.

Regards,
Evgenii
Re: [PATCH] hibernation: on 32-bit x86, disabled in favor of KASLR
On 23.03.2017 03:27, Kees Cook wrote:

This is a modified revert of commit 65fe935dd238 ("x86/KASLR, x86/power: Remove x86 hibernation restrictions"), since it appears that 32-bit hibernation still can't support KASLR. 64-bit is fine. Since people have been running with KASLR by default on 32-bit since v4.8, this disables hibernation (with a warning). Booting with "nokaslr" will disable KASLR and enable hibernation.

Reported-by: Evgenii Shatokhin
Signed-off-by: Kees Cook
Cc: sta...@vger.kernel.org # v4.8+

The patch does not work as intended on my system, unfortunately.

I tried the mainline kernel v4.11-rc3 and added this patch. With "nokaslr" in the kernel command line, the system fails to hibernate. It complains this way in the log:

<...>
kernel: PM: writing image.
kernel: PM: Cannot find swap device, try swapon -a.
kernel: PM: Cannot get swap writer
kernel: PM: Basic memory bitmaps freed
kernel: Restarting tasks ... done.
systemd[1]: Time has been changed
systemd[3948]: Time has been changed
systemd[14825]: Time has been changed
systemd[1]: systemd-hibernate.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Hibernate.
<...>

The swap device (a swap file, actually) is available, however:

# swapon -s
Filename    Type    Size     Used    Priority
/swap       file    6297596  0       -1

I then built the same kernel without this patch, added "nokaslr" to the kernel command line again, and the system hibernates and resumes fine.

---
 Documentation/admin-guide/kernel-parameters.txt |  5 +
 arch/x86/boot/compressed/kaslr.c                |  3 +++
 kernel/power/hibernate.c                        | 18 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2ba45caabada..6f899c7f587d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1725,6 +1725,11 @@
 			kernel and module base offset ASLR (Address Space
 			Layout Randomization).
 
+			On 32-bit x86 with CONFIG_HIBERNATION, hibernation
+			is disabled if KASLR is enabled. If "nokaslr" is
+			specified, KASLR will be disabled and hibernation
+			will be enabled.
+
 	keepinitrd	[HW,ARM]
 
 	kernelcore=	[KNL,X86,IA-64,PPC]

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8b7c9e75edcb..b694af45f1e0 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -572,6 +572,9 @@ void choose_random_location(unsigned long input,
 		return;
 	}
 
+	if (IS_ENABLED(CONFIG_X86_32) && IS_ENABLED(CONFIG_HIBERNATION))
+		warn("KASLR active: hibernation disabled on 32-bit x86.");
+
 	boot_params->hdr.loadflags |= KASLR_FLAG;
 
 	/* Prepare to add new identity pagetables on demand. */

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index a8b978c35a6a..1d8f1fe1b7f4 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -37,9 +37,14 @@
 #include "power.h"
 
-static int nocompress;
+#if defined(CONFIG_X86_32) && defined(CONFIG_RANDOMIZE_BASE)
+static int noresume = 1;
+static int nohibernate = 1;
+#else
 static int noresume;
 static int nohibernate;
+#endif
+static int nocompress;
 static int resume_wait;
 static unsigned int resume_delay;
 static char resume_file[256] = CONFIG_PM_STD_PARTITION;
@@ -1194,3 +1199,14 @@
 __setup("hibernate=", hibernate_setup);
 __setup("resumewait", resumewait_setup);
 __setup("resumedelay=", resumedelay_setup);
 __setup("nohibernate", nohibernate_setup);
+
+/* Allow hibernation to be disabled in favor of KASLR on 32-bit x86. */
+#if defined(CONFIG_X86_32) && defined(CONFIG_RANDOMIZE_BASE)
+static int __init nokaslr_hibernate_setup(char *str)
+{
+	noresume = 0;
+	nohibernate = 0;
+	return 1;
+}
+__setup("nokaslr", nokaslr_hibernate_setup);
+#endif
Re: [PATCH] hibernation: on 32-bit x86, disabled in favor of KASLR
On 23.03.2017 18:30, Rafael J. Wysocki wrote: On Thu, Mar 23, 2017 at 2:23 PM, Evgenii Shatokhin wrote: On 23.03.2017 03:27, Kees Cook wrote:

This is a modified revert of commit 65fe935dd238 ("x86/KASLR, x86/power: Remove x86 hibernation restrictions"), since it appears that 32-bit hibernation still can't support KASLR. 64-bit is fine. Since people have been running with KASLR by default on 32-bit since v4.8, this disables hibernation (with a warning). Booting with "nokaslr" will disable KASLR and enable hibernation.

Reported-by: Evgenii Shatokhin
Signed-off-by: Kees Cook
Cc: sta...@vger.kernel.org # v4.8+

The patch does not work as intended on my system, unfortunately.

I tried the mainline kernel v4.11-rc3 and added this patch. With "nokaslr" in the kernel command line, the system fails to hibernate. It complains this way in the log:

<...>
kernel: PM: writing image.
kernel: PM: Cannot find swap device, try swapon -a.
kernel: PM: Cannot get swap writer
kernel: PM: Basic memory bitmaps freed
kernel: Restarting tasks ... done.
systemd[1]: Time has been changed
systemd[3948]: Time has been changed
systemd[14825]: Time has been changed
systemd[1]: systemd-hibernate.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Hibernate.
<...>

The swap device (a swap file, actually) is available, however:

# swapon -s
Filename    Type    Size     Used    Priority
/swap       file    6297596  0       -1

I then built the same kernel without this patch, added "nokaslr" to the kernel command line again, and the system hibernates and resumes fine.

With the patch applied and "nokaslr" in the kernel command line, what shows up when you do "$ cat /sys/power/state"?

freeze standby mem disk

However, I think now that the patch itself is OK. I experimented with the patched kernel a bit more and found that hibernate does work when I place "nokaslr" before "resume=xxx resume_offset=xxx" in the kernel command line and does not work when I place "nokaslr" after these options (illustrated below). So I guess there is an issue with the parsing of the kernel command line somewhere (dracut scripts? systemd? I do not know). If resume= or resume_offset= were corrupted, that might be the reason why the system could not find the swap file when hibernating.

Anyway, that issue is clearly unrelated to this patch, and the patch itself works OK for me. Thanks a lot!

Tested-by: Evgenii Shatokhin

Regards,
Evgenii
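To illustrate the ordering that matters here (the resume device and offset below are placeholders, not my actual values):

# hibernate and resume work:
... nokaslr resume=/dev/sdXN resume_offset=NNNN ...

# hibernate fails with "Cannot find swap device":
... resume=/dev/sdXN resume_offset=NNNN nokaslr ...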