Re: [PATCH] fs: support filename refcount without atomics
> +static inline void makeatomicname(struct filename *name) > +{ > + VFS_BUG_ON(IS_ERR_OR_NULL(name)); > + /* > + * The name can legitimately already be atomic if it was cached by > audit. > + * If switching the refcount to atomic, we need not to know we are the > + * only non-atomic user. > + */ > + VFS_BUG_ON(name->owner != current && !name->is_atomic); > + /* > + * Don't bother branching, this is a store to an already dirtied > cacheline. > + */ > + name->is_atomic = true; > +} Should this not depend on audit being enabled? io_uring without audit is fine. -- Jens Axboe
Re: [PATCH] fs: support filename refcount without atomics
On Fri, Mar 7, 2025 at 5:18 PM Jens Axboe wrote: > > > +static inline void makeatomicname(struct filename *name) > > +{ > > + VFS_BUG_ON(IS_ERR_OR_NULL(name)); > > + /* > > + * The name can legitimately already be atomic if it was cached by > > audit. > > + * If switching the refcount to atomic, we need not to know we are the > > + * only non-atomic user. > > + */ > > + VFS_BUG_ON(name->owner != current && !name->is_atomic); > > + /* > > + * Don't bother branching, this is a store to an already dirtied > > cacheline. > > + */ > > + name->is_atomic = true; > > +} > > Should this not depend on audit being enabled? io_uring without audit is > fine. > I thought about it, but then I got worried about transitions from disabled to enabled -- will they suddenly start looking here? Should this test for audit_enabled, audit_dummy_context() or something else? I did not want to bother analyzing this. I'll note though this would be an optimization on top of the current code, so I don't think it *blocks* the patch. -- Mateusz Guzik
Re: [PATCH] fs: support filename refcount without atomics
On Fri, Mar 07, 2025 at 05:11:55PM +0100, Mateusz Guzik wrote: > +++ b/include/linux/fs.h > @@ -2765,11 +2765,19 @@ struct audit_names; > struct filename { > const char *name; /* pointer to actual string */ > const __user char *uptr; /* original userland pointer */ > - atomic_trefcnt; > + union { > + atomic_trefcnt_atomic; > + int refcnt; > + }; > +#ifdef CONFIG_DEBUG_VFS > + struct task_struct *owner; > +#endif > + boolis_atomic; > struct audit_names *aname; > const char iname[]; > }; 7 (or 3) byte hole; try to pad. Would it make more sense to put the bool between aname and iname where it will only take one byte instead of 8? Actually, maybe do it as: struct filename { const char *name; /* pointer to actual string */ const __user char *uptr; /* original userland pointer */ - atomic_trefcnt; struct audit_names *aname; +#ifdef CONFIG_DEBUG_VFS + struct task_struct *owner; +#endif + union { + atomic_trefcnt_atomic; + int refcnt; + }; + boolis_atomic; const char iname[]; }; and (on 64-bit), you're packed even more efficiently than right now.
[PATCH] fs: support filename refcount without atomics
Atomics are only needed for a combination of io_uring and audit. Regular file access (even with audit) gets around fine without them. With this patch 'struct filename' starts with being refcounted using regular ops. In order to avoid API explosion in the getname*() family, a dedicated routine is added to switch the obj to use atomics. This leaves the room for merely issuing getname(), not issuing the switch and still trying to manipulate the refcount from another thread. Catching such cases is facilitated by CONFIG_DEBUG_VFS-dependent tracking of who created the given filename object and having refname() and putname() detect if another thread is trying to modify them. Benchmarked on Sapphire Rapids issuing access() (ops/s): before: 5106246 after: 5269678 (+3%) Signed-off-by: Mateusz Guzik --- this can be split into 2 patches (refname, initname vs the atomic change). I also took the liberty to fix up some weird whitespace. the bench is access1 from will-it-scale (kind of): https://github.com/antonblanchard/will-it-scale/pull/36 correctness tested with io_uring + audit using the test suite in liburing. Confirmed mismatched owners show up: # bpftrace -e 'kprobe:putname /curtask != ((struct filename *)arg0)->owner/ { @[kstack()] = count(); }' Attaching 1 probe... ^C @[ putname+5 do_renameat2+279 io_renameat+40 io_issue_sqe+1159 io_wq_submit_work+200 io_worker_handle_work+313 io_wq_worker+218 ret_from_fork+49 ret_from_fork_asm+26 ]: 4 @[ putname+5 do_renameat2+287 io_renameat+40 io_issue_sqe+1159 io_wq_submit_work+200 io_worker_handle_work+313 io_wq_worker+218 ret_from_fork+49 ret_from_fork_asm+26 ]: 4 fs/namei.c | 44 +--- include/linux/fs.h | 37 - io_uring/fs.c| 8 io_uring/openclose.c | 1 + io_uring/statx.c | 3 +-- io_uring/xattr.c | 2 ++ kernel/auditsc.c | 12 +--- 7 files changed, 86 insertions(+), 21 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 06765d320e7e..ff76b495abef 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -125,6 +125,17 @@ #define EMBEDDED_NAME_MAX (PATH_MAX - offsetof(struct filename, iname)) +static inline void initname(struct filename *name) +{ + name->uptr = NULL; + name->aname = NULL; + name->is_atomic = false; +#ifdef CONFIG_DEBUG_VFS + name->owner = current; +#endif + atomic_set(&name->refcnt_atomic, 1); +} + struct filename * getname_flags(const char __user *filename, int flags) { @@ -203,10 +214,7 @@ getname_flags(const char __user *filename, int flags) return ERR_PTR(-ENAMETOOLONG); } } - - atomic_set(&result->refcnt, 1); - result->uptr = filename; - result->aname = NULL; + initname(result); audit_getname(result); return result; } @@ -264,26 +272,40 @@ struct filename *getname_kernel(const char * filename) return ERR_PTR(-ENAMETOOLONG); } memcpy((char *)result->name, filename, len); - result->uptr = NULL; - result->aname = NULL; - atomic_set(&result->refcnt, 1); + initname(result); audit_getname(result); - return result; } EXPORT_SYMBOL(getname_kernel); void putname(struct filename *name) { + int refcnt; + if (IS_ERR_OR_NULL(name)) return; - if (WARN_ON_ONCE(!atomic_read(&name->refcnt))) - return; + VFS_BUG_ON(name->owner != current && !name->is_atomic); - if (!atomic_dec_and_test(&name->refcnt)) + refcnt = atomic_read(&name->refcnt_atomic); + if (WARN_ON_ONCE(!refcnt)) return; + /* +* If refcnt == 1 then there is nobody who can legally change it. +* Short-circuiting this case saves a branch in the common case and +* an atomic op for the last unref (for the ->is_atomic variant). +*/ + if (refcnt != 1) { + if (name->is_atomic) { + if (!atomic_dec_and_test(&name->refcnt_atomic)) + return; + } else { + if (--name->refcnt) + return; + } + } + if (name->name != name->iname) { __putname(name->name); kfree(name); diff --git a/include/linux/fs.h b/include/linux/fs.h index 110d95d04299..b559a611dd15 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2765,11 +2765,19 @@ struct audit_names; struct filename { const char *name; /* pointer to actual string */ const __user char *uptr; /* original userland pointer */ - atomic_trefcnt; + union { + atomic_trefcnt_atomic; + int refcnt; + }; +#ifdef CONFIG_DEBUG_VFS +
Re: [PATCH] fs: support filename refcount without atomics
On 3/7/25 9:25 AM, Mateusz Guzik wrote: > On Fri, Mar 7, 2025 at 5:18?PM Jens Axboe wrote: >> >>> +static inline void makeatomicname(struct filename *name) >>> +{ >>> + VFS_BUG_ON(IS_ERR_OR_NULL(name)); >>> + /* >>> + * The name can legitimately already be atomic if it was cached by >>> audit. >>> + * If switching the refcount to atomic, we need not to know we are the >>> + * only non-atomic user. >>> + */ >>> + VFS_BUG_ON(name->owner != current && !name->is_atomic); >>> + /* >>> + * Don't bother branching, this is a store to an already dirtied >>> cacheline. >>> + */ >>> + name->is_atomic = true; >>> +} >> >> Should this not depend on audit being enabled? io_uring without audit is >> fine. >> > > I thought about it, but then I got worried about transitions from > disabled to enabled -- will they suddenly start looking here? Should > this test for audit_enabled, audit_dummy_context() or something else? > I did not want to bother analyzing this. Let me take a look at it, the markings for when to switch atomic are not accurate - it only really needs to happen for offload situations only, and if audit is enabled and tracking. So I think we can great improve upon this patch. > I'll note though this would be an optimization on top of the current > code, so I don't think it *blocks* the patch. Let's not go with something half-done if we can get it right the first time. -- Jens Axboe
[PATCH v2 3/6] LSM: security_lsmblob_to_secctx module selection
Add a parameter lsmid to security_lsmblob_to_secctx() to identify which of the security modules that may be active should provide the security context. If the value of lsmid is LSM_ID_UNDEF the first LSM providing a hook is used. security_secid_to_secctx() is unchanged, and will always report the first LSM providing a hook. Signed-off-by: Casey Schaufler --- include/linux/security.h | 6 -- kernel/audit.c | 4 ++-- kernel/auditsc.c | 8 +--- net/netlabel/netlabel_user.c | 3 ++- security/security.c | 13 +++-- 5 files changed, 24 insertions(+), 10 deletions(-) diff --git a/include/linux/security.h b/include/linux/security.h index 980b6c207cad..540894695c4b 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -566,7 +566,8 @@ int security_setprocattr(int lsmid, const char *name, void *value, size_t size); int security_netlink_send(struct sock *sk, struct sk_buff *skb); int security_ismaclabel(const char *name); int security_secid_to_secctx(u32 secid, struct lsm_context *cp); -int security_lsmprop_to_secctx(struct lsm_prop *prop, struct lsm_context *cp); +int security_lsmprop_to_secctx(struct lsm_prop *prop, struct lsm_context *cp, + int lsmid); int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid); void security_release_secctx(struct lsm_context *cp); void security_inode_invalidate_secctx(struct inode *inode); @@ -1543,7 +1544,8 @@ static inline int security_secid_to_secctx(u32 secid, struct lsm_context *cp) } static inline int security_lsmprop_to_secctx(struct lsm_prop *prop, -struct lsm_context *cp) +struct lsm_context *cp, +int lsmid) { return -EOPNOTSUPP; } diff --git a/kernel/audit.c b/kernel/audit.c index a4945f1c3ec0..293364bba961 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -1475,7 +1475,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh, case AUDIT_SIGNAL_INFO: if (lsmprop_is_set(&audit_sig_lsm)) { err = security_lsmprop_to_secctx(&audit_sig_lsm, -&lsmctx); +&lsmctx, LSM_ID_UNDEF); if (err < 0) return err; } @@ -2247,7 +2247,7 @@ int audit_log_task_context(struct audit_buffer *ab) if (!lsmprop_is_set(&prop)) return 0; - error = security_lsmprop_to_secctx(&prop, &ctx); + error = security_lsmprop_to_secctx(&prop, &ctx, LSM_ID_UNDEF); if (error < 0) { if (error != -EINVAL) goto error_path; diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 2ec3a0d85447..d98ce7097a2d 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -1109,7 +1109,7 @@ static int audit_log_pid_context(struct audit_context *context, pid_t pid, from_kuid(&init_user_ns, auid), from_kuid(&init_user_ns, uid), sessionid); if (lsmprop_is_set(prop)) { - if (security_lsmprop_to_secctx(prop, &ctx) < 0) { + if (security_lsmprop_to_secctx(prop, &ctx, LSM_ID_UNDEF) < 0) { audit_log_format(ab, " obj=(none)"); rc = 1; } else { @@ -1395,7 +1395,8 @@ static void show_special(struct audit_context *context, int *call_panic) struct lsm_context lsmctx; if (security_lsmprop_to_secctx(&context->ipc.oprop, - &lsmctx) < 0) { + &lsmctx, + LSM_ID_UNDEF) < 0) { *call_panic = 1; } else { audit_log_format(ab, " obj=%s", lsmctx.context); @@ -1560,7 +1561,8 @@ static void audit_log_name(struct audit_context *context, struct audit_names *n, if (lsmprop_is_set(&n->oprop)) { struct lsm_context ctx; - if (security_lsmprop_to_secctx(&n->oprop, &ctx) < 0) { + if (security_lsmprop_to_secctx(&n->oprop, &ctx, + LSM_ID_UNDEF) < 0) { if (call_panic) *call_panic = 2; } else { diff --git a/net/netlabel/netlabel_user.c b/net/netlabel/netlabel_user.c index 0d04d23aafe7..6d6545297ee3 100644 --- a/net/netlabel/netlabel_user.c +++ b/net/netlabel/netlabel_user.c @@ -98,7 +98,8 @@ struct audit_buffer *netlbl_audit_start_common(int type, audit_info->sessionid); if (lsmprop_is_set(&audit_info->prop) && -
[PATCH v2 2/6] Audit: Allow multiple records in an audit_buffer
Replace the single skb pointer in an audit_buffer with a list of skb pointers. Add the audit_stamp information to the audit_buffer as there's no guarantee that there will be an audit_context containing the stamp associated with the event. At audit_log_end() time create auxiliary records (none are currently defined) as have been added to the list. Functions are created to manage the skb list in the audit_buffer. Suggested-by: Paul Moore Signed-off-by: Casey Schaufler --- kernel/audit.c | 111 +++-- 1 file changed, 89 insertions(+), 22 deletions(-) diff --git a/kernel/audit.c b/kernel/audit.c index 2a567f667528..a4945f1c3ec0 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -195,8 +195,10 @@ static struct audit_ctl_mutex { * to place it on a transmit queue. Multiple audit_buffers can be in * use simultaneously. */ struct audit_buffer { - struct sk_buff *skb; /* formatted skb ready to send */ + struct sk_buff *skb; /* the skb for audit_log functions */ + struct sk_buff_head skb_list; /* formatted skbs, ready to send */ struct audit_context *ctx; /* NULL or associated context */ + struct audit_stamp stamp; /* audit stamp for these records */ gfp_tgfp_mask; }; @@ -1776,10 +1778,13 @@ __setup("audit_backlog_limit=", audit_backlog_limit_set); static void audit_buffer_free(struct audit_buffer *ab) { + struct sk_buff *skb; + if (!ab) return; - kfree_skb(ab->skb); + while ((skb = skb_dequeue(&ab->skb_list))) + kfree_skb(skb); kmem_cache_free(audit_buffer_cache, ab); } @@ -1795,6 +1800,10 @@ static struct audit_buffer *audit_buffer_alloc(struct audit_context *ctx, ab->skb = nlmsg_new(AUDIT_BUFSIZ, gfp_mask); if (!ab->skb) goto err; + + skb_queue_head_init(&ab->skb_list); + skb_queue_tail(&ab->skb_list, ab->skb); + if (!nlmsg_put(ab->skb, 0, 0, type, 0, 0)) goto err; @@ -1860,7 +1869,6 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type) { struct audit_buffer *ab; - struct audit_stamp stamp; if (audit_initialized != AUDIT_INITIALIZED) return NULL; @@ -1915,14 +1923,14 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, return NULL; } - audit_get_stamp(ab->ctx, &stamp); + audit_get_stamp(ab->ctx, &ab->stamp); /* cancel dummy context to enable supporting records */ if (ctx) ctx->dummy = 0; audit_log_format(ab, "audit(%llu.%03lu:%u): ", -(unsigned long long)stamp.ctime.tv_sec, -stamp.ctime.tv_nsec/100, -stamp.serial); +(unsigned long long)ab->stamp.ctime.tv_sec, +ab->stamp.ctime.tv_nsec/100, +ab->stamp.serial); return ab; } @@ -2178,6 +2186,57 @@ void audit_log_key(struct audit_buffer *ab, char *key) audit_log_format(ab, "(null)"); } +/** + * audit_buffer_aux_new - Add an aux record buffer to the skb list + * @ab: audit_buffer + * @type: message type + * + * Aux records are allocated and added to the skb list of + * the "main" record. The ab->skb is reset to point to the + * aux record on its creation. When the aux record in complete + * ab->skb has to be reset to point to the "main" record. + * This allows the audit_log_ functions to be ignorant of + * which kind of record it is logging to. It also avoids adding + * special data for aux records. + * + * On success ab->skb will point to the new aux record. + * Returns 0 on success, -ENOMEM should allocation fail. + */ +static int audit_buffer_aux_new(struct audit_buffer *ab, int type) +{ + WARN_ON(ab->skb != skb_peek(&ab->skb_list)); + + ab->skb = nlmsg_new(AUDIT_BUFSIZ, ab->gfp_mask); + if (!ab->skb) + goto err; + if (!nlmsg_put(ab->skb, 0, 0, type, 0, 0)) + goto err; + skb_queue_tail(&ab->skb_list, ab->skb); + + audit_log_format(ab, "audit(%llu.%03lu:%u): ", +(unsigned long long)ab->stamp.ctime.tv_sec, +ab->stamp.ctime.tv_nsec/100, +ab->stamp.serial); + + return 0; + +err: + kfree_skb(ab->skb); + ab->skb = skb_peek(&ab->skb_list); + return -ENOMEM; +} + +/** + * audit_buffer_aux_end - Switch back to the "main" record from an aux record + * @ab: audit_buffer + * + * Restores the "main" audit record to ab->skb. + */ +static void audit_buffer_aux_end(struct audit_buffer *ab) +{ + ab->skb = skb_peek(&ab->skb_list); +} + int audit_log_task_context(struct audit_buffer *ab) { struct lsm_prop prop; @@ -2412
[PATCH v2 6/6] Audit: Add record for multiple object contexts
Create a new audit record AUDIT_MAC_OBJ_CONTEXTS. An example of the MAC_OBJ_CONTEXTS (1424) record is: type=MAC_OBJ_CONTEXTS[1424] msg=audit(1601152467.009:1050): obj_selinux=unconfined_u:object_r:user_home_t:s0 When an audit event includes a AUDIT_MAC_OBJ_CONTEXTS record the "obj=" field in other records in the event will be "obj=?". An AUDIT_MAC_OBJ_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on an object security context. Signed-off-by: Casey Schaufler --- include/linux/audit.h | 7 - include/linux/lsm_hooks.h | 3 +++ include/linux/security.h | 1 + include/uapi/linux/audit.h | 1 + kernel/audit.c | 53 +- kernel/auditsc.c | 45 security/security.c| 3 +++ security/selinux/hooks.c | 1 + security/smack/smack_lsm.c | 1 + 9 files changed, 79 insertions(+), 36 deletions(-) diff --git a/include/linux/audit.h b/include/linux/audit.h index ee3e2ce70c45..0b17acf459f2 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -186,8 +186,10 @@ extern voidaudit_log_path_denied(int type, const char *operation); extern voidaudit_log_lost(const char *message); +extern int audit_log_object_context(struct audit_buffer *ab, + struct lsm_prop *prop); extern int audit_log_subject_context(struct audit_buffer *ab, -struct lsm_prop *blob); +struct lsm_prop *prop); extern int audit_log_task_context(struct audit_buffer *ab); extern void audit_log_task_info(struct audit_buffer *ab); @@ -248,6 +250,9 @@ static inline void audit_log_key(struct audit_buffer *ab, char *key) { } static inline void audit_log_path_denied(int type, const char *operation) { } +static inline void audit_log_object_context(struct audit_buffer *ab, + struct lsm_prop *prop) +{ } static inline int audit_log_subject_context(struct audit_buffer *ab, struct lsm_prop *prop) { diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index e4d303ab1f20..464bd8ef4045 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -75,6 +75,8 @@ struct lsm_static_calls_table { * struct lsm_id - Identify a Linux Security Module. * @lsm: name of the LSM, must be approved by the LSM maintainers * @id: LSM ID number from uapi/linux/lsm.h + * @subjctx: true if LSM supports a subject context + * @objctx: true if LSM supports an object context * * Contains the information that identifies the LSM. */ @@ -82,6 +84,7 @@ struct lsm_id { const char *name; u64 id; bool subjctx; + bool objctx; }; /* diff --git a/include/linux/security.h b/include/linux/security.h index 79a9bf4a7cdd..7c1a6d99e148 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -169,6 +169,7 @@ struct lsm_prop { extern const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1]; extern u32 lsm_active_cnt; extern u32 lsm_subjctx_cnt; +extern u32 lsm_objctx_cnt; extern const struct lsm_id *lsm_idlist[]; /* These functions are in security/commoncap.c */ diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h index 5ebb5d80363d..8ca58144bcc6 100644 --- a/include/uapi/linux/audit.h +++ b/include/uapi/linux/audit.h @@ -147,6 +147,7 @@ #define AUDIT_IPE_CONFIG_CHANGE1421/* IPE config change */ #define AUDIT_IPE_POLICY_LOAD 1422/* IPE policy load */ #define AUDIT_MAC_TASK_CONTEXTS1423/* Multiple LSM task contexts */ +#define AUDIT_MAC_OBJ_CONTEXTS 1424/* Multiple LSM objext contexts */ #define AUDIT_FIRST_KERN_ANOM_MSG 1700 #define AUDIT_LAST_KERN_ANOM_MSG1799 diff --git a/kernel/audit.c b/kernel/audit.c index f0c1f0c0b250..054776f29327 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -1116,7 +1116,6 @@ static int is_audit_feature_set(int i) return af.features & AUDIT_FEATURE_TO_MASK(i); } - static int audit_get_feature(struct sk_buff *skb) { u32 seq; @@ -2302,6 +2301,58 @@ int audit_log_task_context(struct audit_buffer *ab) } EXPORT_SYMBOL(audit_log_task_context); +int audit_log_object_context(struct audit_buffer *ab, struct lsm_prop *prop) +{ + int i; + int rc; + int error = 0; + char *space = ""; + struct lsm_context context; + + if (lsm_objctx_cnt < 2) { + error = security_lsmprop_to_secctx(prop, &context, + LSM_ID_UNDEF); + if (error < 0) { + if (error != -EINVAL) + goto error_path; + return error; + } + audit_log_format(
[PATCH v2 4/6] Audit: Add record for multiple task security contexts
Create a new audit record AUDIT_MAC_TASK_CONTEXTS. An example of the MAC_TASK_CONTEXTS (1423) record is: type=MAC_TASK_CONTEXTS[1423] msg=audit(1600880931.832:113) subj_apparmor=unconfined subj_smack=_ When an audit event includes a AUDIT_MAC_TASK_CONTEXTS record the "subj=" field in other records in the event will be "subj=?". An AUDIT_MAC_TASK_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on a subject security context. Signed-off-by: Casey Schaufler --- include/linux/lsm_hooks.h | 1 + include/linux/security.h | 1 + include/uapi/linux/audit.h | 1 + kernel/audit.c | 45 -- security/apparmor/lsm.c| 1 + security/bpf/hooks.c | 1 + security/security.c| 3 +++ security/selinux/hooks.c | 1 + security/smack/smack_lsm.c | 1 + 9 files changed, 48 insertions(+), 7 deletions(-) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 090d1d3e19fe..e4d303ab1f20 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -81,6 +81,7 @@ struct lsm_static_calls_table { struct lsm_id { const char *name; u64 id; + bool subjctx; }; /* diff --git a/include/linux/security.h b/include/linux/security.h index 540894695c4b..79a9bf4a7cdd 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -168,6 +168,7 @@ struct lsm_prop { extern const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1]; extern u32 lsm_active_cnt; +extern u32 lsm_subjctx_cnt; extern const struct lsm_id *lsm_idlist[]; /* These functions are in security/commoncap.c */ diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h index d9a069b4a775..5ebb5d80363d 100644 --- a/include/uapi/linux/audit.h +++ b/include/uapi/linux/audit.h @@ -146,6 +146,7 @@ #define AUDIT_IPE_ACCESS 1420/* IPE denial or grant */ #define AUDIT_IPE_CONFIG_CHANGE1421/* IPE config change */ #define AUDIT_IPE_POLICY_LOAD 1422/* IPE policy load */ +#define AUDIT_MAC_TASK_CONTEXTS1423/* Multiple LSM task contexts */ #define AUDIT_FIRST_KERN_ANOM_MSG 1700 #define AUDIT_LAST_KERN_ANOM_MSG1799 diff --git a/kernel/audit.c b/kernel/audit.c index 293364bba961..59eaf69ee8ac 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -54,6 +54,7 @@ #include #include #include +#include #include #include #include @@ -2241,21 +2242,51 @@ int audit_log_task_context(struct audit_buffer *ab) { struct lsm_prop prop; struct lsm_context ctx; + bool space = false; int error; + int i; security_current_getlsmprop_subj(&prop); if (!lsmprop_is_set(&prop)) return 0; - error = security_lsmprop_to_secctx(&prop, &ctx, LSM_ID_UNDEF); - if (error < 0) { - if (error != -EINVAL) - goto error_path; + if (lsm_subjctx_cnt < 2) { + error = security_lsmprop_to_secctx(&prop, &ctx, LSM_ID_UNDEF); + if (error < 0) { + if (error != -EINVAL) + goto error_path; + return 0; + } + audit_log_format(ab, " subj=%s", ctx.context); + security_release_secctx(&ctx); return 0; } - - audit_log_format(ab, " subj=%s", ctx.context); - security_release_secctx(&ctx); + /* Multiple LSMs provide contexts. Include an aux record. */ + audit_log_format(ab, " subj=?"); + error = audit_buffer_aux_new(ab, AUDIT_MAC_TASK_CONTEXTS); + if (error) + goto error_path; + + for (i = 0; i < lsm_active_cnt; i++) { + if (!lsm_idlist[i]->subjctx) + continue; + error = security_lsmprop_to_secctx(&prop, &ctx, + lsm_idlist[i]->id); + if (error < 0) { + if (error == -EOPNOTSUPP) + continue; + audit_log_format(ab, "%ssubj_%s=?", space ? " " : "", +lsm_idlist[i]->name); + if (error != -EINVAL) + audit_panic("error in audit_log_task_context"); + } else { + audit_log_format(ab, "%ssubj_%s=%s", space ? " " : "", +lsm_idlist[i]->name, ctx.context); + security_release_secctx(&ctx); + } + space = true; + } + audit_buffer_aux_end(ab); return 0; error_path: diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c index 9b6c2f157f83..17ec93a8d3fc 100644 --- a/security/apparmor/lsm.c +++ b/security/apparmor/lsm.c @@ -1427,6 +1427,7 @@ struct lsm_blob_sizes apparmor_blob_sizes __ro_after_init
[RFC PATCH v4 1/1] ipe: add errno field to IPE policy load auditing
Users of IPE require a way to identify when and why an operation fails, allowing them to both respond to violations of policy and be notified of potentially malicious actions on their systems with respect to IPE. This patch introduces a new error field to the AUDIT_IPE_POLICY_LOAD event to log policy loading failures. Currently, IPE only logs successful policy loads, but not failures. Tracking failures is crucial to detect malicious attempts and ensure a complete audit trail for security events. The new error field will capture the following error codes: * -ENOKEY: Key used to sign the IPE policy not found in the keyring * -ESTALE: Attempting to update an IPE policy with an older version * -EKEYREJECTED: IPE signature verification failed * -ENOENT: Policy was deleted while updating * -EEXIST: Same name policy already deployed * -ERANGE: Policy version number overflow * -EINVAL: Policy version parsing error * -EPERM: Insufficient permission * -ENOMEM: Out of memory (OOM) * -EBADMSG: Policy is invalid Here are some examples of the updated audit record types: AUDIT_IPE_POLICY_LOAD(1422): audit: AUDIT1422 policy_name="Test_Policy" policy_version=0.0.1 policy_digest=sha256:84EFBA8FA71E62AE0A537FAB962F8A2BD1053964C4299DCA 92BFFF4DB82E86D3 auid=1000 ses=3 lsm=ipe res=1 errno=0 The above record shows a new policy has been successfully loaded into the kernel with the policy name, version, and hash with the errno=0. AUDIT_IPE_POLICY_LOAD(1422) with error: audit: AUDIT1422 policy_name=? policy_version=? policy_digest=? auid=1000 ses=3 lsm=ipe res=0 errno=-74 The above record shows a policy load failure due to an invalid policy (-EBADMSG). Signed-off-by: Jasjiv Singh --- Documentation/admin-guide/LSM/ipe.rst | 69 +++ security/ipe/audit.c | 21 ++-- security/ipe/fs.c | 19 ++-- security/ipe/policy.c | 11 - security/ipe/policy_fs.c | 29 --- 5 files changed, 111 insertions(+), 38 deletions(-) diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst index f93a467db628..0615941de6e0 100644 --- a/Documentation/admin-guide/LSM/ipe.rst +++ b/Documentation/admin-guide/LSM/ipe.rst @@ -423,7 +423,7 @@ Field descriptions: Event Example:: - type=1422 audit(1653425529.927:53): policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1 + type=1422 audit(1653425529.927:53): policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1 errno=0 type=1300 audit(1653425529.927:53): arch=c03e syscall=1 success=yes exit=2567 a0=3 a1=5596fcae1fb0 a2=a07 a3=2 items=0 ppid=184 pid=229 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="python3" exe="/usr/bin/python3.10" key=(null) type=1327 audit(1653425529.927:53): PROCTITLE proctitle=707974686F6E3300746573742F6D61696E2E7079002D66002E2E @@ -433,24 +433,55 @@ This record will always be emitted in conjunction with a ``AUDITSYSCALL`` record Field descriptions: -+++---+---+ -| Field | Value Type | Optional? | Description of Value | -+++===+===+ -| policy_name| string | No| The policy_name | -+++---+---+ -| policy_version | string | No| The policy_version | -+++---+---+ -| policy_digest | string | No| The policy hash | -+++---+---+ -| auid | integer| No| The login user ID | -+++---+---+ -| ses| integer| No| The login session ID | -+++---+---+ -| lsm| string | No| The lsm name associated with the event| -+++---+---+ -| res| integer| No| The result of the audited operation(success/fail) | -+++---+---+ - +++
Re: [PATCH] fs: support filename refcount without atomics
On Fri, Mar 7, 2025 at 5:32 PM Jens Axboe wrote: > > On 3/7/25 9:25 AM, Mateusz Guzik wrote: > > On Fri, Mar 7, 2025 at 5:18?PM Jens Axboe wrote: > >> > >>> +static inline void makeatomicname(struct filename *name) > >>> +{ > >>> + VFS_BUG_ON(IS_ERR_OR_NULL(name)); > >>> + /* > >>> + * The name can legitimately already be atomic if it was cached by > >>> audit. > >>> + * If switching the refcount to atomic, we need not to know we are > >>> the > >>> + * only non-atomic user. > >>> + */ > >>> + VFS_BUG_ON(name->owner != current && !name->is_atomic); > >>> + /* > >>> + * Don't bother branching, this is a store to an already dirtied > >>> cacheline. > >>> + */ > >>> + name->is_atomic = true; > >>> +} > >> > >> Should this not depend on audit being enabled? io_uring without audit is > >> fine. > >> > > > > I thought about it, but then I got worried about transitions from > > disabled to enabled -- will they suddenly start looking here? Should > > this test for audit_enabled, audit_dummy_context() or something else? > > I did not want to bother analyzing this. > > Let me take a look at it, the markings for when to switch atomic are not > accurate - it only really needs to happen for offload situations only, > and if audit is enabled and tracking. So I think we can great improve > upon this patch. > I aimed for this to be a NOP for io_uring, so to speak, specifically because I could not be arsed to deal with audit. > > I'll note though this would be an optimization on top of the current > > code, so I don't think it *blocks* the patch. > > Let's not go with something half-done if we can get it right the first > time. > Since you volunteered to sort this out, I'll be happy to wait. -- Mateusz Guzik
Re: [PATCH] fs: support filename refcount without atomics
On 3/7/25 9:35 AM, Mateusz Guzik wrote: > On Fri, Mar 7, 2025 at 5:32?PM Jens Axboe wrote: >> >> On 3/7/25 9:25 AM, Mateusz Guzik wrote: >>> On Fri, Mar 7, 2025 at 5:18?PM Jens Axboe wrote: > +static inline void makeatomicname(struct filename *name) > +{ > + VFS_BUG_ON(IS_ERR_OR_NULL(name)); > + /* > + * The name can legitimately already be atomic if it was cached by > audit. > + * If switching the refcount to atomic, we need not to know we are > the > + * only non-atomic user. > + */ > + VFS_BUG_ON(name->owner != current && !name->is_atomic); > + /* > + * Don't bother branching, this is a store to an already dirtied > cacheline. > + */ > + name->is_atomic = true; > +} Should this not depend on audit being enabled? io_uring without audit is fine. >>> >>> I thought about it, but then I got worried about transitions from >>> disabled to enabled -- will they suddenly start looking here? Should >>> this test for audit_enabled, audit_dummy_context() or something else? >>> I did not want to bother analyzing this. >> >> Let me take a look at it, the markings for when to switch atomic are not >> accurate - it only really needs to happen for offload situations only, >> and if audit is enabled and tracking. So I think we can great improve >> upon this patch. >> > > I aimed for this to be a NOP for io_uring, so to speak, specifically > because I could not be arsed to deal with audit. Hah I hear ya... But right now it seems to mark it atomic for any of the fs based ops, which is not really necessary. >>> I'll note though this would be an optimization on top of the current >>> code, so I don't think it *blocks* the patch. >> >> Let's not go with something half-done if we can get it right the first >> time. >> > > Since you volunteered to sort this out, I'll be happy to wait. I'll take a look start next week, don't think it should be too bad. You already did 90% of the work. -- Jens Axboe
Re: [PATCH] fs: support filename refcount without atomics
On Fri, Mar 7, 2025 at 5:38 PM Jens Axboe wrote: > > On 3/7/25 9:35 AM, Mateusz Guzik wrote: > > Since you volunteered to sort this out, I'll be happy to wait. > > I'll take a look start next week, don't think it should be too bad. You > already did 90% of the work. > Sounds good. In the mean time there may be io_uring-unrelated feedback. -- Mateusz Guzik
Re: [PATCH] fs: support filename refcount without atomics
On Fri, Mar 7, 2025 at 5:42 PM Al Viro wrote: > > On Fri, Mar 07, 2025 at 05:11:55PM +0100, Mateusz Guzik wrote: > > Atomics are only needed for a combination of io_uring and audit. > > > > Regular file access (even with audit) gets around fine without them. > > > > With this patch 'struct filename' starts with being refcounted using > > regular ops. > > > > In order to avoid API explosion in the getname*() family, a dedicated > > routine is added to switch the obj to use atomics. > > > > This leaves the room for merely issuing getname(), not issuing the > > switch and still trying to manipulate the refcount from another thread. > > > > Catching such cases is facilitated by CONFIG_DEBUG_VFS-dependent > > tracking of who created the given filename object and having refname() > > and putname() detect if another thread is trying to modify them. > > Not a good way to handle that, IMO. > > Atomics do hurt there, but they are only plastering over the real > problem - names formed in one thread, inserted into audit context > there and operation involving them happening in a different thread. > > Refcounting avoids an instant memory corruption, but the real PITA > is in audit users of that stuff. > > IMO we should *NOT* grab an audit names slot at getname() time - > that ought to be done explicitly at later points. > > The obstacle is that currently there still are several retry loop > with getname() done in it; I've most of that dealt with, need to > finish that series. > > And yes, refcount becomes non-atomic as the result. Well yes, it was audit which caused the appearance of atomics in the first place. I was looking for an easy way out. If you have something which gets rid of the underlying problem and it is going to land in the foreseeable future, I wont be defending this approach. -- Mateusz Guzik
Re: [PATCH] fs: support filename refcount without atomics
On Fri, Mar 07, 2025 at 05:11:55PM +0100, Mateusz Guzik wrote: > Atomics are only needed for a combination of io_uring and audit. > > Regular file access (even with audit) gets around fine without them. > > With this patch 'struct filename' starts with being refcounted using > regular ops. > > In order to avoid API explosion in the getname*() family, a dedicated > routine is added to switch the obj to use atomics. > > This leaves the room for merely issuing getname(), not issuing the > switch and still trying to manipulate the refcount from another thread. > > Catching such cases is facilitated by CONFIG_DEBUG_VFS-dependent > tracking of who created the given filename object and having refname() > and putname() detect if another thread is trying to modify them. Not a good way to handle that, IMO. Atomics do hurt there, but they are only plastering over the real problem - names formed in one thread, inserted into audit context there and operation involving them happening in a different thread. Refcounting avoids an instant memory corruption, but the real PITA is in audit users of that stuff. IMO we should *NOT* grab an audit names slot at getname() time - that ought to be done explicitly at later points. The obstacle is that currently there still are several retry loop with getname() done in it; I've most of that dealt with, need to finish that series. And yes, refcount becomes non-atomic as the result.
Re: [PATCH v1 1/2] audit: record fanotify event regardless of presence of rules
On Thu 06-03-25 20:12:23, Richard Guy Briggs wrote: > On 2025-03-06 16:06, Jan Kara wrote: > > On Wed 05-03-25 16:33:19, Richard Guy Briggs wrote: > > > When no audit rules are in place, fanotify event results are > > > unconditionally dropped due to an explicit check for the existence of > > > any audit rules. Given this is a report from another security > > > sub-system, allow it to be recorded regardless of the existence of any > > > audit rules. > > > > > > To test, install and run the fapolicyd daemon with default config. Then > > > as an unprivileged user, create and run a very simple binary that should > > > be denied. Then check for an event with > > > ausearch -m FANOTIFY -ts recent > > > > > > Link: https://issues.redhat.com/browse/RHEL-1367 > > > Signed-off-by: Richard Guy Briggs > > > > I don't know enough about security modules to tell whether this is what > > admins want or not so that's up to you but: > > > > > -static inline void audit_fanotify(u32 response, struct > > > fanotify_response_info_audit_rule *friar) > > > -{ > > > - if (!audit_dummy_context()) > > > - __audit_fanotify(response, friar); > > > -} > > > - > > > > I think this is going to break compilation with !CONFIG_AUDITSYSCALL && > > CONFIG_FANOTIFY? > > Why would that break it? The part of the patch you (prematurely) > deleted takes care of that. So I'm failing to see how it takes care of that when with !CONFIG_AUDITSYSCALL kernel/auditsc.c does not get compiled into the kernel. So what does provide the implementation of audit_fanotify() in that case? I think you need to provide empty audit_fanotify() inline wrapper for that case... Honza > diff --git a/include/linux/audit.h b/include/linux/audit.h > index 0050ef288ab3..d0c6f23503a1 100644 > --- a/include/linux/audit.h > +++ b/include/linux/audit.h > @@ -418,7 +418,7 @@ extern void __audit_log_capset(const struct cred *new, > const struct cred *old); > extern void __audit_mmap_fd(int fd, int flags); > extern void __audit_openat2_how(struct open_how *how); > extern void __audit_log_kern_module(char *name); > -extern void __audit_fanotify(u32 response, struct > fanotify_response_info_audit_rule *friar); > +extern void audit_fanotify(u32 response, struct > fanotify_response_info_audit_rule *friar); > extern void __audit_tk_injoffset(struct timespec64 offset); > extern void __audit_ntp_log(const struct audit_ntp_data *ad); > extern void __audit_log_nfcfg(const char *name, u8 af, unsigned int nentries, > > > Honza > > -- > > Jan Kara > > SUSE Labs, CR > > - RGB > > -- > Richard Guy Briggs > Sr. S/W Engineer, Kernel Security, Base Operating Systems > Remote, Ottawa, Red Hat Canada > Upstream IRC: SunRaycer > Voice: +1.613.860 2354 SMS: +1.613.518.6570 > -- Jan Kara SUSE Labs, CR
Re: [PATCH v5 02/24] landlock: Add unique ID generator
On Fri, Jan 31, 2025 at 05:30:37PM +0100, Mickaël Salaün wrote: > --- /dev/null > +++ b/security/landlock/id.c > +static atomic64_t next_id = ATOMIC64_INIT(COUNTER_PRE_INIT); > + > +static void __init init_id(atomic64_t *const counter, const u32 > random_32bits) > +{ > + u64 init; > + > + /* > + * Ensures sure 64-bit values are always used by user space (or may > + * fail with -EOVERFLOW), and makes this testable. > + */ > + init = 1ULL << 32; > + > + /* > + * Makes a large (2^32) boot-time value to limit ID collision in logs > + * from different boots, and to limit info leak about the number of > + * initially (relative to the reader) created elements (e.g. domains). > + */ > + init += random_32bits; > + > + /* Sets first or ignores. This will be the first ID. */ > + atomic64_cmpxchg(counter, COUNTER_PRE_INIT, init); It feels like this should always need to succeed. Or to say it the other way around: If this cmpxchg were to fail, the guarantees from your commit message would be broken. Maybe it would be worth handling that error case in a more direct way? > +static void __init test_init_once(struct kunit *const test) > +{ > + const u64 first_init = 1ULL + U32_MAX; > + atomic64_t counter = ATOMIC64_INIT(COUNTER_PRE_INIT); > + > + init_id(&counter, 0); > + KUNIT_EXPECT_EQ(test, atomic64_read(&counter), first_init); > + > + init_id(&counter, ~0); > + KUNIT_EXPECT_EQ(test, atomic64_read(&counter), first_init); Maybe we can annotate this with an explanatory message, to make it slightly clearer that this is the point of the test: KUNIT_EXPECT_EQ_MSG(test, atomic64_read(&counter), first_init, "should still have the same value after the subsequent init_id()"); –Günther
Re: [PATCH] fs: support filename refcount without atomics
On Fri, Mar 7, 2025 at 5:26 PM Matthew Wilcox wrote: > > On Fri, Mar 07, 2025 at 05:11:55PM +0100, Mateusz Guzik wrote: > > +++ b/include/linux/fs.h > > @@ -2765,11 +2765,19 @@ struct audit_names; > > struct filename { > > const char *name; /* pointer to actual string */ > > const __user char *uptr; /* original userland pointer */ > > - atomic_trefcnt; > > + union { > > + atomic_trefcnt_atomic; > > + int refcnt; > > + }; > > +#ifdef CONFIG_DEBUG_VFS > > + struct task_struct *owner; > > +#endif > > + boolis_atomic; > > struct audit_names *aname; > > const char iname[]; > > }; > > 7 (or 3) byte hole; try to pad. > > Would it make more sense to put the bool between aname and iname where > it will only take one byte instead of 8? On the stock kernel there is already a 4 byte hole between the refcount and aname, which is where is_atomic lands with debug disabled. I.e. no size changes in production kernels with and without the change. However, now that you mention it the debug owner field is misplaced -- it should have landed *after* is_atomic. Maybe Christian will be happy to just move it, otherwise I'm going to include this in a v2. The iname field is expected to be aligned, so I don't believe shuffling the is_atomic flag helps anyone: static_assert(offsetof(struct filename, iname) % sizeof(long) == 0); -- Mateusz Guzik
[PATCH v2 5/6] Audit: multiple subject lsm values for netlabel
Refactor audit_log_task_context(), creating a new audit_log_subject_context(). This is used in netlabel auditing to provide multiple subject security contexts as necessary. Signed-off-by: Casey Schaufler --- include/linux/audit.h| 8 kernel/audit.c | 21 ++--- net/netlabel/netlabel_user.c | 9 + 3 files changed, 23 insertions(+), 15 deletions(-) diff --git a/include/linux/audit.h b/include/linux/audit.h index 0050ef288ab3..ee3e2ce70c45 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -37,6 +37,7 @@ struct audit_watch; struct audit_tree; struct sk_buff; struct kern_ipc_perm; +struct lsm_prop; struct audit_krule { u32 pflags; @@ -185,6 +186,8 @@ extern void audit_log_path_denied(int type, const char *operation); extern voidaudit_log_lost(const char *message); +extern int audit_log_subject_context(struct audit_buffer *ab, +struct lsm_prop *blob); extern int audit_log_task_context(struct audit_buffer *ab); extern void audit_log_task_info(struct audit_buffer *ab); @@ -245,6 +248,11 @@ static inline void audit_log_key(struct audit_buffer *ab, char *key) { } static inline void audit_log_path_denied(int type, const char *operation) { } +static inline int audit_log_subject_context(struct audit_buffer *ab, + struct lsm_prop *prop) +{ + return 0; +} static inline int audit_log_task_context(struct audit_buffer *ab) { return 0; diff --git a/kernel/audit.c b/kernel/audit.c index 59eaf69ee8ac..f0c1f0c0b250 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -2238,20 +2238,18 @@ static void audit_buffer_aux_end(struct audit_buffer *ab) ab->skb = skb_peek(&ab->skb_list); } -int audit_log_task_context(struct audit_buffer *ab) +int audit_log_subject_context(struct audit_buffer *ab, struct lsm_prop *prop) { - struct lsm_prop prop; struct lsm_context ctx; bool space = false; int error; int i; - security_current_getlsmprop_subj(&prop); - if (!lsmprop_is_set(&prop)) + if (!lsmprop_is_set(prop)) return 0; if (lsm_subjctx_cnt < 2) { - error = security_lsmprop_to_secctx(&prop, &ctx, LSM_ID_UNDEF); + error = security_lsmprop_to_secctx(prop, &ctx, LSM_ID_UNDEF); if (error < 0) { if (error != -EINVAL) goto error_path; @@ -2270,7 +2268,7 @@ int audit_log_task_context(struct audit_buffer *ab) for (i = 0; i < lsm_active_cnt; i++) { if (!lsm_idlist[i]->subjctx) continue; - error = security_lsmprop_to_secctx(&prop, &ctx, + error = security_lsmprop_to_secctx(prop, &ctx, lsm_idlist[i]->id); if (error < 0) { if (error == -EOPNOTSUPP) @@ -2290,9 +2288,18 @@ int audit_log_task_context(struct audit_buffer *ab) return 0; error_path: - audit_panic("error in audit_log_task_context"); + audit_panic("error in audit_log_subject_context"); return error; } +EXPORT_SYMBOL(audit_log_subject_context); + +int audit_log_task_context(struct audit_buffer *ab) +{ + struct lsm_prop prop; + + security_current_getlsmprop_subj(&prop); + return audit_log_subject_context(ab, &prop); +} EXPORT_SYMBOL(audit_log_task_context); void audit_log_d_path_exe(struct audit_buffer *ab, diff --git a/net/netlabel/netlabel_user.c b/net/netlabel/netlabel_user.c index 6d6545297ee3..3d46ea6a8bb8 100644 --- a/net/netlabel/netlabel_user.c +++ b/net/netlabel/netlabel_user.c @@ -84,7 +84,6 @@ struct audit_buffer *netlbl_audit_start_common(int type, struct netlbl_audit *audit_info) { struct audit_buffer *audit_buf; - struct lsm_context ctx; if (audit_enabled == AUDIT_OFF) return NULL; @@ -96,13 +95,7 @@ struct audit_buffer *netlbl_audit_start_common(int type, audit_log_format(audit_buf, "netlabel: auid=%u ses=%u", from_kuid(&init_user_ns, audit_info->loginuid), audit_info->sessionid); - - if (lsmprop_is_set(&audit_info->prop) && - security_lsmprop_to_secctx(&audit_info->prop, &ctx, - LSM_ID_UNDEF) > 0) { - audit_log_format(audit_buf, " subj=%s", ctx.context); - security_release_secctx(&ctx); - } + audit_log_subject_context(audit_buf, &audit_info->prop); return audit_buf; } -- 2.47.0
Re: [PATCH] fs: support filename refcount without atomics
On Fri, Mar 7, 2025 at 5:44 PM Mateusz Guzik wrote: > > On Fri, Mar 7, 2025 at 5:42 PM Al Viro wrote: > > Not a good way to handle that, IMO. > > > > Atomics do hurt there, but they are only plastering over the real > > problem - names formed in one thread, inserted into audit context > > there and operation involving them happening in a different thread. > > > > Refcounting avoids an instant memory corruption, but the real PITA > > is in audit users of that stuff. > > > > IMO we should *NOT* grab an audit names slot at getname() time - > > that ought to be done explicitly at later points. > > I was looking at doing that, but the code is kind of a mess and I bailed. > > The obstacle is that currently there still are several retry loop > > with getname() done in it; I've most of that dealt with, need to > > finish that series. > > > > And yes, refcount becomes non-atomic as the result. > > Well yes, it was audit which caused the appearance of atomics in the > first place. I was looking for an easy way out. > > If you have something which gets rid of the underlying problem and it > is going to land in the foreseeable future, I wont be defending this > approach. > It is unclear to me if you are NAKing the patch, or merely pointing out this can be done in a better way (which I agree with) Some time ago I posted a much simpler patch to merely dodge the last decrement [1], which already accomplishes what I was looking for. Christian did not like it and wanted something which only deals with atomics when audit is enabled. I should have done that patch slightly differently, but bottom line is the following in putname(): refcnt = atomic_read(&name->refcnt); if (refcnt != 1) { if (WARN_ON_ONCE(!refcnt)) return; if (!atomic_dec_and_test(&name->refcnt)) return; } So if you are NAKing the regular -> atomic switch patch, how about the above as a quick hack until the issue gets resolved? It is trivial to reason about (refcnt == 1 means nobody can do anything) and guarantees to dodge one atomic (which in case of no audit means all consumers). I can repost touched up if you are OK with it (the original posting issues atomic_read twice). As for the bigger patch posted here, Jens wants the io_uring bits done differently and offered to handle them in the upcoming week. I think a clear statement if the patch is a no-go would be appreciated. Link 1: https://lore.kernel.org/linux-fsdevel/20240604132448.101183-1-mjgu...@gmail.com/ -- Mateusz Guzik
Re: [PATCH v1 1/2] audit: record fanotify event regardless of presence of rules
On 2025-03-07 15:52, Jan Kara wrote: > On Thu 06-03-25 20:12:23, Richard Guy Briggs wrote: > > On 2025-03-06 16:06, Jan Kara wrote: > > > On Wed 05-03-25 16:33:19, Richard Guy Briggs wrote: > > > > When no audit rules are in place, fanotify event results are > > > > unconditionally dropped due to an explicit check for the existence of > > > > any audit rules. Given this is a report from another security > > > > sub-system, allow it to be recorded regardless of the existence of any > > > > audit rules. > > > > > > > > To test, install and run the fapolicyd daemon with default config. Then > > > > as an unprivileged user, create and run a very simple binary that should > > > > be denied. Then check for an event with > > > > ausearch -m FANOTIFY -ts recent > > > > > > > > Link: https://issues.redhat.com/browse/RHEL-1367 > > > > Signed-off-by: Richard Guy Briggs > > > > > > I don't know enough about security modules to tell whether this is what > > > admins want or not so that's up to you but: > > > > > > > -static inline void audit_fanotify(u32 response, struct > > > > fanotify_response_info_audit_rule *friar) > > > > -{ > > > > - if (!audit_dummy_context()) > > > > - __audit_fanotify(response, friar); > > > > -} > > > > - > > > > > > I think this is going to break compilation with !CONFIG_AUDITSYSCALL && > > > CONFIG_FANOTIFY? > > > > Why would that break it? The part of the patch you (prematurely) > > deleted takes care of that. > > So I'm failing to see how it takes care of that when with > !CONFIG_AUDITSYSCALL kernel/auditsc.c does not get compiled into the kernel. > So what does provide the implementation of audit_fanotify() in that case? > I think you need to provide empty audit_fanotify() inline wrapper for that > case... I'm sorry, I responded too quickly without thinking about your question, my mistake. It isn't the prototype that was changed in the CONFIG_SYSCALL case that is relevant in that case. There was already in existance in the !CONFIG_AUDITSYSCALL case the inline wrapper to do that job: static inline void audit_fanotify(u32 response, struct fanotify_response_info_audit_rule *friar) { } Did I understand correctly this time and does this answer your question? But you do cause me to notice the case that these notifications will be dropped when CONFIG_AUDIT && !CONFIG_AUDITSYSCALL. Thanks for persisting. > Honza > > > diff --git a/include/linux/audit.h b/include/linux/audit.h > > index 0050ef288ab3..d0c6f23503a1 100644 > > --- a/include/linux/audit.h > > +++ b/include/linux/audit.h > > @@ -418,7 +418,7 @@ extern void __audit_log_capset(const struct cred *new, > > const struct cred *old); > > extern void __audit_mmap_fd(int fd, int flags); > > extern void __audit_openat2_how(struct open_how *how); > > extern void __audit_log_kern_module(char *name); > > -extern void __audit_fanotify(u32 response, struct > > fanotify_response_info_audit_rule *friar); > > +extern void audit_fanotify(u32 response, struct > > fanotify_response_info_audit_rule *friar); > > extern void __audit_tk_injoffset(struct timespec64 offset); > > extern void __audit_ntp_log(const struct audit_ntp_data *ad); > > extern void __audit_log_nfcfg(const char *name, u8 af, unsigned int > > nentries, > > > > > -- > > > Jan Kara > > > SUSE Labs, CR > > > > - RGB > > > -- > Jan Kara > SUSE Labs, CR - RGB -- Richard Guy Briggs Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada Upstream IRC: SunRaycer Voice: +1.613.860 2354 SMS: +1.613.518.6570
[PATCH v4 0/1] ipe: add errno field to IPE policy load auditing
Hello, When deployment of a new IPE policy fails, there is no audit trail. The failure is written to stderr, but not to the system log. So, users of IPE require a way to identify when and why an operation fails, allowing them to both respond to violations of policy and be notified of potentially malicious actions on their systems with respect to IPE. Previous Postings - v3: https://lore.kernel.org/linux-security-module/1740784265-19829-1-git-send-email-jasjivsi...@linux.microsoft.com/ v2: https://lore.kernel.org/linux-security-module/1740696377-3986-1-git-send-email-jasjivsi...@linux.microsoft.com/ v1: https://lore.kernel.org/linux-security-module/1739569319-22015-1-git-send-email-jasjivsi...@linux.microsoft.com/ Changelog - v4: * added a seperate errno table to IPE AUDIT_IPE_POLICY_LOAD documentation. * fixed error code handling that happens when memdup_user_nul is called in new_policy() and update_policy(). * added additional errno documentation to new_policy(), update_policy(), ipe_new_policy() and ipe_update_policy(). * added ENOKEY and EKEYREJECTED to IPE errno table documentation. v3: * used ERR_PTR(rc) directly rather than assigning to struct ipe_policy. * removed unnecessary var from update_policy(). * removed unnecessary error handling from update_policy(). v2: * added additional IPE audit log information to commit to show the errno case. * changed log format from AUDIT_POLICY_LOAD_NULL_FMT to AUDIT_POLICY_LOAD_FAIL_FMT. * removed unnecessary res var from ipe_audit_policy_load(). * handled security fs failure case in new_policy() and update_policy(). * handled insufficent failure case in new_policy() and update_policy(). Jasjiv Singh (1): ipe: add errno field to IPE policy load auditing Documentation/admin-guide/LSM/ipe.rst | 69 +++ security/ipe/audit.c | 21 ++-- security/ipe/fs.c | 19 ++-- security/ipe/policy.c | 11 - security/ipe/policy_fs.c | 29 --- 5 files changed, 111 insertions(+), 38 deletions(-) -- 2.34.1
[PATCH v2 0/6] Audit: Records for multiple security contexts
The Linux audit system includes LSM based security "context" information in its events. Historically, only one LSM that uses security contexts can be active on a system. One of the few obsticles to allowing multiple LSM support is the inability to report more than one security context in an audit event. This patchset provides a mechanism to provide supplimental records containing more than one security context for subjects and objects. The mechanism for reporting multiple security contexts inspired considerable discussion. It would have been possible to add multiple contexts to existing records using sophisticated formatting. This would have significant backward compatibility issues, and require additional parsing in user space code. Adding new records for an event that contain the contexts is more in keeping with the way audit events have been constructed in the past. Only audit events associated with system calls have required multiple records prior to this. Mechanism has been added allowing any event to be composed of multiple records. This should make it easier to add information to existing audit events without breaking backward compatability. v2: Maintain separate counts for LSMs using subject contexts and object contexts. AppArmor uses the former but not the latter. Correct error handling in object record creation. https://github.com/cschaufler/lsm-stacking#audit-records-multiple-contexts-v2 Casey Schaufler (6): Audit: Create audit_stamp structure Audit: Allow multiple records in an audit_buffer LSM: security_lsmblob_to_secctx module selection Audit: Add record for multiple task security contexts Audit: multiple subject lsm values for netlabel Audit: Add record for multiple object contexts include/linux/audit.h| 13 ++ include/linux/lsm_hooks.h| 4 + include/linux/security.h | 8 +- include/uapi/linux/audit.h | 2 + kernel/audit.c | 235 +-- kernel/audit.h | 13 +- kernel/auditsc.c | 65 +++--- net/netlabel/netlabel_user.c | 8 +- security/apparmor/lsm.c | 1 + security/bpf/hooks.c | 1 + security/security.c | 19 ++- security/selinux/hooks.c | 2 + security/smack/smack_lsm.c | 2 + 13 files changed, 274 insertions(+), 99 deletions(-) -- 2.47.0
[PATCH v2 1/6] Audit: Create audit_stamp structure
Replace the timestamp and serial number pair used in audit records with a structure containing the two elements. Signed-off-by: Casey Schaufler --- kernel/audit.c | 17 + kernel/audit.h | 13 + kernel/auditsc.c | 22 +- 3 files changed, 27 insertions(+), 25 deletions(-) diff --git a/kernel/audit.c b/kernel/audit.c index 5f5bf85bcc90..2a567f667528 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -1833,11 +1833,11 @@ unsigned int audit_serial(void) } static inline void audit_get_stamp(struct audit_context *ctx, - struct timespec64 *t, unsigned int *serial) + struct audit_stamp *stamp) { - if (!ctx || !auditsc_get_stamp(ctx, t, serial)) { - ktime_get_coarse_real_ts64(t); - *serial = audit_serial(); + if (!ctx || !auditsc_get_stamp(ctx, stamp)) { + ktime_get_coarse_real_ts64(&stamp->ctime); + stamp->serial = audit_serial(); } } @@ -1860,8 +1860,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type) { struct audit_buffer *ab; - struct timespec64 t; - unsigned int serial; + struct audit_stamp stamp; if (audit_initialized != AUDIT_INITIALIZED) return NULL; @@ -1916,12 +1915,14 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, return NULL; } - audit_get_stamp(ab->ctx, &t, &serial); + audit_get_stamp(ab->ctx, &stamp); /* cancel dummy context to enable supporting records */ if (ctx) ctx->dummy = 0; audit_log_format(ab, "audit(%llu.%03lu:%u): ", -(unsigned long long)t.tv_sec, t.tv_nsec/100, serial); +(unsigned long long)stamp.ctime.tv_sec, +stamp.ctime.tv_nsec/100, +stamp.serial); return ab; } diff --git a/kernel/audit.h b/kernel/audit.h index 0211cb307d30..4d6dd2588f9b 100644 --- a/kernel/audit.h +++ b/kernel/audit.h @@ -99,6 +99,12 @@ struct audit_proctitle { char*value; /* the cmdline field */ }; +/* A timestamp/serial pair to identify an event */ +struct audit_stamp { + struct timespec64 ctime; /* time of syscall entry */ + unsigned intserial; /* serial number for record */ +}; + /* The per-task audit context. */ struct audit_context { int dummy; /* must be the first element */ @@ -108,10 +114,9 @@ struct audit_context { AUDIT_CTX_URING,/* in use by io_uring */ } context; enum audit_statestate, current_state; - unsigned intserial; /* serial number for record */ + struct audit_stamp stamp; /* event identifier */ int major; /* syscall number */ int uring_op; /* uring operation */ - struct timespec64 ctime; /* time of syscall entry */ unsigned long argv[4];/* syscall arguments */ longreturn_code;/* syscall return code */ u64 prio; @@ -263,7 +268,7 @@ extern void audit_put_tty(struct tty_struct *tty); extern unsigned int audit_serial(void); #ifdef CONFIG_AUDITSYSCALL extern int auditsc_get_stamp(struct audit_context *ctx, - struct timespec64 *t, unsigned int *serial); +struct audit_stamp *stamp); extern void audit_put_watch(struct audit_watch *watch); extern void audit_get_watch(struct audit_watch *watch); @@ -304,7 +309,7 @@ extern void audit_filter_inodes(struct task_struct *tsk, struct audit_context *ctx); extern struct list_head *audit_killed_trees(void); #else /* CONFIG_AUDITSYSCALL */ -#define auditsc_get_stamp(c, t, s) 0 +#define auditsc_get_stamp(c, s) 0 #define audit_put_watch(w) do { } while (0) #define audit_get_watch(w) do { } while (0) #define audit_to_watch(k, p, l, o) (-EINVAL) diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 9c853cde9abe..2ec3a0d85447 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -994,10 +994,10 @@ static void audit_reset_context(struct audit_context *ctx) */ ctx->current_state = ctx->state; - ctx->serial = 0; + ctx->stamp.serial = 0; ctx->major = 0; ctx->uring_op = 0; - ctx->ctime = (struct timespec64){ .tv_sec = 0, .tv_nsec = 0 }; + ctx->stamp.ctime = (struct timespec64){ .tv_sec = 0, .tv_nsec = 0 }; memset(ctx->argv, 0, sizeof(ctx->argv)); ctx->return_code = 0; ctx->prio = (ctx->state == AUDIT_STATE_RECORD ? ~0ULL : 0); @@ -1917,7 +1917,7 @@ void __audit_uring_entry(u8 op) ctx->context = AUDIT_CTX_URING; ctx->current_state = ctx->st
Re: [PATCH v1] audit,module: restore audit logging in load failure case
On Thursday, March 6, 2025 4:41:40 PM Eastern Standard Time Richard Guy Briggs wrote: > On 2024-10-24 16:41, Paul Moore wrote: > > On Oct 23, 2024 Richard Guy Briggs wrote: > > > The move of the module sanity check to earlier skipped the audit > > > logging > > > call in the case of failure and to a place where the previously used > > > context is unavailable. > > > > > > Add an audit logging call for the module loading failure case and get > > > the module name when possible. > > > > > > Link: https://issues.redhat.com/browse/RHEL-52839 > > > Fixes: 02da2cbab452 ("module: move check_modinfo() early to > > > early_mod_check()") Signed-off-by: Richard Guy Briggs > > > --- > > > > > > kernel/module/main.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/module/main.c b/kernel/module/main.c > > > index 49b9bca9de12..1f482532ef66 100644 > > > --- a/kernel/module/main.c > > > +++ b/kernel/module/main.c > > > @@ -3057,8 +3057,10 @@ static int load_module(struct load_info *info, > > > const char __user *uargs,> > > > >* failures once the proper module was allocated and > > >* before that. > > >*/ > > > > > > - if (!module_allocated) > > > + if (!module_allocated) { > > > + audit_log_kern_module(info->name ? info->name : "(unavailable)"); > > > > > > mod_stat_bump_becoming(info, flags); > > > > > > + } > > > > We probably should move the existing audit_log_kern_module() to just > > after the elf_validity_cache_copy() call as both info->name and > > info->mod->name should be as valid as they are going to get at that > > point. If we do that then we only have two cases we need to worry about, > > a failed module_sig_check() or a failed elf_validity_cache_copy(), and > > in both cases we can use "(unavailable)" without having to check > > info->name first. > > Fair enough. > > > However, assuming we move the audit_log_kern_module() call up a bit as > > described above, I'm not sure there is much value in calling > > audit_log_kern_module() with an "(unavailable)" module name in those > > early two cases. We know it's an attempted module load based on the > > SYSCALL record, seeing an associated "(unavailable)" KERN_MODULE record > > doesn't provide us with any more information than if we had simply > > skipped the KERN_MODULE record. > > Understood. I wonder if the absence of the record in the error case > will leave us guessing if we lost a record from the event? We will have > the error code from the SYSCALL record but not much more than that, and > some of those error codes could just as well be generated after that > point too. This would be a similar situation to the vanishing fields in > an existing record, but is likely easier to mitigate than a > non-last-field vanishing or shifting around in an existing record. > > Steve? Atilla? Any comments? We should only get the events if we have syscall rules that would result in kernel module loading/unloading events. I suppose that by the time audit loads it's rules, many modules have already loaded. So, I don't think we need to worry about early cases. But when we do get a kernel module load/unload event based on the rules, we need it's name - even in failures. We need to be able to determine what was attempted since this could affect system integrity. -Steve > > Untested, but this is what I'm talking about: > > > > diff --git a/include/linux/audit.h b/include/linux/audit.h > > index 0050ef288ab3..eaa10e3c7eca 100644 > > --- a/include/linux/audit.h > > +++ b/include/linux/audit.h > > @@ -417,7 +417,7 @@ extern int __audit_log_bprm_fcaps(struct linux_binprm > > *bprm,> > > extern void __audit_log_capset(const struct cred *new, const struct cred > > *old); extern void __audit_mmap_fd(int fd, int flags); > > extern void __audit_openat2_how(struct open_how *how); > > > > -extern void __audit_log_kern_module(char *name); > > +extern void __audit_log_kern_module(const char *name); > > > > extern void __audit_fanotify(u32 response, struct > > fanotify_response_info_audit_rule *friar); extern void > > __audit_tk_injoffset(struct timespec64 offset); > > extern void __audit_ntp_log(const struct audit_ntp_data *ad); > > > > @@ -519,7 +519,7 @@ static inline void audit_openat2_how(struct open_how > > *how)> > > __audit_openat2_how(how); > > > > } > > > > -static inline void audit_log_kern_module(char *name) > > +static inline void audit_log_kern_module(const char *name) > > > > { > > > > if (!audit_dummy_context()) > > > > __audit_log_kern_module(name); > > > > @@ -677,7 +677,7 @@ static inline void audit_mmap_fd(int fd, int flags) > > > > static inline void audit_openat2_how(struct open_how *how) > > { } > > > > -static inline void audit_log_kern_module(char *name) > > +static inline void audit_log_kern_module(const char *name) > > > > { > > } > > > > diff --git a/kernel/audit.h b/kernel/audi