On 2025-08-05, Aleksa Sarai wrote:
> Since the introduction of pid namespaces, their interaction with procfs
> has been entirely implicit in ways that require a lot of dancing around
> by programs that need to construct sandboxes with different PID
> namespaces.
>
> Being
ainst all processes
in the pidns (which is very likely to be true for at least one
process, as SUID_DUMP_DISABLE is cleared on exec(2) and is rarely set
by most programs), but this would obviously not scale.
I'm open to suggestions for whether we need to make this stricter (or
possibl
Signed-off-by: Aleksa Sarai
---
tools/testing/selftests/proc/.gitignore | 1 +
tools/testing/selftests/proc/Makefile | 1 +
tools/testing/selftests/proc/proc-pidns.c | 315 ++
3 files changed, 317 insertions(+)
diff --git a/tools/testing/selftests/proc
Creating new procfs instances is
quite cheap, so this should not be an impediment to most users, and lets
us avoid a lot of churn in fs/proc/* for a feature that it seems
unlikely userspace would use.
Signed-off-by: Aleksa Sarai
---
Documentation/filesystems/proc.rst | 8
fs/proc/root.c
eing able to
open a pidns handle doesn't really provide too many other capabilities.
Signed-off-by: Aleksa Sarai
---
Documentation/filesystems/proc.rst | 4 +++
fs/proc/root.c | 68 --
include/uapi/linux/fs.h | 4 +++
3 file
This check will be needed in later patches, and there's no point
open-coding it each time.
Signed-off-by: Aleksa Sarai
---
include/linux/pid_namespace.h | 9 +
kernel/pid_namespace.c | 22 ++
2 files changed, 23 insertions(+), 8 deletions(-)
diff --
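
For readers without the (truncated) diff at hand, the kind of check that the
"pidns: move is-ancestor logic to helper" patch factors out can be modelled
in userspace roughly as below. This is only a sketch: struct pidns_model and
pidns_model_is_ancestor() are hypothetical names invented for illustration,
and the model assumes each namespace's level is exactly one more than its
parent's. It is not the kernel code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a PID namespace: only the nesting
 * depth and the parent pointer matter for the ancestry check.
 */
struct pidns_model {
	unsigned int level;               /* 0 for the initial namespace */
	const struct pidns_model *parent; /* NULL for the initial namespace */
};

/* Returns true if `ancestor` is `child` itself or one of its ancestors. */
static bool pidns_model_is_ancestor(const struct pidns_model *child,
				    const struct pidns_model *ancestor)
{
	const struct pidns_model *ns = child;

	/* A shallower namespace can never descend from a deeper one. */
	if (child->level < ancestor->level)
		return false;

	/* Walk up the parent chain until we reach the ancestor's depth... */
	while (ns->level > ancestor->level)
		ns = ns->parent;

	/* ...at which point the two pointers must be identical. */
	return ns == ancestor;
}
```

Comparing levels first means the walk never runs past the candidate
ancestor's depth, so two unrelated namespaces at the same depth are rejected
without walking at all.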
On 2025-07-31, Christian Brauner wrote:
> On Fri, Jul 25, 2025 at 12:24:28PM +1000, Aleksa Sarai wrote:
> > On 2025-07-24, Christian Brauner wrote:
> > > On Wed, Jul 23, 2025 at 09:18:53AM +1000, Aleksa Sarai wrote:
> > > > /proc has historically had very opaque
On 2025-07-24, Christian Brauner wrote:
> On Wed, Jul 23, 2025 at 09:18:53AM +1000, Aleksa Sarai wrote:
> > /proc has historically had very opaque semantics about PID namespaces,
> > which is a little unfortunate for container runtimes and other programs
> > that deal with
On 2025-07-24, Christian Brauner wrote:
> On Wed, Jul 23, 2025 at 09:18:52AM +1000, Aleksa Sarai wrote:
> > Since the introduction of pid namespaces, their interaction with procfs
> > has been entirely implicit in ways that require a lot of dancing around
> > by programs t
Signed-off-by: Aleksa Sarai
---
tools/testing/selftests/proc/.gitignore | 1 +
tools/testing/selftests/proc/Makefile | 1 +
tools/testing/selftests/proc/proc-pidns.c | 252 ++
3 files changed, 254 insertions(+)
diff --git a/tools/testing/selftests/proc
eir own permission checks,
so being able to open a pidns handle doesn't really provide too many
other capabilities.
Signed-off-by: Aleksa Sarai
---
Documentation/filesystems/proc.rst | 4 +++
fs/proc/root.c | 54 --
include/uapi/lin
Creating new procfs instances is
quite cheap, so this should not be an impediment to most users, and lets
us avoid a lot of churn in fs/proc/* for a feature that it seems
unlikely userspace would use.
Signed-off-by: Aleksa Sarai
---
Documentation/filesystems/proc.rst | 8 +++
fs/proc/root.c
This check will be needed in later patches, and there's no point
open-coding it each time.
Signed-off-by: Aleksa Sarai
---
include/linux/pid_namespace.h | 9 +
kernel/pid_namespace.c | 23 +++
2 files changed, 24 insertions(+), 8 deletions(-)
diff --
he current process's pidns).
The security model for this is a little loose, as it seems to me that
all of the cases mentioned are valid cases to allow access, but I'm open
to suggestions for whether we need to make this stricter or looser.
Signed-off-by: Aleksa Sarai
---
Changes in v3:
On 2025-07-22, Aleksa Sarai wrote:
> On 2025-07-21, Andy Lutomirski wrote:
> > On Mon, Jul 21, 2025 at 1:44 AM Aleksa Sarai wrote:
> > >
> > > Ever since the introduction of pid namespaces, procfs has had very
> > > implicit behaviour surrounding them
Signed-off-by: Aleksa Sarai
---
tools/testing/selftests/proc/.gitignore | 1 +
tools/testing/selftests/proc/Makefile | 1 +
tools/testing/selftests/proc/proc-pidns.c | 286 ++
3 files changed, 288 insertions(+)
diff --git a/tools/testing/selftests/proc
his mirrors pidns_install() to avoid
opening up new attack surfaces by loosening the existing permission
model.
Note that the mount infrastructure also allows userspace to reconfigure
the pidns of an existing procfs mount, which may or may not be useful to
some users.
Signed-off-by
ndy Lutomirski]
- Fix build warnings in pidns_is_ancestor() patch. [kernel test robot]
- v1:
<https://lore.kernel.org/r/20250721-procfs-pidns-api-v1-0-5cd9007e5...@cyphar.com>
---
Aleksa Sarai (4):
pidns: move is-ancestor logic to helper
procfs: add "pidns" mount option
On 2025-07-21, Andy Lutomirski wrote:
> On Mon, Jul 21, 2025 at 1:44 AM Aleksa Sarai wrote:
> >
> > Ever since the introduction of pid namespaces, procfs has had very
> > implicit behaviour surrounding them (the pidns used by a procfs mount is
> > auto-selected base
with /proc instances.
Signed-off-by: Aleksa Sarai
---
Documentation/filesystems/proc.rst | 4 +++
fs/proc/root.c | 52 --
include/uapi/linux/fs.h | 3 +++
3 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/Document
procfs super block will allow programs to no longer need
to fork off a process which then does unshare(2) / setns(2) and
forks again in order to construct a procfs in a pidns.
Signed-off-by: Aleksa Sarai
---
Documentation/filesystems/proc.rst | 6 +++
fs/proc/root.c
This check will be needed in later patches, and there's no point
open-coding it each time.
Signed-off-by: Aleksa Sarai
---
include/linux/pid_namespace.h | 9 +
kernel/pid_namespace.c | 21 ++---
2 files changed, 23 insertions(+), 7 deletions(-)
diff --
e valid cases to allow access, but I'm open
to suggestions for whether we need to make this stricter or looser.
Signed-off-by: Aleksa Sarai
---
Aleksa Sarai (4):
pidns: move is-ancestor logic to helper
procfs: add pidns= mount option
procfs: add PROCFS_GET_PID_NAMESPACE ioctl
rmation on how to ensure that syscalls with structure
> > arguments are extensible and add a section about naming conventions to
> > follow when adding revised versions of already existing syscalls.
> >
> > Co-Developed-by: Aleksa Sarai
> > Signed
port for it in 2016).
I agree with your overall point, but it should be noted that the vast
majority of Linux systems these days have protections against this (by
default) that use the pids cgroup controller.
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
ork anymore -- at least not like it did in my original
patch. So I'm really not sure where to go from here.
I can send around another patchset to illustrate the problem if you like
(as well as show how the current unwinding code works).
On 2018-11-09, Masami Hiramatsu wrote:
> On Thu, 8 Nov 2018 08:44:37 -0600
> Josh Poimboeuf wrote:
>
> > On Thu, Nov 08, 2018 at 07:04:48PM +1100, Aleksa Sarai wrote:
> > > On 2018-11-08, Aleksa Sarai wrote:
> > > > I will attach what I have at the
ed a quick impl
that I could test). I will fix this, thanks!
By is_kretprobe_handler_context() I imagine you are referring to
checking is_kretprobe(current_kprobe())?
> So, we should check we are in the kretprobe handler context if tsk == current,
> if not, we definitely can lock the hash lock witho
On 2018-11-08, Aleksa Sarai wrote:
> I will attach what I have at the moment to hopefully explain what the
> issue I've found is (re-using the kretprobe architecture but with the
> shadow-stack idea).
Here is the patch I have at the moment (it works, except for the
question I hav
On 2018-11-06, Steven Rostedt wrote:
> On Sun, 4 Nov 2018 22:59:13 +1100
> Aleksa Sarai wrote:
>
> > The same issue is present in __save_stack_trace
> > (arch/x86/kernel/stacktrace.c). This is likely the only reason that --
> > as Steven said -- stacktraces wouldn
On 2018-11-03, Aleksa Sarai wrote:
> This is actually a general bug in ftrace as well, because (for
> instance) while the unwinder calls into ftrace_graph_ret_addr() while
> walking up the stack this isn't used to correct regs->ip in
> perf_callchain_kernel(). I think this is
gs)
doesn't give the right result for some reason).
I will try to fix it up and attach it, but if you're already working on
a prototype that unifies all the users, that works too. The patch I have
at the moment duplicates each of the key ftrace_graph_return_addr
invocations with a matching kretprobe_return_addr (though there's only
three of these).
On 2018-11-02, Aleksa Sarai wrote:
> Unfortunately, I'm having a lot of trouble understanding how the current
> ftrace hooking works -- ORC has a couple of ftrace hooks that seem
> reasonable on the surface but I don't understand (for instance) how
> HAVE_FUNCTION_GRAPH_
On 2018-11-02, Aleksa Sarai wrote:
> For kretprobes I think it would be fairly easy to reconstruct what
> landed you into a kretprobe_trampoline by walking the set of
> kretprobe_instances (since all new ones are added to the head, you can
> get the real return address in-order).
>
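
The reconstruction idea quoted above (newest instance at the head, so walking
the list yields the real return addresses in order) can be illustrated with a
small userspace model. Everything here is an assumption made for illustration:
ri_model, ri_push(), fixup_frames() and the TRAMPOLINE_ADDR placeholder are
hypothetical names, and the sketch does not reflect the kernel's actual
kretprobe_instance bookkeeping or locking.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the address of kretprobe_trampoline. */
#define TRAMPOLINE_ADDR ((uintptr_t)0xdeadbeef)

/* Hypothetical model of one probed call: the saved real return address. */
struct ri_model {
	uintptr_t ret_addr;
	struct ri_model *next;
};

/* New instances go on the head, so the innermost call comes first. */
static void ri_push(struct ri_model **head, struct ri_model *ri)
{
	ri->next = *head;
	*head = ri;
}

/*
 * `frames` holds return addresses, innermost frame first. Each entry equal
 * to TRAMPOLINE_ADDR is replaced by the next saved address from the list.
 * Returns the number of entries that were fixed up.
 */
static size_t fixup_frames(uintptr_t *frames, size_t n,
			   const struct ri_model *head)
{
	const struct ri_model *ri = head;
	size_t fixed = 0;

	for (size_t i = 0; i < n && ri != NULL; i++) {
		if (frames[i] != TRAMPOLINE_ADDR)
			continue;
		frames[i] = ri->ret_addr;
		ri = ri->next;
		fixed++;
	}
	return fixed;
}
```

Because both the stack walk and the list run from innermost to outermost, a
single pass over each is enough in this model; no searching is needed.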
On 2018-11-01, Steven Rostedt wrote:
> On Thu, 1 Nov 2018 19:35:50 +1100
> Aleksa Sarai wrote:
> > @@ -1834,6 +1853,11 @@ static int pre_handler_kretprobe(struct kprobe *p,
> > struct pt_regs *regs)
> > ri->rp = rp;
> >
On 2018-11-02, Masami Hiramatsu wrote:
> On Fri, 2 Nov 2018 08:13:43 +1100
> Aleksa Sarai wrote:
>
> > On 2018-11-02, Masami Hiramatsu wrote:
> > > Please split the test case as an independent patch.
> >
> > Will do. Should the Documentation/ change also b
obe_trampoline. This is something I'm going
to look into some more (despite not having made progress on it last
time) since now it's something that actually needs to be fixed (and
as I mentioned in the other thread, show_stack() actually works on x86
in this context unlike the other stack_trace users).
ce (switch
away from hlist_for_each_entry_safe)
* kprobe: make maximum stack size 127, which is the ftrace default
Aleksa Sarai (2):
kretprobe: produce sane stack traces
trace: remove kretprobed checks
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h
This is effectively a reversion of commit 76094a2cf46e ("ftrace:
distinguish kretprobe'd functions in trace logs"), as the checking of
kretprobe_trampoline *for tracing* is no longer necessary with the new
kretprobe stack trace changes.
Signed-off-by: Aleksa Sarai
--
entry stack trace when it is requested.
[1]: https://github.com/iovisor/bpftrace/issues/101
Cc: Brendan Gregg
Cc: Christian Brauner
Signed-off-by: Aleksa Sarai
---
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h | 27 +
kernel/e
entry stack trace when it is requested.
[1]: https://github.com/iovisor/bpftrace/issues/101
Cc: Brendan Gregg
Cc: Christian Brauner
Signed-off-by: Aleksa Sarai
---
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h | 15 +++
kernel/events/callch
* kprobe: make maximum stack size 127, which is the ftrace default
(I forgot to Cc the BPF folks in v1, I've added them now.)
Aleksa Sarai (2):
kretprobe: produce sane stack traces
trace: remove kretprobed checks
Documentation/kprobes.txt |
"normal" OOM? Is there some peculiarity about
memcg OOM that I'm missing?
--
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
On Sun, Jun 26, 2016 at 09:34:41PM +1000, Aleksa Sarai wrote:
If a user has a setup where they wait for notifications on changes to
pids.event, and then auto-adjust the cgroup limits based on the number of
failures, you have a race condition between reading the pids.event file and
then setting
On Fri, Jun 24, 2016 at 01:00:48PM +1000, Aleksa Sarai wrote:
This allows users to dynamically adjust their limits based on how many
failed forks happened since they last reset their limits, otherwise they
would have to track (in a racy way) how many limit failures there were
since the last
set-we-log-failures-again semantics that Tejun said he liked.
Aleksa Sarai (2):
cgroup: pids: show number of failed forks since limit reset
docs: cgroup/pids: update documentation to include pids.events
Documentation/cgroup-v1/pids.txt | 18 ++
kernel/cgroup_pids.c
So that users know what the interface and meaning of the keyed values
are. In addition, mention that since=0 only occurs when the
limit was changed.
Signed-off-by: Aleksa Sarai
---
Documentation/cgroup-v1/pids.txt | 18 ++
1 file changed, 18 insertions(+)
diff --git a
the limit was reset (which was the original semantics of
the patchset).
In addition, I clarified the licensing for this file.
Signed-off-by: Aleksa Sarai
---
kernel/cgroup_pids.c | 31 ++-
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/kernel