On 19.05.22 at 23:40, Kalesh Singh wrote:
Processes can pin shared memory by keeping a handle to it through a
file descriptor; for instance dmabufs, memfd, and ashmem (in Android).

In the case of a memory leak, to identify the process pinning the
memory, userspace needs to:
   - Iterate the /proc/<pid>/fd/* for each process
   - Do a readlink on each entry to identify the type of memory from
     the file path.
   - stat() each entry to get the size of the memory.
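The three manual steps above can be sketched in userspace C roughly as follows. This is a minimal sketch, assuming Linux procfs. The helper name scan_pid_fds() and its prefix-matching policy are hypothetical and used only for illustration. Pointed at another user's process, it typically fails at opendir() unless the caller is that process's owner or root, which is the limitation described next.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: walk /proc/<pid>/fd, readlink() each entry and
 * stat() it. Counts entries whose link target starts with `prefix` and
 * sums their st_size into *total. Returns -1 if the directory cannot be
 * opened (e.g. EACCES for a process the caller does not own). */
static int scan_pid_fds(const char *pid, const char *prefix, long long *total)
{
	char dirpath[64];
	struct dirent *de;
	int matches = 0;

	assert(pid && prefix && total);
	snprintf(dirpath, sizeof(dirpath), "/proc/%s/fd", pid);

	/* Step 1: iterate /proc/<pid>/fd/*. */
	DIR *dir = opendir(dirpath);
	if (!dir)
		return -1;

	*total = 0;
	while ((de = readdir(dir)) != NULL) {
		char entry[PATH_MAX], target[PATH_MAX];
		struct stat st;

		if (de->d_name[0] == '.')
			continue; /* skip "." and ".." */
		snprintf(entry, sizeof(entry), "%s/%s", dirpath, de->d_name);

		/* Step 2: the link target hints at the type of memory. */
		ssize_t n = readlink(entry, target, sizeof(target) - 1);
		if (n < 0)
			continue; /* fd was closed while we were scanning */
		target[n] = '\0';
		if (strncmp(target, prefix, strlen(prefix)) != 0)
			continue;

		/* Step 3: stat() follows the link to the open file's size. */
		if (stat(entry, &st) == 0)
			*total += st.st_size;
		matches++;
	}
	closedir(dir);
	return matches;
}
```

For example, scan_pid_fds("self", "/memfd:", &total) would count the calling process's own memfds, where the owner check always passes.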

The file permissions on /proc/<pid>/fd/* only allow the owner or root
to perform the operations above, so they are not suitable for
capturing the system-wide state in a production environment.

This issue was addressed for dmabufs by making /proc/*/fdinfo/*
accessible to a process with PTRACE_MODE_READ_FSCREDS credentials [1].
To allow the same kind of tracking for other types of shared memory,
add the following fields to /proc/<pid>/fdinfo/<fd>:

path - This allows identifying the type of memory based on common
        prefixes: e.g. "/memfd...", "/dmabuf...", "/dev/ashmem..."

        This was not an issue when dmabuf tracking was introduced
        because the exp_name field of dmabuf fdinfo could be used
        to distinguish dmabuf fds from other types.

size - To track the amount of memory that is being pinned.

        dmabufs expose size as an additional field in fdinfo. Remove
        this and make it a common field for all fds.
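With a path field available, identifying the memory type could be a plain prefix match on its value. The sketch below is a minimal illustration. The enum and classify_path() are hypothetical, and the prefixes are simply the examples listed above rather than an exhaustive set.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of an fdinfo "path" value. */
enum shmem_type { SHMEM_OTHER, SHMEM_MEMFD, SHMEM_DMABUF, SHMEM_ASHMEM };

static int has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Map a path such as "/memfd:foo (deleted)" to a memory type, using the
 * common prefixes named in the patch description. */
static enum shmem_type classify_path(const char *path)
{
	assert(path != NULL);
	if (has_prefix(path, "/memfd"))
		return SHMEM_MEMFD;
	if (has_prefix(path, "/dmabuf"))
		return SHMEM_DMABUF;
	if (has_prefix(path, "/dev/ashmem"))
		return SHMEM_ASHMEM;
	return SHMEM_OTHER; /* regular files, anon_inode:[...], etc. */
}
```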

Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
-- the same as for /proc/<pid>/maps which also exposes the path and
size for mapped memory regions.

This allows a system process with PTRACE_MODE_READ_FSCREDS to
account for the pinned per-process memory via fdinfo.
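A monitoring process would then pull both values out of each fdinfo file. The following sketch parses fdinfo text in the format proposed by this patch, with one "key:<TAB>value" pair per line. parse_fdinfo() is a hypothetical helper. It operates on an in-memory string, so it does not depend on running a kernel that carries this patch. It assumes the path fits on one line, which holds because the patch passes "\n" as the escape set to seq_file_path().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: extract "size:" and "path:" from fdinfo text.
 * Returns 0 if both fields were found, -1 otherwise. */
static int parse_fdinfo(const char *text, unsigned long long *size,
			char *path, size_t path_len)
{
	int have_size = 0, have_path = 0;
	const char *line = text;

	assert(text && size && path && path_len > 0);
	while (*line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);

		if (strncmp(line, "size:", 5) == 0) {
			/* strtoull() skips the leading tab itself. */
			*size = strtoull(line + 5, NULL, 10);
			have_size = 1;
		} else if (strncmp(line, "path:", 5) == 0) {
			const char *v = line + 5;

			while (v < line + len && (*v == '\t' || *v == ' '))
				v++;
			size_t n = (size_t)(line + len - v);
			if (n >= path_len)
				n = path_len - 1; /* truncate to the caller's buffer */
			memcpy(path, v, n);
			path[n] = '\0';
			have_path = 1;
		}
		if (!end)
			break;
		line = end + 1;
	}
	return (have_size && have_path) ? 0 : -1;
}
```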

I think this should be split into two patches, one adding the size and one adding the path.

Adding the size is completely unproblematic, but the path might raise some eyebrows.


[1] https://lore.kernel.org/lkml/20210308170651.919148-1-kaleshsingh@google.com/

Signed-off-by: Kalesh Singh <kaleshsi...@google.com>
---
  Documentation/filesystems/proc.rst | 22 ++++++++++++++++++++--
  drivers/dma-buf/dma-buf.c          |  1 -
  fs/proc/fd.c                       |  9 +++++++--
  3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 061744c436d9..ad66d78aca51 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1922,13 +1922,16 @@ if precise results are needed.
  3.8   /proc/<pid>/fdinfo/<fd> - Information about opened file
  ---------------------------------------------------------------
  This file provides information associated with an opened file. The regular
-files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
+files have at least six fields -- 'pos', 'flags', 'mnt_id', 'ino', 'size',
+and 'path'.
+
  The 'pos' represents the current offset of the opened file in decimal
  form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
  file has been created with [see open(2) for details] and 'mnt_id' represents
  mount ID of the file system containing the opened file [see 3.5
  /proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
-the file.
+the file, 'size' represents the size of the file in bytes, and 'path'
+represents the file path.
A typical output is::
@@ -1936,6 +1939,8 @@ A typical output is::
        flags:  0100002
        mnt_id: 19
        ino:    63107
+        size:   0
+        path:   /dev/null
All locks associated with a file descriptor are shown in its fdinfo too::
@@ -1953,6 +1958,8 @@ Eventfd files
        flags:  04002
        mnt_id: 9
        ino:    63107
+        size:   0
+        path:   anon_inode:[eventfd]
        eventfd-count:  5a
where 'eventfd-count' is hex value of a counter.
@@ -1966,6 +1973,8 @@ Signalfd files
        flags:  04002
        mnt_id: 9
        ino:    63107
+        size:   0
+        path:   anon_inode:[signalfd]
        sigmask:        0000000000000200
where 'sigmask' is hex value of the signal mask associated
@@ -1980,6 +1989,8 @@ Epoll files
        flags:  02
        mnt_id: 9
        ino:    63107
+        size:   0
+        path:   anon_inode:[eventpoll]
        tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
where 'tfd' is a target file descriptor number in decimal form,
@@ -1998,6 +2009,8 @@ For inotify files the format is the following::
        flags:  02000000
        mnt_id: 9
        ino:    63107
+        size:   0
+        path:   anon_inode:inotify
        inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
where 'wd' is a watch descriptor in decimal form, i.e. a target file
@@ -2021,6 +2034,8 @@ For fanotify files the format is::
        flags:  02
        mnt_id: 9
        ino:    63107
+        size:   0
+        path:   anon_inode:[fanotify]
        fanotify flags:10 event-flags:0
        fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
        fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
@@ -2046,6 +2061,8 @@ Timerfd files
        flags:  02
        mnt_id: 9
        ino:    63107
+        size:   0
+        path:   anon_inode:[timerfd]
        clockid: 0
        ticks: 0
        settime flags: 01
@@ -2070,6 +2087,7 @@ DMA Buffer files
        mnt_id: 9
        ino:    63107
        size:   32768
+        path:   /dmabuf:
        count:  2
        exp_name:  system-heap
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b1e25ae98302..d61183ff3c30 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -377,7 +377,6 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
  {
        struct dma_buf *dmabuf = file->private_data;
- seq_printf(m, "size:\t%zu\n", dmabuf->size);
        /* Don't count the temporary reference taken inside procfs seq_show */
        seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
        seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 913bef0d2a36..a8a968bc58f0 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -54,10 +54,15 @@ static int seq_show(struct seq_file *m, void *v)
        if (ret)
                return ret;
- seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
+       seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\nsize:\t%zu\n",
                   (long long)file->f_pos, f_flags,
                   real_mount(file->f_path.mnt)->mnt_id,
-                  file_inode(file)->i_ino);
+                  file_inode(file)->i_ino,
+                  file_inode(file)->i_size);

We might consider splitting this into multiple seq_printf calls, one for each printed attribute.

The combined format string is becoming hard to read, and the minimal additional overhead of the extra calls shouldn't matter much.
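One call per attribute could look roughly like the userspace emulation below. This is only a sketch. struct buf and buf_printf() are hypothetical stand-ins for struct seq_file and seq_printf(), and struct fd_fields is invented for illustration. Because i_size is a loff_t, the sketch prints the size with %lld rather than %zu.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct seq_file: fixed buffer plus cursor. */
struct buf {
	char data[512];
	size_t len;
};

/* Hypothetical stand-in for seq_printf(): append, truncating on overflow. */
static void buf_printf(struct buf *b, const char *fmt, ...)
{
	va_list ap;

	assert(b->len < sizeof(b->data));
	va_start(ap, fmt);
	int n = vsnprintf(b->data + b->len, sizeof(b->data) - b->len, fmt, ap);
	va_end(ap);
	if (n > 0) {
		b->len += (size_t)n;
		if (b->len >= sizeof(b->data))
			b->len = sizeof(b->data) - 1;
	}
}

/* Hypothetical field set mirroring what seq_show() prints. */
struct fd_fields {
	long long pos;
	unsigned int flags;
	int mnt_id;
	unsigned long ino;
	long long size; /* loff_t in the kernel, hence %lld below */
	const char *path;
};

/* One formatted call per attribute instead of one long format string. */
static void show_fields(struct buf *b, const struct fd_fields *f)
{
	buf_printf(b, "pos:\t%lli\n", f->pos);
	buf_printf(b, "flags:\t0%o\n", f->flags);
	buf_printf(b, "mnt_id:\t%i\n", f->mnt_id);
	buf_printf(b, "ino:\t%lu\n", f->ino);
	buf_printf(b, "size:\t%lld\n", f->size);
	buf_printf(b, "path:\t%s\n", f->path);
}
```

Each line of output then maps to exactly one call, so adding or dropping a field touches a single line.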

Regards,
Christian.

+
+       seq_puts(m, "path:\t");
+       seq_file_path(m, file, "\n");
+       seq_putc(m, '\n');
/* show_fd_locks() never deferences files so a stale value is safe */
        show_fd_locks(m, file, files);

base-commit: b015dcd62b86d298829990f8261d5d154b8d7af5
