Re: [lttng-dev] Re: Re: Re: Re: shm leak in traced application?
When this happens, is the process holding a single (or very few) shm file references, or references to many shm files? I wonder if you end up in a scenario where an application very frequently performs exec(), and the exec() therefore sometimes happens in the window between reception of the file descriptor over the unix socket and the fcntl() FD_CLOEXEC call.

Thanks,
Mathieu

----- On Mar 8, 2022, at 8:29 PM, zhenyu.ren wrote:

> Thanks a lot for the reply. I have not replied in the bug tracker because I
> have not yet found a reliable way to reproduce the leak.
>
>> --
>> From: Mathieu Desnoyers
>> Sent: Tuesday, March 8, 2022, 23:26
>> To: zhenyu.ren
>> Cc: Jonathan Rajotte; lttng-dev
>> Subject: Re: [lttng-dev] Re: Re: Re: shm leak in traced application?
>>
>> ----- On Mar 8, 2022, at 12:18 AM, lttng-dev lttng-dev@lists.lttng.org wrote:
>>
>> > Hi,
>> > In shm_object_table_append_shm()/alloc_shm(), why is fcntl() FD_CLOEXEC
>> > not called on the shm fds? I guess this omission leads to the shm fd leak.
>>
>> Those file descriptors are created when received by
>> ustcomm_recv_fds_unix_sock(), and immediately after creation they are set
>> FD_CLOEXEC.
>>
>> We should continue this discussion in the bug tracker as suggested by
>> Jonathan. It would greatly help if you can provide a small reproducer.
>>
>> Thanks,
>> Mathieu
>>
>> > Thanks
>> > zhenyu.ren
>> >
>> >> --
>> >> From: Jonathan Rajotte-Julien
>> >> Sent: Friday, February 25, 2022, 22:31
>> >> To: zhenyu.ren
>> >> Cc: lttng-dev
>> >> Subject: Re: [lttng-dev] Re: Re: shm leak in traced application?
>> >>
>> >> Hi zhenyu.ren,
>> >>
>> >> Please open a bug on our bug tracker and provide a reproducer against the
>> >> latest stable version (2.13.x).
>> >> https://bugs.lttng.org/
>> >>
>> >> Please follow the guidelines:
>> >> https://bugs.lttng.org/#Bug-reporting-guidelines
>> >>
>> >> Cheers
>> >>
>> >> On Fri, Feb 25, 2022 at 12:47:34PM +0800, zhenyu.ren via lttng-dev wrote:
>> >> > Hi, lttng-dev team
>> >> >
>> >> > When lttng-sessiond exits, the ust applications should call
>> >> > lttng_ust_objd_table_owner_cleanup() and clean up all shm resources
>> >> > (unmap and close). However, I find that the ust applications keep "all"
>> >> > of the shm fds open ("/dev/shm/ust-shm-consumer-81132 (deleted)") and
>> >> > do NOT free the shm.
>> >> >
>> >> > If we run lttng-sessiond again, the ust applications get a new piece of
>> >> > shm and a new list of shm fds, so shm usage doubles. Then, if we kill
>> >> > lttng-sessiond, what most likely happens is that the ust applications
>> >> > close the new list of shm fds and free the new shm resources, but keep
>> >> > the old shm. In other words, we cannot free that piece of shm without
>> >> > killing the ust applications!!!
>> >> >
>> >> > So is it possible that the ust applications failed to call
>> >> > lttng_ust_objd_table_owner_cleanup()? Have you ever seen this problem?
>> >> > Do you have any advice on freeing the shm without killing the ust
>> >> > applications (I tried to dig into the kernel's shm_open and /dev/shm,
>> >> > but found no ideas)?
>> >> >
>> >> > Thanks in advance
>> >> > zhenyu.ren
>> >> >
>> >> > --
>> >> > From: zhenyu.ren via lttng-dev
>> >> > Sent: Wednesday, February 23, 2022, 23:09
>> >> > To: lttng-dev
>> >> > Subject: [lttng-dev] Re: shm leak in traced application?
>> >> >
>> >> > > "I found these items also exist in a traced application which is a
>> >> > > long-time running daemon"
>> >> > Even if lttng-sessiond has been killed!!
>> >> >
>> >> > Thanks
>> >> > zhenyu.ren
>> >> >
>> >> > --
>> >> > From: zhenyu.ren via lttng-dev
>> >> > Sent: Wednesday, February 23, 2022, 22:44
>> >> > To: lttng-dev
>> >> > Subject: [lttng-dev] shm leak in traced application?
>> >> > Hi,
>> >> >
>> >> > There are many items such as "/dev/shm/ust-shm-consumer-81132 (deleted)"
>> >> > in the lttng-sessiond fd space. I know this is the result of shm_open()
>> >> > and shm_unlink() in create_posix_shm().
>> >> >
>> >> > However, today I found these items also exist in a traced application
>> >> > which is a long-running daemon. The most important thing I found is
>> >> > that there seems to be no reliable way to release the shared memory.
>> >> >
>> >> > I tried killing lttng-sessiond, but that does not always release the
>> >> > shared memory. Sometimes I need to kill the traced application to free
>> >> > the shared memory. But it is not a good idea to kill these applications.
>> >> >
>> >> > My questions are:
>> >> > 1. Is there any way to release the shared memory without killing any
>> >> >    traced application?
>> >> > 2. Is it normal that many items such as
>> >> >    "/dev/shm/ust-shm-consumer-81132 (deleted)" exist in the traced
>> >> >    application?
>> >> >
>> >> > Thanks
>> >> > zhenyu.ren
>> >> >
>> >> > _______________________________________________
>> >> > lttng-dev mailing list
>> >> > lttng-dev@lists.lttng.org
Re: [lttng-dev] Re: Re: Re: Re: shm leak in traced application?
Hi Zhenyu,

Can you try this fix please?

https://review.lttng.org/c/lttng-ust/+/7530

And let me know how it goes.

Thanks,
Mathieu

----- On Mar 9, 2022, at 11:37 AM, Mathieu Desnoyers wrote:

> When this happens, is the process holding a single (or very few) shm file
> references, or references to many shm files? I wonder if you end up in a
> scenario where an application very frequently performs exec(), and the
> exec() therefore sometimes happens in the window between reception of the
> file descriptor over the unix socket and the fcntl() FD_CLOEXEC call.
>
> Thanks,
> Mathieu
>
> [...]
[lttng-dev] Re: Re: Re: Re: Re: shm leak in traced application?
> When this happens, is the process holding a single (or very few) shm file
> references, or references to many shm files?

It is holding "all" of the shm files' references, neither a single one nor a few. In fact, yesterday I tried to fix it as follows, and it seems to work:

--- a/lttng-ust/libringbuffer/shm.c
+++ b/lttng-ust/libringbuffer/shm.c
@@ -32,7 +32,6 @@
 #include
 #include
 #include
-
 /*
  * Ensure we have the required amount of space available by writing 0
  * into the entire buffer. Not doing so can trigger SIGBUS when going
@@ -122,6 +121,12 @@ struct shm_object *_shm_object_table_alloc_shm(struct shm_object_table *table,
 	/* create shm */
 	shmfd = stream_fd;
+	if (shmfd >= 0) {
+		ret = fcntl(shmfd, F_SETFD, FD_CLOEXEC);
+		if (ret < 0) {
+			PERROR("fcntl shmfd FD_CLOEXEC");
+		}
+	}
 	ret = zero_file(shmfd, memory_map_size);
 	if (ret) {
 		PERROR("zero_file");
@@ -272,15 +277,22 @@ struct shm_object *shm_object_table_append_shm(struct shm_object_table *table,
 	obj->shm_fd = shm_fd;
 	obj->shm_fd_ownership = 1;
+	if (shm_fd >= 0) {
+		ret = fcntl(shm_fd, F_SETFD, FD_CLOEXEC);
+		if (ret < 0) {
+			PERROR("fcntl shmfd FD_CLOEXEC");
+			//goto error_fcntl;
+		}
+	}
 	ret = fcntl(obj->wait_fd[1], F_SETFD, FD_CLOEXEC);
 	if (ret < 0) {

As this shows, wait_fd[1] is set FD_CLOEXEC by fcntl(), but shm_fd is not. Why does your patch deal with wait_fd but not shm_fd? As far as I know, wait_fd is just a pipe and does not seem related to the shm resource.

--
From: Mathieu Desnoyers
Sent: Thursday, March 10, 2022, 00:46
To: zhenyu.ren
Cc: Jonathan Rajotte; lttng-dev
Subject: Re: [lttng-dev] Re: Re: Re: Re: shm leak in traced application?

> When this happens, is the process holding a single (or very few) shm file
> references, or references to many shm files? I wonder if you end up in a
> scenario where an application very frequently performs exec(), and the
> exec() therefore sometimes happens in the window between reception of the
> file descriptor over the unix socket and the fcntl() FD_CLOEXEC call.
>
> [...]
[lttng-dev] Re: Re: Re: Re: Re: Re: shm leak in traced application?
Oh, I see. I have an old ust (2.7), so I have no FD_CLOEXEC in ustcomm_recv_fds_unix_sock(). Thanks very much!!!

zhenyu.ren

--
From: zhenyu.ren via lttng-dev
Sent: Thursday, March 10, 2022, 11:24
To: Mathieu Desnoyers
Cc: lttng-dev
Subject: [lttng-dev] Re: Re: Re: Re: Re: shm leak in traced application?

> > When this happens, is the process holding a single (or very few) shm file
> > references, or references to many shm files?
>
> It is holding "all" of the shm files' references, neither a single one nor
> a few. In fact, yesterday I tried to fix it as follows, and it seems to
> work:
>
> [...]