Re: [lttng-dev] Re: Re: Re: Re: shm leak in traced application?

2022-03-09 Thread Mathieu Desnoyers via lttng-dev
When this happens, is the process holding a single (or very few) shm file
references, or references to many shm files?

I wonder if you end up in a scenario where an application very frequently
performs exec(), and therefore the exec() sometimes happens in the window
between the unix socket file descriptor reception and the call to
fcntl() FD_CLOEXEC.

Thanks, 

Mathieu 

- On Mar 8, 2022, at 8:29 PM, zhenyu.ren  wrote: 

> Thanks a lot for the reply. I have not replied in the bug tracker since I
> have not yet found a reliable way to reproduce the leak.

>> --
>> From: Mathieu Desnoyers 
>> Sent: Tuesday, March 8, 2022, 23:26
>> To: zhenyu.ren 
>> Cc: Jonathan Rajotte ; lttng-dev
>> 
>> Subject: Re: [lttng-dev] Re: Re: Re: shm leak in traced application?

>> - On Mar 8, 2022, at 12:18 AM, lttng-dev lttng-dev@lists.lttng.org wrote:

>> > Hi,
>> > In shm_object_table_append_shm()/alloc_shm(), why is the FD_CLOEXEC
>> > fcntl() not called on the shm fds? I guess this omission leads to the
>> > shm fd leak.

>> Those file descriptors are created when received by
>> ustcomm_recv_fds_unix_sock, and immediately after creation they are set
>> as FD_CLOEXEC.

>> We should continue this discussion in the bug tracker, as suggested by
>> Jonathan. It would greatly help if you could provide a small reproducer.

>> Thanks,

>> Mathieu

>> > Thanks
>> > zhenyu.ren

>> >> --
>> >> From: Jonathan Rajotte-Julien 
>> >> Sent: Friday, February 25, 2022, 22:31
>> >> To: zhenyu.ren 
>> >> Cc: lttng-dev 
>> >> Subject: Re: [lttng-dev] Re: Re: shm leak in traced application?

>> >> Hi zhenyu.ren,

>> >> Please open a bug on our bug tracker and provide a reproducer against
>> >> the latest stable version (2.13.x).

>> >> https://bugs.lttng.org/

>> >> Please follow the guidelines: 
>> >> https://bugs.lttng.org/#Bug-reporting-guidelines

>> >> Cheers

>> >> On Fri, Feb 25, 2022 at 12:47:34PM +0800, zhenyu.ren via lttng-dev wrote:
>> >> > Hi, lttng-dev team
>> >> > When lttng-sessiond exits, the ust applications should call
>> >> > lttng_ust_objd_table_owner_cleanup() and clean up all shm resources
>> >> > (unmap and close). However, I find that the ust applications keep all
>> >> > of the shm fds ("/dev/shm/ust-shm-consumer-81132 (deleted)") open and
>> >> > do NOT free the shm.
>> >> > If we run lttng-sessiond again, the ust applications get a new piece
>> >> > of shm and a new list of shm fds, so shm usage doubles. If we then
>> >> > kill lttng-sessiond, what most likely happens is that the ust
>> >> > applications close the new list of shm fds and free the new shm, but
>> >> > keep the old shm. In other words, we cannot free that piece of shm
>> >> > without killing the ust applications!
>> >> > So, is it possible that the ust applications failed to call
>> >> > lttng_ust_objd_table_owner_cleanup()? Have you ever seen this
>> >> > problem? Do you have any advice on freeing the shm without killing
>> >> > the ust applications (I tried to dig into the kernel shm_open and
>> >> > /dev/shm, but found no ideas)?

>> >> > Thanks in advance
>> >> > zhenyu.ren

>> >> > --
>> >> > From: zhenyu.ren via lttng-dev 
>> >> > Sent: Wednesday, February 23, 2022, 23:09
>> >> > To: lttng-dev 
>> >> > Subject: [lttng-dev] Re: shm leak in traced application?

>> >> > >"I found these items also exist in a traced application which is a
>> >> > >long-time running daemon"
>> >> > Even if lttng-sessiond has been killed!!

>> >> > Thanks
>> >> > zhenyu.ren
>> >> > --
>> >> > From: zhenyu.ren via lttng-dev 
>> >> > Sent: Wednesday, February 23, 2022, 22:44
>> >> > To: lttng-dev 
>> >> > Subject: [lttng-dev] shm leak in traced application?

>> >> > Hi,
>> >> > There are many items such as "/dev/shm/ust-shm-consumer-81132
>> >> > (deleted)" in the lttng-sessiond fd space. I know they are the result
>> >> > of shm_open() and shm_unlink() in create_posix_shm().
>> >> > However, today I found these items also exist in a traced application
>> >> > which is a long-running daemon. The most important thing I found is
>> >> > that there seems to be no reliable way to release the shared memory.
>> >> > I tried killing lttng-sessiond, but that does not always release the
>> >> > shared memory. Sometimes I need to kill the traced application to
>> >> > free it... but it is not a good idea to kill these applications.
>> >> > My questions are:
>> >> > 1. Is there any way to release the shared memory without killing any
>> >> > traced application?
>> >> > 2. Is it normal that many items such as
>> >> > "/dev/shm/ust-shm-consumer-81132 (deleted)" exist in the traced
>> >> > application?

>> >> > Thanks
>> >> > zhenyu.ren

>> >> > ___
>> >> > lttng-dev mailing list
>> >> > lttng-dev@lists.

Re: [lttng-dev] Re: Re: Re: Re: shm leak in traced application?

2022-03-09 Thread Mathieu Desnoyers via lttng-dev
Hi Zhenyu, 

Can you try this fix please ? 

https://review.lttng.org/c/lttng-ust/+/7530 

And let me know how it goes. 

Thanks, 

Mathieu 

- On Mar 9, 2022, at 11:37 AM, Mathieu Desnoyers 
 wrote: 


[lttng-dev] Re: Re: Re: Re: Re: shm leak in traced application?

2022-03-09 Thread zhenyu.ren via lttng-dev
>When this happens, is the process holding a single (or very few) shm file
>references, or references to many shm files?

It is holding references to "all" of the shm files, not just a single one or
a few.

In fact, yesterday I tried to fix it as follows, and it seems to work.

--- a/lttng-ust/libringbuffer/shm.c
+++ b/lttng-ust/libringbuffer/shm.c
@@ -32,7 +32,6 @@
 #include 
 #include 
 #include 
-
 /*
  * Ensure we have the required amount of space available by writing 0
  * into the entire buffer. Not doing so can trigger SIGBUS when going
@@ -122,6 +121,12 @@ struct shm_object *_shm_object_table_alloc_shm(struct shm_object_table *table,
 	/* create shm */
 
 	shmfd = stream_fd;
+	if (shmfd >= 0) {
+		ret = fcntl(shmfd, F_SETFD, FD_CLOEXEC);
+		if (ret < 0) {
+			PERROR("fcntl shmfd FD_CLOEXEC");
+		}
+	}
 	ret = zero_file(shmfd, memory_map_size);
 	if (ret) {
 		PERROR("zero_file");
@@ -272,15 +277,22 @@ struct shm_object *shm_object_table_append_shm(struct shm_object_table *table,
 	obj->shm_fd = shm_fd;
 	obj->shm_fd_ownership = 1;
 
+	if (shm_fd >= 0) {
+		ret = fcntl(shm_fd, F_SETFD, FD_CLOEXEC);
+		if (ret < 0) {
+			PERROR("fcntl shmfd FD_CLOEXEC");
+			//goto error_fcntl;
+		}
+	}
 	ret = fcntl(obj->wait_fd[1], F_SETFD, FD_CLOEXEC);
 	if (ret < 0) {

As it shows, wait_fd[1] has been set FD_CLOEXEC by fcntl(), but shm_fd has
not. Why does your patch deal with wait_fd but not shm_fd? As far as I know,
wait_fd is just a pipe and does not seem related to the shm resource.








[lttng-dev] Re: Re: Re: Re: Re: Re: shm leak in traced application?

2022-03-09 Thread zhenyu.ren via lttng-dev
Oh, I see. I have an old ust (2.7), so I do not have the FD_CLOEXEC handling
in ustcomm_recv_fds_unix_sock().

Thanks very much!!!
zhenyu.ren