2020-10-20 14:35:16 UTC - Ben: I think I might be seeing a potential memory 
leak in the C client, and before I report it I thought I would ask here to see 
if it's a known issue or if I'm misunderstanding something.

I'm actually using the client via a go app. When sending large numbers 
(hundreds of millions) of messages using SendAsync(), I'm seeing the memory 
usage of the app grow apparently without bound. Using the go profiler doesn't 
show any leaked data from go itself, and setting GOGC to a small value doesn't 
help, so I think it must be coming from the C/C++.

Running valgrind with `--leak-check=full` gives me reports like this one:
```==132== 4,004,096 (1,234,208 direct, 2,769,888 indirect) bytes in 77,138 
blocks are definitely lost in loss record 22,211 of 22,211
==132==    at 0x4835DEF: operator new(unsigned long) (vg_replace_malloc.c:334)
==132==    by 0x4AEEB03: handle_producer_send(pulsar::Result, 
pulsar::MessageId, void (*)(pulsar_result, _pulsar_message_id*, void*), void*) 
(<http://c_Producer.cc:42|c_Producer.cc:42>)
...```
Looking at `handle_producer_send()` in `c_<http://Producer.cc|Producer.cc>`, I 
see this line:
```        pulsar_message_id_t *c_message_id = new pulsar_message_id_t;```
Where does that message ID get deleted? If not, is this a memory leak?
----
2020-10-20 14:35:53 UTC - Ben: I'm running 2.6.1 in production, but looking at 
the latest master branch above.
----
2020-10-20 15:51:07 UTC - Yunze Xu: Yeah, it's a bug. You can delete the 
`pulsar_message_id_t` in your callback to `handle_producer_send`. However, for 
C language, there's no `delete`. The same goes for CGO client. IMO, it should 
be deleted in `handle_producer_send`.
----
2020-10-20 15:51:36 UTC - Ben: Thanks -- I agree and I'm testing a patch there. 
If it works I'll send a PR.
----
2020-10-20 16:02:01 UTC - Ben: Adding the corresponding delete in 
`handle_producer_send()` seems to completely fix my issue: valgrind no longer 
complains, and my binary has stopped exploding.
----
2020-10-20 16:10:53 UTC - Ben: OK, PR 
<https://github.com/apache/pulsar/pull/8312> created. Thanks for confirming 
@Yunze Xu.
+1 : Yunze Xu
----
2020-10-20 16:21:38 UTC - Yunze Xu: I think it's better to just define a stack 
variable `pulsar_message_id_t c_message_id;` and pass `&amp;c_message_id` here, 
the `new`/`delete` is not necessary.
----
2020-10-20 16:29:43 UTC - Matteo Merli: The problem is that there are apps 
relying on this (bad) behavior
----
2020-10-20 16:29:55 UTC - Matteo Merli: any change, will break compatibility
----
2020-10-20 16:30:45 UTC - Ben: OK. But that means that at the moment, your 
golang client is basically totally broken.
----
2020-10-20 16:31:09 UTC - Matteo Merli: The client should call 
message_id_free() in callback
----
2020-10-20 16:32:36 UTC - Ben: How do I do that from golang?
----
2020-10-20 16:33:04 UTC - Matteo Merli: that should be done in  the CGo code. 
let me check that code path
----
2020-10-20 16:36:21 UTC - Ben: I don't see the free (and I didn't see it in 
valgrind). I think it should be in file c_producer.go, func 
pulsarProducerSendCallbackProxy() ?
----
2020-10-20 16:40:58 UTC - Ben: Looks like adding 
`C.pulsar_message_id_free(messageId)` to the second else clause in that 
function might be the correct fix.
----
2020-10-20 16:41:51 UTC - Yunze Xu: Seems like that you need to call 
`messageIdFinalizer` to free the message id
----
2020-10-20 16:43:24 UTC - Ben: I think the finalizer is intended to be used by 
the CGo runtime during garbage collection. In this case though, the runtime 
can't know that the pointer needs to be freed (because it was created outside 
go). So we will need to free it manually after the callback runs.
----
2020-10-20 16:43:59 UTC - Ben: I've spent quite a while looking through the CGo 
code, I think I must be the only person using it...!!!
----
2020-10-20 16:47:56 UTC - Matteo Merli: :slightly_smiling_face:

At the moment, the native Go client is preferred rather than the CGo one
----
2020-10-20 16:48:30 UTC - Ben: Heh, that makes sense. Is it ready for 
production yet? If so I might consider changing over.
----
2020-10-20 16:59:46 UTC - Matteo Merli: Yes, we've been using that in prod from 
9 months already
+1 : Ben
----
2020-10-20 17:00:41 UTC - Ben: I'll schedule in some work to switch over. In 
the meantime, I'm going to fix the cgo interface and send a PR -- even if it 
doesn't get applied it might help someone.
----

Reply via email to