On Tue, Mar 01, 2022 at 06:19:23PM +0100, Jérémie Galarneau wrote: > Thanks a lot for reporting the problem. If I understand the ASAN > report correctly, the stream itself will also be double free'd, so > I don't think this is the complete fix.
Yeah, it looked odd that consumer_stream_destroy() is called recursively on the same stream but AFAICS the code's been like this for a while so I assumed it was on purpose, and only the metadata bucket stuff was relatively new. ASAN doesn't detect any double frees of the stream itself, but I guess calling call_rcu(..., free_stream_rcu) twice on the same stream is not expected behaviour and could lead to other problems. > There definitely seems to be a problem with regards to the ownership > of the metadata channel vs stream. Let me look into it. Great, thank you! > I see that you fall into a case where the metadata setup fails, > can you share more info about how this can be reproduced? In the core dump I received (on v2.12.4), consumer_stream_destroy() was called from the error label in setup_metadata and ret was set to LTTCOMM_CONSUMERD_ERROR_METADATA. So consumer_send_relayd_stream() had returned an error. I only had the core dump and no other logs, so I did not know which of the paths inside consumer_send_relayd_stream() had failed, but since I was primarily interested in fixing the crash itself I simply forced this code path to be taken: diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index fa1c71299..97ed59632 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -908,8 +908,7 @@ static int setup_metadata(struct lttng_consumer_local_data *ctx, uint64_t key) /* Send metadata stream to relayd if needed. */ if (metadata->metadata_stream->net_seq_idx != (uint64_t) -1ULL) { - ret = consumer_send_relayd_stream(metadata->metadata_stream, - metadata->pathname); + ret = -1; if (ret < 0) { ret = LTTCOMM_CONSUMERD_ERROR_METADATA; goto error; With the above patch, I could easily reproduce the use-after-free using the following steps on the latest release v2.13.4, and it was clear that this use-after-free was the cause of the original core dump on the older release too. Build with ASAN: lttng-tools$ LDFLAGS=-fsanitize=address CFLAGS=-fsanitize=address ./configure Shell #1: lttng-ust$ tests/compile/api0/hello/hello 10000 Shell #2: lttng-tools$ ASAN_OPTIONS=detect_odr_violation=0 ./src/bin/lttng-sessiond/lttng-sessiond Shell #3: lttng-tools$ export ASAN_OPTIONS=detect_odr_violation=0 lttng-tools$ ./src/bin/lttng/lttng create --live && ./src/bin/lttng/lttng enable-event --userspace 1 && ./src/bin/lttng/lttng start && sleep 1 && ./src/bin/lttng/lttng stop The ASAN splat should be seen in shell #2. Note that you may have to run the command in shell #3 a couple of times since LTTNG_CONSUMER_SETUP_METADATA doesn't seem to be sent every time. _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev