Hello all,

My name is Bogdan Codres from Wind River.
Recently, we received a crash report from one of our customers. It happened only once and we do not have a clear way to reproduce it. The crash occurred on ARMv7 with lttng-tools 2.12. This is the backtrace of the crash:

(gdb) bt
#0  __libc_do_syscall () at libc-do-syscall.S:49
#1  0xb6e13ad4 in __libc_signal_restore_set (set=0xb39f94e0) at ../sysdeps/unix/sysv/linux/internal-signals.h:84
#2  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xb6e061a6 in __GI_abort () at abort.c:79
#4  0xb6e0ed90 in __assert_fail_base (fmt=0xb6ebfed0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x517e10 "!stream->trace_chunk", assertion@entry=0xb39fe300 "\001", file=0x51d844 "../../../../git/src/common/ust-consumer/ust-consumer.c", file@entry=0x0, line=1124, line@entry=5363780, function=function@entry=0x51d234 <__PRETTY_FUNCTION__.15949> "snapshot_channel") at assert.c:92
#5  0xb6e0ee0e in __GI___assert_fail (assertion=0xb39fe300 "\001", file=0x0, line=5363780, line@entry=1124, function=0x51d234 <__PRETTY_FUNCTION__.15949> "snapshot_channel") at assert.c:101
#6  0x004f5840 in snapshot_channel (channel=0xb42008d0, key=1, path=path@entry=0xb39f9964 "ust/uid/0/32-bit", relayd_id=relayd_id@entry=18446744073709551615, nb_packets_per_stream=0, ctx=ctx@entry=0x544048) at ../../../../git/src/common/ust-consumer/ust-consumer.c:1124
#7  0x004f9a08 in lttng_ustconsumer_recv_cmd (ctx=0x544048, sock=30, consumer_sockpoll=<optimized out>) at ../../../../git/src/common/ust-consumer/ust-consumer.c:1790
#8  0x004dfac0 in consumer_thread_sessiond_poll (data=0x544048) at ../../../../git/src/common/consumer/consumer.c:3361
#9  0xb6ee7b00 in start_thread (arg=0x98396ec3) at pthread_create.c:486
#10 0xb6e853bc in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /sysroots/armv7at2-neon-wrs-linux-gnueabi/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The failing assertion is assert(!stream->trace_chunk), i.e. the stream's trace_chunk already exists. There is a comment on the function saying "the caller must take RCU read side lock and channel lock". The RCU read side lock is taken by snapshot_channel itself, but from what I can see, nothing takes the channel lock in the functions that call snapshot_channel. This looks like a race condition: if the comment is correct and the channel lock really is missing, that could explain the crash, so I am wondering whether anyone else has seen it.

While investigating the source code, I noticed that lttng_kconsumer_recv_cmd, which has a structure similar to lttng_ustconsumer_recv_cmd, does take pthread_mutex_lock(&channel->lock) in its LTTNG_CONSUMER_SNAPSHOT_CHANNEL handler:

    } else {
        pthread_mutex_lock(&channel->lock);
        if (msg.u.snapshot_channel.metadata == 1) {
            ret = lttng_kconsumer_snapshot_metadata(channel, key,
                    msg.u.snapshot_channel.pathname,
                    msg.u.snapshot_channel.relayd_id, ctx);
            if (ret < 0) {
                ERR("Snapshot metadata failed");
                ret_code = LTTCOMM_CONSUMERD_SNAPSHOT_FAILED;
            }
        } else {
            ret = lttng_kconsumer_snapshot_channel(channel, key,
                    msg.u.snapshot_channel.pathname,
                    msg.u.snapshot_channel.relayd_id,
                    msg.u.snapshot_channel.nb_packets_per_stream, ctx);
            if (ret < 0) {
                ERR("Snapshot channel failed");
                ret_code = LTTCOMM_CONSUMERD_SNAPSHOT_FAILED;
            }
        }
        pthread_mutex_unlock(&channel->lock);

So my question is: shouldn't lttng_ustconsumer_recv_cmd also take the channel lock around the snapshot calls, like lttng_kconsumer_recv_cmd does?
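If the missing channel lock is indeed the problem, the fix I have in mind would look roughly like the sketch below for the LTTNG_CONSUMER_SNAPSHOT_CHANNEL case of lttng_ustconsumer_recv_cmd, mirroring the kernel consumer path. Please treat it only as a sketch against 2.12, not a tested patch: I am assuming the UST consumer's snapshot helpers are the static snapshot_metadata() and snapshot_channel() functions with the arguments visible in the backtrace, and the surrounding error handling may differ in the real code.

    /* Sketch only: LTTNG_CONSUMER_SNAPSHOT_CHANNEL case in lttng_ustconsumer_recv_cmd(). */
    } else {
        /* Take the channel lock, as lttng_kconsumer_recv_cmd already does. */
        pthread_mutex_lock(&channel->lock);
        if (msg.u.snapshot_channel.metadata) {
            ret = snapshot_metadata(channel, key,
                    msg.u.snapshot_channel.pathname,
                    msg.u.snapshot_channel.relayd_id, ctx);
            if (ret < 0) {
                ERR("Snapshot metadata failed");
                ret_code = LTTCOMM_CONSUMERD_SNAPSHOT_FAILED;
            }
        } else {
            ret = snapshot_channel(channel, key,
                    msg.u.snapshot_channel.pathname,
                    msg.u.snapshot_channel.relayd_id,
                    msg.u.snapshot_channel.nb_packets_per_stream, ctx);
            if (ret < 0) {
                ERR("Snapshot channel failed");
                ret_code = LTTCOMM_CONSUMERD_SNAPSHOT_FAILED;
            }
        }
        pthread_mutex_unlock(&channel->lock);
    }

That would make the UST path satisfy the "channel lock" requirement stated in the comment on snapshot_channel, the same way the kernel consumer path already does.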
What's your opinion on this issue?

Best Regards,
Ph.D. eng. Bogdan Codres
Senior Engineer at RDC-EMEA, Professional Services, Wind River