[lttng-dev] [PATCH lttng-tools 1/3] Fix: relayd: tracefile rotation: viewer opening missing index file
Moving the head position of the tracefile array when the data is received opens a window where a viewer attaching to the session could try to open a missing index file (which has not been received yet). However, we want to bump the tail position as soon as we receive data, because the prior tail is not valid anymore. Solve this by introducing two head positions: the "read" head and the "write" head. The "write" head is the position of the newest data file (equivalent to the prior "head" position). We also introduce a "read" head position, which is only moved forward when the index is received. The viewer now uses the "read" head position as upper bound, which ensures it never attempts to open a non-existing index file. Signed-off-by: Mathieu Desnoyers --- src/bin/lttng-relayd/stream.c | 4 +- src/bin/lttng-relayd/tracefile-array.c | 58 -- src/bin/lttng-relayd/tracefile-array.h | 21 -- src/bin/lttng-relayd/viewer-stream.c | 2 +- 4 files changed, 58 insertions(+), 27 deletions(-) diff --git a/src/bin/lttng-relayd/stream.c b/src/bin/lttng-relayd/stream.c index 94698f8d..4d3d37a2 100644 --- a/src/bin/lttng-relayd/stream.c +++ b/src/bin/lttng-relayd/stream.c @@ -958,7 +958,7 @@ int stream_init_packet(struct relay_stream *stream, size_t packet_size, stream->stream_handle, stream->tracefile_size_current, packet_size, stream->tracefile_current_index, new_file_index); - tracefile_array_file_rotate(stream->tfa); + tracefile_array_file_rotate(stream->tfa, TRACEFILE_ROTATE_WRITE); stream->tracefile_current_index = new_file_index; if (stream->stream_fd) { @@ -1095,6 +1095,7 @@ int stream_update_index(struct relay_stream *stream, uint64_t net_seq_num, ret = relay_index_try_flush(index); if (ret == 0) { + tracefile_array_file_rotate(stream->tfa, TRACEFILE_ROTATE_READ); tracefile_array_commit_seq(stream->tfa); stream->index_received_seqcount++; *flushed = true; @@ -1188,6 +1189,7 @@ int stream_add_index(struct relay_stream *stream, } ret = relay_index_try_flush(index); if (ret == 0) { + tracefile_array_file_rotate(stream->tfa, TRACEFILE_ROTATE_READ); tracefile_array_commit_seq(stream->tfa); stream->index_received_seqcount++; stream->pos_after_last_complete_data_index += index->total_size; diff --git a/src/bin/lttng-relayd/tracefile-array.c b/src/bin/lttng-relayd/tracefile-array.c index 20b760c0..3d62317a 100644 --- a/src/bin/lttng-relayd/tracefile-array.c +++ b/src/bin/lttng-relayd/tracefile-array.c @@ -62,7 +62,8 @@ void tracefile_array_destroy(struct tracefile_array *tfa) free(tfa); } -void tracefile_array_file_rotate(struct tracefile_array *tfa) +void tracefile_array_file_rotate(struct tracefile_array *tfa, + enum tracefile_rotate_type type) { uint64_t *headp, *tailp; @@ -70,24 +71,37 @@ void tracefile_array_file_rotate(struct tracefile_array *tfa) /* Not in tracefile rotation mode. */ return; } - /* Rotate to next file. */ - tfa->file_head = (tfa->file_head + 1) % tfa->count; - if (tfa->file_head == tfa->file_tail) { - /* Move tail. */ - tfa->file_tail = (tfa->file_tail + 1) % tfa->count; - } - headp = &tfa->tf[tfa->file_head].seq_head; - tailp = &tfa->tf[tfa->file_head].seq_tail; - /* -* If we overwrite a file with content, we need to push the tail -* to the position following the content we are overwriting. -*/ - if (*headp != -1ULL) { - tfa->seq_tail = tfa->tf[tfa->file_tail].seq_tail; + switch (type) { + case TRACEFILE_ROTATE_READ: + /* +* Rotate read head to write head position, thus allowing +* reader to consume the newly rotated head file. +*/ + tfa->file_head_read = tfa->file_head_write; + break; + case TRACEFILE_ROTATE_WRITE: + /* Rotate write head to next file, pushing tail if needed. */ + tfa->file_head_write = (tfa->file_head_write + 1) % tfa->count; + if (tfa->file_head_write == tfa->file_tail) { + /* Move tail. */ + tfa->file_tail = (tfa->file_tail + 1) % tfa->count; + } + headp = &tfa->tf[tfa->file_head_write].seq_head; + tailp = &tfa->tf[tfa->file_head_write].seq_tail; + /* +* If we overwrite a file with content, we need to push the tail +* to the position following the content we are overwriting. +*/ + if (*headp != -1ULL) { +
[lttng-dev] [PATCH lttng-tools 2/3] Fix: relayd: put chunk reference when closing stream
If a stream is closed by an application exiting (per-pid buffers), it needs to put its reference on the stream trace chunk right away, because otherwise still holding the reference on the trace chunk could allow a viewer stream (which holds a reference to the stream) to postpone destroy waiting for the chunk to cease to exist endlessly until the viewer is detached. Signed-off-by: Mathieu Desnoyers --- src/bin/lttng-relayd/stream.c | 21 + 1 file changed, 21 insertions(+) diff --git a/src/bin/lttng-relayd/stream.c b/src/bin/lttng-relayd/stream.c index 4d3d37a2..9d753bd0 100644 --- a/src/bin/lttng-relayd/stream.c +++ b/src/bin/lttng-relayd/stream.c @@ -923,6 +923,27 @@ void try_stream_close(struct relay_stream *stream) stream->closed = true; /* Relay indexes are only used by the "consumer/sessiond" end. */ relay_index_close_all(stream); + + /* +* If we are closed by an application exiting (per-pid buffers), +* we need to put our reference on the stream trace chunk right +* away, because otherwise still holding the reference on the +* trace chunk could allow a viewer stream (which holds a reference +* to the stream) to postpone destroy waiting for the chunk to cease +* to exist endlessly until the viewer is detached. +*/ + + /* Put stream fd before put chunk. */ + if (stream->stream_fd) { + stream_fd_put(stream->stream_fd); + stream->stream_fd = NULL; + } + if (stream->index_file) { + lttng_index_file_put(stream->index_file); + stream->index_file = NULL; + } + lttng_trace_chunk_put(stream->trace_chunk); + stream->trace_chunk = NULL; pthread_mutex_unlock(&stream->lock); DBG("Succeeded in closing stream %" PRIu64, stream->stream_handle); stream_put(stream); -- 2.17.1 ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
[lttng-dev] [PATCH lttng-tools 3/3] Fix: sessiond: ust: deadlock with per-pid buffers
Do not hold the registry lock while communicating with the consumerd, because doing so causes inter-process deadlocks between consumerd and sessiond with the metadata request notification. The deadlock involves both sessiond and consumerd: * lttng-sessiond: thread 11 - thread_application_management close_metadata() pthread_mutex_lock(®istry->lock); consumer_close_metadata() pthread_mutex_lock(socket->lock); thread 15 - thread_consumer_management ust_consumer_metadata_request() pthread_mutex_lock(&ust_reg->lock); thread 8 - thread_manage_clients consumer_is_data_pending pthread_mutex_lock(socket->lock); consumer_socket_recv() * lttng-consumerd: thread 4 - consumer_timer_thread sample_channel_positions() pthread_mutex_lock(&stream->lock); thread 8 - consumer_thread_sessiond_poll consumer_data_pending pthread_mutex_lock(&consumer_data.lock); pthread_mutex_lock(&stream->lock); thread 7 - consumer_thread_data_poll lttng_consumer_read_subbuffer pthread_mutex_lock(&stream->chan->lock); pthread_mutex_lock(&stream->lock); do_sync_metadata pthread_mutex_lock(&metadata->lock); lttng_ustconsumer_sync_metadata pthread_mutex_unlock(&metadata_stream->lock); lttng_ustconsumer_request_metadata() pthread_mutex_lock(&ctx->metadata_socket_lock); lttcomm_recv_unix_sock() Signed-off-by: Mathieu Desnoyers --- src/bin/lttng-sessiond/ust-app.c | 31 ++- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/src/bin/lttng-sessiond/ust-app.c b/src/bin/lttng-sessiond/ust-app.c index 1731c368..1ecff438 100644 --- a/src/bin/lttng-sessiond/ust-app.c +++ b/src/bin/lttng-sessiond/ust-app.c @@ -741,6 +741,10 @@ error: * nullified. The session lock MUST be held unless the application is * in the destroy path. * + * Do not hold the registry lock while communicating with the consumerd, because + * doing so causes inter-process deadlocks between consumerd and sessiond with + * the metadata request notification. + * * Return 0 on success else a negative value. */ static int close_metadata(struct ust_registry_session *registry, @@ -748,6 +752,7 @@ static int close_metadata(struct ust_registry_session *registry, { int ret; struct consumer_socket *socket; + uint64_t metadata_key; assert(registry); assert(consumer); @@ -755,34 +760,34 @@ static int close_metadata(struct ust_registry_session *registry, rcu_read_lock(); pthread_mutex_lock(®istry->lock); - - if (!registry->metadata_key || registry->metadata_closed) { + metadata_key = registry->metadata_key; + if (!metadata_key || registry->metadata_closed) { ret = 0; + pthread_mutex_unlock(®istry->lock); goto end; } + /* +* Metadata closed. Even on error this means that the consumer is not +* responding or not found so either way a second close should NOT be emit +* for this registry. +*/ + registry->metadata_closed = 1; + pthread_mutex_unlock(®istry->lock); /* Get consumer socket to use to push the metadata.*/ socket = consumer_find_socket_by_bitness(registry->bits_per_long, consumer); if (!socket) { ret = -1; - goto error; + goto end; } - ret = consumer_close_metadata(socket, registry->metadata_key); + ret = consumer_close_metadata(socket, metadata_key); if (ret < 0) { - goto error; + goto end; } -error: - /* -* Metadata closed. Even on error this means that the consumer is not -* responding or not found so either way a second close should NOT be emit -* for this registry. -*/ - registry->metadata_closed = 1; end: - pthread_mutex_unlock(®istry->lock); rcu_read_unlock(); return ret; } -- 2.17.1 ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
[lttng-dev] [RELEASE] Babeltrace 2.0.0-rc2
Hi everyone! Two weeks after having released [1] Babeltrace 2's first release candidate, today we're releasing Babeltrace 2.0.0-rc2. A prettified version of this announcement is available at: https://diamon.org/babeltrace/docs/release-notes/babeltrace-2.0.0-rc2-release-notes.html What's new since Babeltrace 2.0.0-rc1? == Improvements General: * Introduce `BT_ASSERT_DBG()` for assertions that must only be evaluated in debugging mode (when the `BABELTRACE_DEBUG_MODE` environment variable is set to `1` at configuration time). `BT_ASSERT()` assertions are now always evaluated, regardless of the debugging/production mode. This change's purpose is to keep as many slow-path assertions as possible to detect more Babeltrace 2 programming errors, even in production. See `98b15851`. * Remove ABI versioning in Babeltrace 2 shared object plugins as we don't need this property. See `7c0244d6`. * In libbabeltrace2's API: * Rename `BT_RANGE_SET_` prefixes to `BT_INTEGER_RANGE_SET_`. See `eb171ee9`. * Make `bt_version_get_major()`, `bt_version_get_minor()`, and `bt_version_get_patch()` functions return `unsigned int` instead of `int`. See `78deb913`. * Remove the `BT_GRAPH_RUN_STATUS_END` status code: `bt_graph_run()` now returns `BT_GRAPH_RUN_STATUS_OK` when all the graph's sink components are ended. See `9669d693`. `ctf` plugin: Append error causes at more source locations. This means more precise error messages reported by the `babeltrace2` [2] CLI tool, for example. See `b7d2695a, `b30284d1, `2246e99da, and `1419db2bc. `source.ctf.fs` component class: Return an error when failing to load or create an index for a trace. See `29a8227a`. `filter.lttng-utils.debug-info` component class: Create stream classes and event classes which correspond to stream classes and event classes in incoming trace classes as soon as we encounter a new trace class, rather than as needed. See `db5d746d`. Python bindings: In `bt2.Graph.add_component()`, validate that the `params` parameter is implicitly convertible to a `bt2.Mapvalue` object (`dict`, for example). See `401b7022`. Bug fixes - General: Fix many issues reported by static analysis tools (scan-build, Coverity [3], and compiler warnings). `source.ctf.fs` component class: Fix an index handling bug which occured when multiple data stream files in a CTF trace belong to the same data stream. See `ce75de14`. `source.ctf.lttng-live` component class: Fix a bug in which a disappearing LTTng live tracing session could lead to an assertion failure. See `ba90bce7`. Python bindings: Fix an assertion failure which occurs when a trace or trace class destruction listener raises an exception. See `64961f8b`. Upcoming We are still hard at work putting the finishing touches on our way to the final 2.0.0 release. We invite you to try this release candidate and report any problems you may encounter to the `lttng-dev@lists.lttng.org` [4] mailing list or through the Babeltrace bug tracker [5]. Documentation - We are currently documenting the entire Babeltrace 2 C API to make the development of new component classes as easy as possible. We'll also work on the Python bindings documentation. The documentation of the API will be made available on the official Babeltrace website [6]. Other tasks --- * Improve test coverage. * Improve resilience to corrupted/malformed CTF traces. * Minor internal cleanups and bug fixes. Important links === Babeltrace 2.0.0-rc2 tarball: https://www.efficios.com/files/babeltrace/babeltrace-2.0.0-rc2.tar.bz2 Babeltrace website: https://diamon.org/babeltrace Mailing list (for support and development): `lttng-dev@lists.lttng.org` IRC channel: `#lttng` on `irc.oftc.net` Git repository: https://bugs.lttng.org/projects/babeltrace GitHub project: https://github.com/efficios/babeltrace Continuous integration: https://ci.lttng.org/view/Babeltrace/ Code review: https://review.lttng.org/q/project:babeltrace References == [1]: https://diamon.org/babeltrace/docs/release-notes/babeltrace-2.0.0-rc1-release-notes.html [2]: https://diamon.org/babeltrace/docs/v2.0/man1/babeltrace2.1/ [3]: https://scan.coverity.com/projects/babeltrace [4]: https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev [5]: https://bugs.lttng.org/projects/babeltrace [6]: https://diamon.org/babeltrace/ -- Jérémie Galarneau EfficiOS Inc. http://www.efficios.com ___ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev