[lttng-dev] [PATCH lttng-tools 0/9] lttng-tools fixes
This is a series of fixes to the lttng-tools tests to properly tear down
on ctrl-c. It also changes the uses of select() to the lttng poll compat
layer.

Thanks,

Mathieu

Mathieu Desnoyers (9):
  Improve handling of test SIGTERM/SIGINT
  Fix: tests: error handling in high throughput limits test
  Fix: utils.sh: handle SIGPIPE
  Fix: test: utils.sh: exit from process on full_cleanup
  Cleanup: test: don't stop relayd twice
  tests: invoke full_cleanup from script trap handlers, use modprobe -r
  epoll/poll compat: expose interruptible API
  lttng-ctl: notifications: use epoll()/poll() instead of select()
  sessiond: use epoll()/poll() instead of select()

 src/bin/lttng-sessiond/lttng-sessiond.h             |   2 +-
 src/bin/lttng-sessiond/main.c                       |   2 +-
 src/bin/lttng-sessiond/thread-utils.c               |  39 ++---
 src/common/compat/compat-epoll.c                    |   9 +-
 src/common/compat/compat-poll.c                     |   9 +-
 src/common/compat/poll.h                            |  12 +-
 src/lib/lttng-ctl/channel.c                         |  76
 tests/regression/kernel/test_clock_override         |  10 +-
 .../regression/kernel/test_rotation_destroy_flush   |   7 +-
 tests/regression/tools/crash/test_crash             |   3 +-
 tests/regression/tools/live/test_lttng_ust          |   3 +-
 .../tools/notification/test_notification_kernel     |   2 +-
 .../tools/notification/test_notification_multi_app  |   2 +-
 .../tools/notification/test_notification_ust        |   2 +-
 .../tools/streaming/test_high_throughput_limits     |  33 +++-
 .../test_rotation_destroy_flush                     |   3 +-
 tests/stress/test_multi_sessions_per_uid_10app      |   5 +-
 .../test_multi_sessions_per_uid_5app_streaming      |   5 +-
 ...lti_sessions_per_uid_5app_streaming_kill_relayd  |   5 +-
 tests/utils/utils.sh                                | 193 -
 20 files changed, 276 insertions(+), 146 deletions(-)

-- 
2.11.0

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
[lttng-dev] [PATCH lttng-tools 9/9] sessiond: use epoll()/poll() instead of select()
The select(2) system call is an ancient ABI limited to processes
containing at most FD_SETSIZE file descriptors overall (typically 1024).

This select call will fail if the target file descriptor is above
FD_SETSIZE in a session daemon containing many file descriptors. This is
unlikely to happen in normal use given that
sessiond_init_thread_quit_pipe() is called early by main(). Odd
scenarios could trigger this, for instance if the parent process leaves
a large number of file descriptors open, or if a library which allocates
file descriptors is LD_PRELOADed with the sessiond.

Never use select, use the lttng epoll/poll wrapper instead.

This patch depends on "Change lttng_poll_wait behaviour of compat-poll
to match compat-epoll" posted by Yannick Lamarre.

Signed-off-by: Mathieu Desnoyers
CC: Yannick Lamarre
---
 src/bin/lttng-sessiond/lttng-sessiond.h |  2 +-
 src/bin/lttng-sessiond/main.c           |  2 +-
 src/bin/lttng-sessiond/thread-utils.c   | 39 ++---
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/src/bin/lttng-sessiond/lttng-sessiond.h b/src/bin/lttng-sessiond/lttng-sessiond.h
index d321fced..6810e896 100644
--- a/src/bin/lttng-sessiond/lttng-sessiond.h
+++ b/src/bin/lttng-sessiond/lttng-sessiond.h
@@ -161,7 +161,7 @@ extern struct consumer_data kconsumer_data;
 int sessiond_init_thread_quit_pipe(void);
 int sessiond_check_thread_quit_pipe(int fd, uint32_t events);
-int sessiond_wait_for_quit_pipe(unsigned int timeout_us);
+int sessiond_wait_for_quit_pipe(int timeout_ms);
 int sessiond_notify_quit_pipe(void);
 void sessiond_close_quit_pipe(void);
diff --git a/src/bin/lttng-sessiond/main.c b/src/bin/lttng-sessiond/main.c
index 57324820..a6719a4d 100644
--- a/src/bin/lttng-sessiond/main.c
+++ b/src/bin/lttng-sessiond/main.c
@@ -1833,7 +1833,7 @@ int main(int argc, char **argv)
 	 */
 	/* Initiate teardown once activity occurs on the quit pipe. */
-	sessiond_wait_for_quit_pipe(-1U);
+	sessiond_wait_for_quit_pipe(-1);
 stop_threads:
 	/*
diff --git a/src/bin/lttng-sessiond/thread-utils.c b/src/bin/lttng-sessiond/thread-utils.c
index 16ae9d69..e1c02290 100644
--- a/src/bin/lttng-sessiond/thread-utils.c
+++ b/src/bin/lttng-sessiond/thread-utils.c
@@ -73,41 +73,36 @@ int sessiond_check_thread_quit_pipe(int fd, uint32_t events)
  * Returns 1 if the caller should quit, 0 if the timeout was reached, and
  * -1 if an error was encountered.
  */
-int sessiond_wait_for_quit_pipe(unsigned int timeout_us)
+int sessiond_wait_for_quit_pipe(int timeout_ms)
 {
 	int ret;
-	fd_set read_fds;
-	struct timeval timeout;
-
-	FD_ZERO(&read_fds);
-	FD_SET(thread_quit_pipe[0], &read_fds);
-	memset(&timeout, 0, sizeof(timeout));
-	timeout.tv_sec = timeout_us / USEC_PER_SEC;
-	timeout.tv_usec = timeout_us % USEC_PER_SEC;
-
-	while (true) {
-		ret = select(thread_quit_pipe[0] + 1, &read_fds, NULL, NULL,
-				timeout_us != -1U ? &timeout : NULL);
-		if (ret < 0 && errno == EINTR) {
-			/* Retry on interrupt. */
-			continue;
-		} else {
-			break;
-		}
-	}
+	struct lttng_poll_event events;
+	ret = lttng_poll_create(&events, 1, LTTNG_CLOEXEC);
+	if (ret < 0) {
+		PERROR("Failure in lttng_poll_create");
+		return -1;
+	}
+	ret = lttng_poll_add(&events, thread_quit_pipe[0], LPOLLIN | LPOLLERR);
+	if (ret < 0) {
+		PERROR("Failure in lttng_poll_add");
+		ret = -1;
+		goto end;
+	}
+	ret = lttng_poll_wait(&events, timeout_ms);
 	if (ret > 0) {
 		/* Should quit. */
 		ret = 1;
 	} else if (ret < 0 && errno != EINTR) {
 		/* Unknown error. */
-		PERROR("Failed to select() thread quit pipe");
+		PERROR("Failed to epoll()/poll() thread quit pipe");
 		ret = -1;
 	} else {
 		/* Timeout reached. */
 		ret = 0;
 	}
-
+end:
+	lttng_poll_clean(&events);
 	return ret;
 }
-- 
2.11.0
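For readers unfamiliar with the difference, the following is a minimal
standalone sketch of the same wait pattern written against plain poll(2)
rather than the lttng compat wrapper. The helper name wait_fd_readable()
is hypothetical and not part of lttng-tools; its return convention (1 if
ready, 0 on timeout, -1 on error) is only assumed here to mirror the one
documented for sessiond_wait_for_quit_pipe(). Because poll() takes an
array of descriptors instead of a fixed-size bitmap, the numeric value
of the descriptor is not bounded by FD_SETSIZE.

```c
#include <errno.h>
#include <poll.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of lttng-tools): wait until fd becomes
 * readable or timeout_ms elapses. A negative timeout blocks
 * indefinitely, as with poll(2) itself.
 *
 * Returns 1 if an event was reported on fd, 0 if the timeout was
 * reached, and -1 if an error was encountered.
 */
int wait_fd_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	/*
	 * Restart on signal interruption. Note that this simple loop
	 * restarts with the full timeout each time.
	 */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0) {
		perror("poll");
		return -1;
	}
	return ret > 0 ? 1 : 0;
}
```

Note that the timeout is expressed in milliseconds as a signed int, and
-1 means "no timeout", which matches the change of the prototype from
unsigned int timeout_us to int timeout_ms in the patch above.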
[lttng-dev] [PATCH lttng-tools 3/9] Fix: utils.sh: handle SIGPIPE
perl prove closes its child pipes before giving it a chance to execute
the signal trap handler. This means the child will not be able to
complete execution of the trap handler if that handler writes to stdout
or stderr.

Work around this situation by redirecting stdin, stdout, and stderr to
/dev/null if a SIGPIPE is caught.

Signed-off-by: Mathieu Desnoyers
---
 tests/utils/utils.sh | 12
 1 file changed, 12 insertions(+)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index d273b278..b8ac88c1 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -68,9 +68,21 @@ function full_cleanup ()
 	trap - SIGTERM && kill -- -$$
 }
+function null_pipes ()
+{
+	exec 0>/dev/null
+	exec 1>/dev/null
+	exec 2>/dev/null
+}
 trap full_cleanup SIGINT SIGTERM
+# perl prove closes its child pipes before giving it a chance to run its
+# signal trap handlers. Redirect pipes to /dev/null if SIGPIPE is caught
+# to allow those trap handlers to proceed.
+
+trap null_pipes SIGPIPE
+
 function print_ok ()
 {
 	# Check if we are a terminal
-- 
2.11.0
[lttng-dev] [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT
The current state of signal handling for test scripts is:

on SIGTERM/SIGINT of the tests (e.g. a CTRL-C on the console), session
daemon and relay daemon are killed with SIGKILL, thus leaking all their
resources, and leaving lttng kernel modules loaded.

Revamp the "stop" functions to take a signal number and a timeout as
optional parameters. The default signal number is SIGTERM.

The full_cleanup trap handler now tries to nicely kill relayd and
sessiond (if they are present) with SIGTERM, and wait up to the
user-configurable LTTNG_TEST_TEARDOWN_TIMEOUT environment variable
(which has a default of 60s). Then, if there are still either relayd,
sessiond, or consumerd present, it will SIGKILL them and wait for them
to vanish. If it had to kill sessiond with SIGKILL, it will also
explicitly try to unload the lttng modules with modprobe.

This approach is inspired by sysv init script shutdown behavior.

Signed-off-by: Mathieu Desnoyers
---
 tests/utils/utils.sh | 180 +++
 1 file changed, 137 insertions(+), 43 deletions(-)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index 94b3a3c4..d273b278 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -15,14 +15,12 @@
 SESSIOND_BIN="lttng-sessiond"
 SESSIOND_MATCH=".*lttng-sess.*"
-SESSIOND_PIDS=""
 RUNAS_BIN="lttng-runas"
 RUNAS_MATCH=".*lttng-runas.*"
 CONSUMERD_BIN="lttng-consumerd"
 CONSUMERD_MATCH=".*lttng-consumerd.*"
 RELAYD_BIN="lttng-relayd"
 RELAYD_MATCH=".*lttng-relayd.*"
-RELAYD_PIDS=""
 LTTNG_BIN="lttng"
 BABELTRACE_BIN="babeltrace"
 OUTPUT_DEST=/dev/null
@@ -48,11 +46,20 @@ export LTTNG_SESSIOND_PATH="/bin/true"
 source $TESTDIR/utils/tap/tap.sh
+if [ -z $LTTNG_TEST_TEARDOWN_TIMEOUT ]; then
+	LTTNG_TEST_TEARDOWN_TIMEOUT=60
+fi
+
 function full_cleanup ()
 {
-	if [ -n "${SESSIOND_PIDS}" ] || [ -n "${RELAYD_PIDS}" ]; then
-		kill -9 ${SESSIOND_PIDS} ${RELAYD_PIDS} > /dev/null 2>&1
-	fi
+	# Try to kill daemons gracefully
+	stop_lttng_relayd_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
+	stop_lttng_sessiond_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
+
+	# If daemons are still present, forcibly kill them
+	stop_lttng_relayd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
+	stop_lttng_sessiond_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
+	stop_lttng_consumerd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
 	# Disable trap for SIGTERM since the following kill to the
 	# pidgroup will be SIGTERM. Otherwise it loops.
@@ -397,8 +404,6 @@ function start_lttng_relayd_opt()
 	else
 		pass "Start lttng-relayd (opt: $opt)"
 	fi
-
-	RELAYD_PIDS=$(pgrep $RELAYD_MATCH)
 }
 function start_lttng_relayd()
@@ -414,29 +419,58 @@ function start_lttng_relayd_notap()
 function stop_lttng_relayd_opt()
 {
 	local withtap=$1
+	local signal=$2
+	local timeout=$3
+	local dtimeleft=
+	local fail=0
+	local pids=$(pgrep $RELAYD_MATCH)
-	if [ $withtap -eq "1" ]; then
-		diag "Killing lttng-relayd (pid: $RELAYD_PIDS)"
+	if [ -n "$timeout" ]; then
+		dtimeleft=$(($timeout * 2))
+	fi
+
+	if [ -z "$signal" ]; then
+		signal="SIGTERM"
 	fi
-	kill $RELAYD_PIDS 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
+
+	if [ -z "$pids" ]; then
+		if [ $withtap -eq "1" ]; then
+			pass "No relay daemon to kill"
+		fi
+		return 0
+	fi
+
+	diag "Killing (signal $signal) lttng-relayd (pid: $pids)"
+
+	kill -s $signal $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
 	retval=$?
 	if [ $? -eq 1 ]; then
+		fail=1
 		if [ $withtap -eq "1" ]; then
 			fail "Kill relay daemon"
 		fi
-		return 1
 	else
 		out=1
 		while [ -n "$out" ]; do
 			out=$(pgrep $RELAYD_MATCH)
+			if [ -n "$dtimeleft" ]; then
+				if [ $dtimeleft -lt 0 ]; then
+					out=
+					fail=1
+				fi
+				dtimeleft=$(($dtimeleft - 1))
+			fi
 			sleep 0.5
 		done
 		if [ $withtap -eq "1" ]; then
-			pass "Kill relay daemon"
+			if [ $fail -eq "0" ]; then
+				pass "Wait after kill relay daemon"
+			else
+				fail "Wait after kill relay daemon"
+			fi
 		fi
 	fi
-	RELAYD_PIDS=""
 	return $retval
 }
@@ -508,7 +542,6 @@ function start_lttng_sessiond_opt()
 			ok $status "Start session daemon"
 		fi
 	fi
-	SESSIOND_PID
[lttng-dev] [PATCH lttng-tools 6/9] tests: invoke full_cleanup from script trap handlers, use modprobe -r
Scripts implementing their own trap handlers override the generic one
provided by utils.sh (full_cleanup). Invoke it at the end of the
handlers to provide the utils cleanup as well.

Moreover, change use of "rmmod" to "modprobe -r", which is better in
trap handlers because it does not print errors if the module was not
loaded yet when the signal occurs.

Signed-off-by: Mathieu Desnoyers
---
 tests/regression/kernel/test_clock_override                    | 10 +++---
 tests/regression/kernel/test_rotation_destroy_flush            |  7 +++
 tests/regression/tools/crash/test_crash                        |  3 +--
 tests/regression/tools/notification/test_notification_kernel   |  2 +-
 .../regression/tools/notification/test_notification_multi_app  |  2 +-
 tests/regression/tools/notification/test_notification_ust      |  2 +-
 tests/regression/tools/streaming/test_high_throughput_limits   |  1 -
 .../ust/rotation-destroy-flush/test_rotation_destroy_flush     |  3 +--
 tests/stress/test_multi_sessions_per_uid_10app                 |  5 ++---
 tests/stress/test_multi_sessions_per_uid_5app_streaming        |  5 ++---
 .../test_multi_sessions_per_uid_5app_streaming_kill_relayd     |  5 ++---
 11 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/tests/regression/kernel/test_clock_override b/tests/regression/kernel/test_clock_override
index e19b77e1..1fbba771 100755
--- a/tests/regression/kernel/test_clock_override
+++ b/tests/regression/kernel/test_clock_override
@@ -49,11 +49,9 @@ source $TESTDIR/utils/utils.sh
 function signal_cleanup()
 {
 	diag "*** Exiting ***"
-	rmmod lttng-test
 	stop_lttng_sessiond
-	rmmod lttng-clock-plugin-test
-	rmmod lttng-clock
-	exit 1
+	modprobe -r lttng-test lttng-clock-plugin-test lttng-clock
+	full_cleanup
 }
 function extract_clock_metadata()
@@ -93,10 +91,8 @@ function test_clock_override_metadata()
 	stop_lttng_tracing_ok $SESSION_NAME
 	destroy_lttng_session_ok $SESSION_NAME
-	rmmod lttng-test
 	stop_lttng_sessiond
-	rmmod lttng-clock-plugin-test
-	rmmod lttng-clock
+	modprobe -r lttng-test lttng-clock-plugin-test lttng-clock
 	local TRACE_METADATA_FILE_PATH="$(find "$TRACE_PATH" -name metadata -type f)"
 	local TRACE_METADATA_DIR="$(dirname "$TRACE_METADATA_FILE_PATH")"
diff --git a/tests/regression/kernel/test_rotation_destroy_flush b/tests/regression/kernel/test_rotation_destroy_flush
index 0b0b0ca7..03933a3a 100755
--- a/tests/regression/kernel/test_rotation_destroy_flush
+++ b/tests/regression/kernel/test_rotation_destroy_flush
@@ -39,9 +39,8 @@ source $TESTDIR/utils/utils.sh
 function signal_cleanup()
 {
 	diag "*** Exiting ***"
-	rmmod lttng-test
-	stop_lttng_sessiond
-	exit 1
+	modprobe -r lttng-test
+	full_cleanup
 }
 function enable_kernel_lttng_channel_size_limit ()
@@ -107,7 +106,7 @@ function test_rotation_destroy_flush_single()
 	rm -rf $TRACE_PATH
-	rmmod lttng-test
+	modprobe -r lttng-test
 	stop_lttng_sessiond
 }
diff --git a/tests/regression/tools/crash/test_crash b/tests/regression/tools/crash/test_crash
index 13909c1b..5bad16e5 100755
--- a/tests/regression/tools/crash/test_crash
+++ b/tests/regression/tools/crash/test_crash
@@ -392,8 +392,7 @@ function interrupt_cleanup()
 {
 diag "*** Cleaning-up test ***"
 stop_test_apps
-stop_lttng_sessiond
-exit 1
+full_cleanup
 }
 TESTS=(
diff --git a/tests/regression/tools/notification/test_notification_kernel b/tests/regression/tools/notification/test_notification_kernel
index e7368df2..cc6fc581 100755
--- a/tests/regression/tools/notification/test_notification_kernel
+++ b/tests/regression/tools/notification/test_notification_kernel
@@ -56,7 +56,7 @@ function kernel_event_generator
 	state_file=$1
 	kernel_event_generator_suspended=0
 	trap kernel_event_generator_toogle_state SIGUSR1
-	trap "exit" SIGTERM SIGINT EXIT
+
 	while (true); do
 		if [[ $kernel_event_generator_suspended -eq "1" ]]; then
 			touch $state_file
diff --git a/tests/regression/tools/notification/test_notification_multi_app b/tests/regression/tools/notification/test_notification_multi_app
index 7465a83f..51d94e4f 100755
--- a/tests/regression/tools/notification/test_notification_multi_app
+++ b/tests/regression/tools/notification/test_notification_multi_app
@@ -64,7 +64,7 @@ function kernel_event_generator
 	state_file=$1
 	kernel_event_generator_suspended=0
 	trap kernel_event_generator_toogle_state SIGUSR1
-	trap "exit" SIGTERM SIGINT
+
 	while (true); do
 		if [[ $kernel_event_generator_suspended -eq "1" ]]; then
 			touch $state_file
diff --git a/tests/regression/tools/notification/test_notification_ust b/tests/regression/tools/notification/test_notification_ust
index 8941e476..82f79a8e 100755
---
[lttng-dev] [PATCH lttng-tools 8/9] lttng-ctl: notifications: use epoll()/poll() instead of select()
The select(2) system call is an ancient ABI limited to processes
containing at most FD_SETSIZE file descriptors overall (typically 1024).

Those notification APIs will fail if the target file descriptor is above
FD_SETSIZE in a process containing many file descriptors.

Never use select, use the lttng epoll/poll wrapper instead.

This patch depends on "Change lttng_poll_wait behaviour of compat-poll
to match compat-epoll" posted by Yannick Lamarre.

Signed-off-by: Mathieu Desnoyers
CC: Yannick Lamarre
---
 src/lib/lttng-ctl/channel.c | 76 +++--
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/src/lib/lttng-ctl/channel.c b/src/lib/lttng-ctl/channel.c
index 5271aa13..bcecc65f 100644
--- a/src/lib/lttng-ctl/channel.c
+++ b/src/lib/lttng-ctl/channel.c
@@ -26,8 +26,7 @@
 #include
 #include
 #include "lttng-ctl-helper.h"
-#include
-#include
+#include
 static int handshake(struct lttng_notification_channel *channel);
@@ -211,7 +210,7 @@ lttng_notification_channel_get_next_notification(
 	struct lttng_notification *notification = NULL;
 	enum lttng_notification_channel_status status =
 			LTTNG_NOTIFICATION_CHANNEL_STATUS_OK;
-	fd_set read_fds;
+	struct lttng_poll_event events;
 	if (!channel || !_notification) {
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_INVALID;
@@ -241,9 +240,9 @@ lttng_notification_channel_get_next_notification(
 	}
 	/*
-* Block on select() instead of the message reception itself as the
-* recvmsg() wrappers always restard on EINTR. We choose to wait
-* using select() in order to:
+* Block on interruptible epoll/poll() instead of the message reception
+* itself as the recvmsg() wrappers always restart on EINTR. We choose
+* to wait using interruptible epoll/poll() in order to:
 	 * 1) Return if a signal occurs,
 	 * 2) Not deal with partially received messages.
 	 *
@@ -252,20 +251,28 @@ lttng_notification_channel_get_next_notification(
 	 * announced length, receive_message() will block on recvmsg()
 	 * and never return (even if a signal is received).
 	 */
-	FD_ZERO(&read_fds);
-	FD_SET(channel->socket, &read_fds);
-	ret = select(channel->socket + 1, &read_fds, NULL, NULL, NULL);
-	if (ret == -1) {
-		status = errno == EINTR ?
+	ret = lttng_poll_create(&events, 1, LTTNG_CLOEXEC);
+	if (ret < 0) {
+		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
+		goto end_unlock;
+	}
+	ret = lttng_poll_add(&events, channel->socket, LPOLLIN | LPOLLERR);
+	if (ret < 0) {
+		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
+		goto end_clean_poll;
+	}
+	ret = lttng_poll_wait_interruptible(&events, -1);
+	if (ret <= 0) {
+		status = (ret == -1 && errno == EINTR) ?
 			LTTNG_NOTIFICATION_CHANNEL_STATUS_INTERRUPTED :
 			LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
 	ret = receive_message(channel);
 	if (ret) {
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
 	switch (get_current_message_type(channel)) {
@@ -274,7 +281,7 @@ lttng_notification_channel_get_next_notification(
 				channel);
 		if (!notification) {
 			status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-			goto end_unlock;
+			goto end_clean_poll;
 		}
 		break;
 	case LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION_DROPPED:
@@ -284,9 +291,11 @@ lttng_notification_channel_get_next_notification(
 	default:
 		/* Protocol error. */
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
+end_clean_poll:
+	lttng_poll_clean(&events);
 end_unlock:
 	pthread_mutex_unlock(&channel->lock);
 	*_notification = notification;
@@ -387,11 +396,7 @@ lttng_notification_channel_has_pending_notification(
 	int ret;
 	enum lttng_notification_channel_status status =
 			LTTNG_NOTIFICATION_CHANNEL_STATUS_OK;
-	fd_set read_fds;
-	struct timeval timeout;
-
-	FD_ZERO(&read_fds);
-	memset(&timeout, 0, sizeof(timeout));
+	struct lttng_poll_event events;
 	if (!channel || !_notification_pending) {
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_INVALID;
@@ -426,48 +431,57 @@ lttng_notification_channel_has_pending_notification(
 	 * message if we see data available on the socket. If the peer does
 	 * not respe
[lttng-dev] [PATCH lttng-tools 4/9] Fix: test: utils.sh: exit from process on full_cleanup
Signed-off-by: Mathieu Desnoyers
---
 tests/utils/utils.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index b8ac88c1..42e78d25 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -66,6 +66,7 @@ function full_cleanup ()
 	# The '-' before the pid number ($$) indicates 'kill' to signal the
 	# whole process group.
 	trap - SIGTERM && kill -- -$$
+	exit 1
 }
 function null_pipes ()
-- 
2.11.0
[lttng-dev] [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test
Each individual call to "tc" should be checked for error, else we may
fail to catch specific tc errors caused, for instance, by a kernel
configuration that only contains some of the required class modules.

Also, invoke the utils.sh full_cleanup function from the script-specific
interrupt_cleanup trap handler rather than try to perform stopping of
relayd and sessiond within the script.

Signed-off-by: Mathieu Desnoyers
---
 .../tools/streaming/test_high_throughput_limits| 32 ++
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/tests/regression/tools/streaming/test_high_throughput_limits b/tests/regression/tools/streaming/test_high_throughput_limits
index 32c3f1f2..68973c73 100755
--- a/tests/regression/tools/streaming/test_high_throughput_limits
+++ b/tests/regression/tools/streaming/test_high_throughput_limits
@@ -51,28 +51,47 @@ function set_bw_limit
 	# parent qdisc (1:) will always limit us to the right max value
 	dataportlimit=$((9*${ctrlportlimit}))
+	diag "Set bandwidth limits to ${limit}kbits, ${ctrlportlimit} for control and ${dataportlimit} for data"
 	tc qdisc add dev $DEFAULT_IF root handle 1: htb default 15 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 	# the total bandwidth is the limit set by the user
 	tc class add dev $DEFAULT_IF parent 1: classid 1:1 htb rate ${limit}kbit ceil ${limit}kbit >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 	# 1/10 of the bandwidth guaranteed and traffic prioritized for the control port
 	tc class add dev $DEFAULT_IF parent 1:1 classid 1:10 htb rate ${ctrlportlimit}kbit ceil ${limit}kbit prio 1 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 	# 9/10 of the bandwidth guaranteed and can borrow up to the total bandwidth (if unused)
 	tc class add dev $DEFAULT_IF parent 1:1 classid 1:11 htb rate ${dataportlimit}kbit ceil ${limit}kbit prio 2 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 	# filter to assign control traffic to the 1:10 class
 	tc filter add dev $DEFAULT_IF parent 1: protocol ip u32 match ip dport $SESSIOND_CTRL_PORT 0x flowid 1:10 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 	# filter to assign data traffic to the 1:11 class
 	tc filter add dev $DEFAULT_IF parent 1: protocol ip u32 match ip dport $SESSIOND_DATA_PORT 0x flowid 1:11 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
-	ok $? "Set bandwidth limits to ${limit}kbits, ${ctrlportlimit} for control and ${dataportlimit} for data"
+	return 0
 }
 function reset_bw_limit
 {
 	tc qdisc del dev $DEFAULT_IF root >/dev/null 2>&1
-	ok $? "Reset bandwith limits"
+	return $?
 }
 function create_lttng_session_with_uri
@@ -148,9 +167,9 @@ function validate_event_count
 function interrupt_cleanup()
 {
 	diag "*** Exiting ***"
-	stop_lttng_relayd
-	stop_lttng_sessiond
 	reset_bw_limit
+	# invoke utils cleanup
+	full_cleanup
 	exit 1
 }
@@ -168,8 +187,7 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
 {
 	# Catch sigint and try to cleanup limits
-	trap interrupt_cleanup SIGTERM
-	trap interrupt_cleanup SIGINT
+	trap interrupt_cleanup SIGTERM SIGINT
 	BW_LIMITS=(3200 1600 800 400 200 100 50 25)
 	for BW in ${BW_LIMITS[@]};
@@ -177,6 +195,7 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
 		diag "Test high-throughput with bandwidth limit set to ${BW}kbits"
 		set_bw_limit $BW
+		ok $? "Setting bandwidth limit"
 		start_lttng_sessiond
 		start_lttng_relayd "-o $TRACE_PATH"
@@ -185,5 +204,6 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
 		stop_lttng_relayd
 		stop_lttng_sessiond
 		reset_bw_limit
+		ok $? "Reset bandwith limits"
 	done
 }
-- 
2.11.0
[lttng-dev] [PATCH lttng-tools 7/9] epoll/poll compat: expose interruptible API
Some uses of the epoll/poll wrapper require interruption by signals to
make the poll call return -1, errno EINTR. Expose a new
lttng_poll_wait_interruptible API for this purpose.

Signed-off-by: Mathieu Desnoyers
CC: Yannick Lamarre
---
 src/common/compat/compat-epoll.c |  9 +
 src/common/compat/compat-poll.c  |  9 +
 src/common/compat/poll.h         | 12
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/common/compat/compat-epoll.c b/src/common/compat/compat-epoll.c
index 6a781c7a..7108b717 100644
--- a/src/common/compat/compat-epoll.c
+++ b/src/common/compat/compat-epoll.c
@@ -241,7 +241,7 @@ error:
 /*
  * Wait on epoll set. This is a blocking call of timeout value.
  */
-int compat_epoll_wait(struct lttng_poll_event *events, int timeout)
+int compat_epoll_wait(struct lttng_poll_event *events, int timeout, int interruptible)
 {
 	int ret;
 	uint32_t new_size;
@@ -273,10 +273,11 @@ int compat_epoll_wait(struct lttng_poll_event *events, int timeout)
 	do {
 		ret = epoll_wait(events->epfd, events->events, events->nb_fd, timeout);
-	} while (ret == -1 && errno == EINTR);
+	} while (!interruptible && ret == -1 && errno == EINTR);
 	if (ret < 0) {
-		/* At this point, every error is fatal */
-		PERROR("epoll_wait");
+		if (errno != EINTR) {
+			PERROR("epoll_wait");
+		}
 		goto error;
 	}
diff --git a/src/common/compat/compat-poll.c b/src/common/compat/compat-poll.c
index b45b39dc..cdb6f8b5 100644
--- a/src/common/compat/compat-poll.c
+++ b/src/common/compat/compat-poll.c
@@ -281,7 +281,7 @@ error:
 /*
  * Wait on poll() with timeout. Blocking call.
  */
-int compat_poll_wait(struct lttng_poll_event *events, int timeout)
+int compat_poll_wait(struct lttng_poll_event *events, int timeout, int interruptible)
 {
 	int ret;
@@ -308,10 +308,11 @@ int compat_poll_wait(struct lttng_poll_event *events, int timeout)
 	do {
 		ret = poll(events->wait.events, events->wait.nb_fd, timeout);
-	} while (ret == -1 && errno == EINTR);
+	} while (!interruptible && ret == -1 && errno == EINTR);
 	if (ret < 0) {
-		/* At this point, every error is fatal */
-		PERROR("poll wait");
+		if (errno != EINTR) {
+			PERROR("poll wait");
+		}
 		goto error;
 	}
diff --git a/src/common/compat/poll.h b/src/common/compat/poll.h
index d4bd87f5..5400e5b1 100644
--- a/src/common/compat/poll.h
+++ b/src/common/compat/poll.h
@@ -152,9 +152,11 @@ static inline int compat_glibc_epoll_create(int size, int flags)
  * Wait on epoll set with the number of fd registered to the lttng_poll_event
  * data structure (events).
  */
-extern int compat_epoll_wait(struct lttng_poll_event *events, int timeout);
+extern int compat_epoll_wait(struct lttng_poll_event *events, int timeout, int interruptible);
 #define lttng_poll_wait(events, timeout) \
-	compat_epoll_wait(events, timeout)
+	compat_epoll_wait(events, timeout, 0)
+#define lttng_poll_wait_interruptible(events, timeout) \
+	compat_epoll_wait(events, timeout, 1)
 /*
  * Add a fd to the epoll set and resize the epoll_event structure if needed.
@@ -334,9 +336,11 @@ extern int compat_poll_create(struct lttng_poll_event *events, int size);
  * Wait on poll(2) event with nb_fd registered to the lttng_poll_event data
  * structure.
  */
-extern int compat_poll_wait(struct lttng_poll_event *events, int timeout);
+extern int compat_poll_wait(struct lttng_poll_event *events, int timeout, int interruptible);
 #define lttng_poll_wait(events, timeout) \
-	compat_poll_wait(events, timeout)
+	compat_poll_wait(events, timeout, 0)
+#define lttng_poll_wait_interruptible(events, timeout) \
+	compat_poll_wait(events, timeout, 1)
 /*
  * Add the fd to the pollfd structure. Resize if needed.
-- 
2.11.0
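As a rough illustration of the "interruptible" flag described in this
patch, here is a self-contained sketch built directly on poll(2). The
names poll_wait(), poll_wait_restart() and poll_wait_interruptible() are
hypothetical stand-ins chosen for this example; they are not the
lttng-tools functions, and the real wrappers also handle resizing and
error reporting that is left out here.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical stand-in for the compat wait functions. When
 * interruptible is 0, poll() is restarted whenever a signal interrupts
 * it. When interruptible is non-zero, the first EINTR is propagated:
 * the function returns -1 with errno set to EINTR, so the caller can
 * react to the signal.
 */
int poll_wait(struct pollfd *fds, nfds_t nfds, int timeout_ms,
		int interruptible)
{
	int ret;

	do {
		ret = poll(fds, nfds, timeout_ms);
	} while (!interruptible && ret == -1 && errno == EINTR);

	return ret;
}

/* Two entry points sharing one implementation, selected by a flag. */
#define poll_wait_restart(fds, nfds, timeout_ms) \
	poll_wait(fds, nfds, timeout_ms, 0)
#define poll_wait_interruptible(fds, nfds, timeout_ms) \
	poll_wait(fds, nfds, timeout_ms, 1)
```

A caller of the interruptible variant would typically distinguish
ret == -1 with errno == EINTR (a signal arrived) from other negative
returns (a genuine error), much like the notification channel code in
patch 8/9 does.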
[lttng-dev] [PATCH lttng-tools 5/9] Cleanup: test: don't stop relayd twice
Signed-off-by: Mathieu Desnoyers
---
 tests/regression/tools/live/test_lttng_ust | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/regression/tools/live/test_lttng_ust b/tests/regression/tools/live/test_lttng_ust
index 06017d01..830fc783 100755
--- a/tests/regression/tools/live/test_lttng_ust
+++ b/tests/regression/tools/live/test_lttng_ust
@@ -34,7 +34,7 @@ TRACE_PATH=$(mktemp -d)
 DIR=$(readlink -f $TESTDIR)
-NUM_TESTS=12
+NUM_TESTS=11
 source $TESTDIR/utils/utils.sh
@@ -84,5 +84,4 @@ stop_lttng_relayd
 test_custom_url
-stop_lttng_relayd
 stop_lttng_sessiond
-- 
2.11.0
[lttng-dev] [RELEASE] Babeltrace 2.0.0-pre5
Hi everyone!

We are happy to announce the release of Babeltrace 2.0.0-pre5.

What's new since pre4?
----------------------

- New features

-- Trace merging

It is now possible to merge CTF traces that share the same UUID to a
single CTF trace. This is especially useful when combined with the
tracing session rotation feature introduced by LTTng 2.11. For instance,
it is possible to combine an arbitrary number of trace archives and
merge them back together.

Example:

A tracing session configured to automatically rotate every two seconds
will result in a large number of independent traces, as follows:

/path/to/my_session-20190501-181356
└── archives
    ├── 20190501T181356-0400-20190501T181358-0400-1
    │   └── ust
    │       └── uid
    │           └── 1000
    │               └── 64-bit
    │                   ├── channel0_0
    │                   ├── channel0_1
    │                   ├── channel0_2
    │                   ├── channel0_3
    │                   ├── index
    │                   └── metadata
    ├── 20190501T181358-0400-20190501T181400-0400-2
    │   └── ust
    │       └── uid
    │           └── 1000
    │               └── 64-bit
    │                   ├── channel0_0
    │                   ├── channel0_1
    │                   ├── channel0_2
    │                   ├── channel0_3
    │                   ├── index
    │                   └── metadata
    ├── 20190501T181400-0400-20190501T181402-0400-3
    │   └── ust
    │       └── uid
    │           └── 1000
    │               └── 64-bit
    │                   ├── channel0_0
    │                   ├── channel0_1
    │                   ├── channel0_2
    │                   ├── channel0_3
    │                   ├── index
    │                   └── metadata
    ...

With this feature, it is now possible to combine all, or a subset, of
the archived trace chunks. The following example combines two of the
archived trace chunks into a single trace.

babeltrace [...]/archives/20190501T181358-0400-20190501T181400-0400-2 \
           [...]archives/20190501T181402-0400-20190501T181404-0400-4 \
           --output-format=ctf --output /tmp/merged-trace

This command produces the following trace:

/tmp/merged-trace
└── myhostname
    └── my_session-20190501T181356-0400
        └── ust
            └── uid
                └── 1000
                    └── 64-bit
                        ├── channel0_0
                        ├── channel0_1
                        ├── channel0_2
                        ├── channel0_3
                        └── metadata

Note that this command uses new trace environment fields which will be
produced by the final release of LTTng 2.11. In the meantime, using the
existing LTTng 2.11-rc1 release will result in a different output trace
hierarchy.

-- LTTng live "subscribe" mode

It is now possible to configure the `src.ctf.lttng-live` component in
"subscribe" mode. In this mode, the source will attempt to consume an
LTTng relay daemon's session and retry periodically if the session does
not exist.

This is in contrast with the default behaviour which will return (with
success) immediately if the session to consume does not exist on the
LTTng relay daemon.

The following command demonstrates the use of this new mode:

babeltrace --component=src.ctf.lttng-live \
           --url=net://relayhost/host/tgthost/my-session \
           --params='session-not-found-action="continue"'

Note that we are planning the addition of a dedicated option to the CLI
by the time of the final release.

-- CLI support for array component parameters (`--params`)

Users of the CLI can now use array parameters when configuring a
component. See the following example:

babeltrace --component=src.my.class \
           --params='tokens=[23, 17, [42, "meow"]], path="/path/to/trace"'

-- CTF source can now recursively search paths (and path arrays)

Using the feature above, it is now possible to provide a list of paths
to the `src.ctf.fs` component and rely on the component to recursively
open CTF traces it finds. See the following example:

babeltrace --component=src.ctf.fs \
           --params='paths=["trace1", "trace2", "trace3"]'

- Performance improvements

One of the primary focus areas of the pre5 release was optimization work
to bring Babeltrace 2's CTF reader back to performance parity with
Babeltrace 1.x and reduce the overhead of the graph processing
infrastructure.

As of this release, our benchmarks indicate that Babeltrace 2 is now
slightly faster (~4%) than the latest Babeltrace 1 release with a dummy
output.

This work (and performance comparison) was the subject of a talk by
Philippe Proulx at the latest Tracing Summit. The video [1] and slides
[2] of this presentation are available and provide the rationale behind
the biggest changes to the API.

Beyond API changes, this work resulted in the addition of a new
"developer mode". When Babeltrace 2 is built in this mode,