[lttng-dev] [PATCH lttng-tools 0/9] lttng-tools fixes

2019-05-03 Thread Mathieu Desnoyers
This is a series of fixes to the lttng-tools tests to properly
teardown on ctrl-c. It also changes the uses of select() to
the lttng poll compat layer.

Thanks,

Mathieu

Mathieu Desnoyers (9):
  Improve handling of test SIGTERM/SIGINT
  Fix: tests: error handling in high throughput limits test
  Fix: utils.sh: handle SIGPIPE
  Fix: test: utils.sh: exit from process on full_cleanup
  Cleanup: test: don't stop relayd twice
  tests: invoke full_cleanup from script trap handlers, use modprobe -r
  epoll/poll compat: expose interruptible API
  lttng-ctl: notifications: use epoll()/poll() instead of select()
  sessiond: use epoll()/poll() instead of select()

 src/bin/lttng-sessiond/lttng-sessiond.h|   2 +-
 src/bin/lttng-sessiond/main.c  |   2 +-
 src/bin/lttng-sessiond/thread-utils.c  |  39 ++---
 src/common/compat/compat-epoll.c   |   9 +-
 src/common/compat/compat-poll.c|   9 +-
 src/common/compat/poll.h   |  12 +-
 src/lib/lttng-ctl/channel.c|  76 
 tests/regression/kernel/test_clock_override|  10 +-
 .../regression/kernel/test_rotation_destroy_flush  |   7 +-
 tests/regression/tools/crash/test_crash|   3 +-
 tests/regression/tools/live/test_lttng_ust |   3 +-
 .../tools/notification/test_notification_kernel|   2 +-
 .../tools/notification/test_notification_multi_app |   2 +-
 .../tools/notification/test_notification_ust   |   2 +-
 .../tools/streaming/test_high_throughput_limits|  33 +++-
 .../test_rotation_destroy_flush|   3 +-
 tests/stress/test_multi_sessions_per_uid_10app |   5 +-
 .../test_multi_sessions_per_uid_5app_streaming |   5 +-
 ...lti_sessions_per_uid_5app_streaming_kill_relayd |   5 +-
 tests/utils/utils.sh   | 193 -
 20 files changed, 276 insertions(+), 146 deletions(-)

-- 
2.11.0

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH lttng-tools 9/9] sessiond: use epoll()/poll() instead of select()

2019-05-03 Thread Mathieu Desnoyers
The select(2) system call is an ancient ABI limited to processes
containing at most FD_SETSIZE file descriptors overall (typically
1024).

This select call will fail if the target file descriptor is above
FD_SETSIZE in a session daemon containing many file descriptors.
This is unlikely to happen in normal use given than
sessiond_init_thread_quit_pipe() is called early by main(). Odd
scenarios could trigger this, for instance if the parent process leaves
a large number of file descriptors open, or if a library which
allocates file descriptors is LD_PRELOADed with the sessiond.

Never use select, use the lttng epoll/poll wrapper instead.

This patch depends on "Change lttng_poll_wait behaviour of compat-poll
to match compat-epoll" posted by Yannick Lamarre.

Signed-off-by: Mathieu Desnoyers 
CC: Yannick Lamarre 
---
 src/bin/lttng-sessiond/lttng-sessiond.h |  2 +-
 src/bin/lttng-sessiond/main.c   |  2 +-
 src/bin/lttng-sessiond/thread-utils.c   | 39 ++---
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/src/bin/lttng-sessiond/lttng-sessiond.h 
b/src/bin/lttng-sessiond/lttng-sessiond.h
index d321fced..6810e896 100644
--- a/src/bin/lttng-sessiond/lttng-sessiond.h
+++ b/src/bin/lttng-sessiond/lttng-sessiond.h
@@ -161,7 +161,7 @@ extern struct consumer_data kconsumer_data;
 
 int sessiond_init_thread_quit_pipe(void);
 int sessiond_check_thread_quit_pipe(int fd, uint32_t events);
-int sessiond_wait_for_quit_pipe(unsigned int timeout_us);
+int sessiond_wait_for_quit_pipe(int timeout_ms);
 int sessiond_notify_quit_pipe(void);
 void sessiond_close_quit_pipe(void);
 
diff --git a/src/bin/lttng-sessiond/main.c b/src/bin/lttng-sessiond/main.c
index 57324820..a6719a4d 100644
--- a/src/bin/lttng-sessiond/main.c
+++ b/src/bin/lttng-sessiond/main.c
@@ -1833,7 +1833,7 @@ int main(int argc, char **argv)
 */
 
/* Initiate teardown once activity occurs on the quit pipe. */
-   sessiond_wait_for_quit_pipe(-1U);
+   sessiond_wait_for_quit_pipe(-1);
 
 stop_threads:
/*
diff --git a/src/bin/lttng-sessiond/thread-utils.c 
b/src/bin/lttng-sessiond/thread-utils.c
index 16ae9d69..e1c02290 100644
--- a/src/bin/lttng-sessiond/thread-utils.c
+++ b/src/bin/lttng-sessiond/thread-utils.c
@@ -73,41 +73,36 @@ int sessiond_check_thread_quit_pipe(int fd, uint32_t events)
  * Returns 1 if the caller should quit, 0 if the timeout was reached, and
  * -1 if an error was encountered.
  */
-int sessiond_wait_for_quit_pipe(unsigned int timeout_us)
+int sessiond_wait_for_quit_pipe(int timeout_ms)
 {
int ret;
-   fd_set read_fds;
-   struct timeval timeout;
-
-   FD_ZERO(&read_fds);
-   FD_SET(thread_quit_pipe[0], &read_fds);
-   memset(&timeout, 0, sizeof(timeout));
-   timeout.tv_sec = timeout_us / USEC_PER_SEC;
-   timeout.tv_usec = timeout_us % USEC_PER_SEC;
-
-   while (true) {
-   ret = select(thread_quit_pipe[0] + 1, &read_fds, NULL, NULL,
-   timeout_us != -1U ? &timeout : NULL);
-   if (ret < 0 && errno == EINTR) {
-   /* Retry on interrupt. */
-   continue;
-   } else {
-   break;
-   }
-   }
+   struct lttng_poll_event events;
 
+   ret = lttng_poll_create(&events, 1, LTTNG_CLOEXEC);
+   if (ret < 0) {
+   PERROR("Failure in lttng_poll_create");
+   return -1;
+   }
+   ret = lttng_poll_add(&events, thread_quit_pipe[0], LPOLLIN | LPOLLERR);
+   if (ret < 0) {
+   PERROR("Failure in lttng_poll_add");
+   ret = -1;
+   goto end;
+   }
+   ret = lttng_poll_wait(&events, timeout_ms);
if (ret > 0) {
/* Should quit. */
ret = 1;
} else if (ret < 0 && errno != EINTR) {
/* Unknown error. */
-   PERROR("Failed to select() thread quit pipe");
+   PERROR("Failed to epoll()/poll() thread quit pipe");
ret = -1;
} else {
/* Timeout reached. */
ret = 0;
}
-
+end:
+   lttng_poll_clean(&events);
return ret;
 }
 
-- 
2.11.0

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH lttng-tools 3/9] Fix: utils.sh: handle SIGPIPE

2019-05-03 Thread Mathieu Desnoyers
perl prove closes its child pipes before giving it a chance to execute
the signal trap handler. This means the child will not be able to
complete execution of the trap handler if that handler writes to stdout
or stderr.

Work-around this situation by redirecting stdin, stdout, and stderr
to /dev/null if a SIGPIPE is caught.

Signed-off-by: Mathieu Desnoyers 
---
 tests/utils/utils.sh | 12 
 1 file changed, 12 insertions(+)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index d273b278..b8ac88c1 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -68,9 +68,21 @@ function full_cleanup ()
trap - SIGTERM && kill -- -$$
 }
 
+function null_pipes ()
+{
+   exec 0>/dev/null
+   exec 1>/dev/null
+   exec 2>/dev/null
+}
 
 trap full_cleanup SIGINT SIGTERM
 
+# perl prove closes its child pipes before giving it a chance to run its
+# signal trap handlers. Redirect pipes to /dev/null if SIGPIPE is caught
+# to allow those trap handlers to proceed.
+
+trap null_pipes SIGPIPE
+
 function print_ok ()
 {
# Check if we are a terminal
-- 
2.11.0

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT

2019-05-03 Thread Mathieu Desnoyers
The current state of signal handling for test scripts is: on
SIGTERM/SIGINT of the tests (e.g. a CTRL-C on the console), session
daemon and relay daemon are killed with SIGKILL, thus leaking all their
resources, and leaving lttng kernel modules loaded.

Revamp the "stop" functions to take a signal number and a timeout
as optional parameters. The default signal number is SIGTERM.

The full_cleanup trap handler now tries to nicely kill relayd and
sessiond (if they are present) with SIGTERM, and wait up to the
user-configurable LTTNG_TEST_TEARDOWN_TIMEOUT environment variable
(which has a default of 60s). Then, if there are still either relayd,
sessiond, or consumerd present, it will SIGKILL them and wait for
them to vanish. If it had to kill sessiond with SIGKILL, it will
also explicitly try to unload the lttng modules with modprobe.

This approach is inspired from sysv init script shutdown behavior.

Signed-off-by: Mathieu Desnoyers 
---
 tests/utils/utils.sh | 180 +++
 1 file changed, 137 insertions(+), 43 deletions(-)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index 94b3a3c4..d273b278 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -15,14 +15,12 @@
 
 SESSIOND_BIN="lttng-sessiond"
 SESSIOND_MATCH=".*lttng-sess.*"
-SESSIOND_PIDS=""
 RUNAS_BIN="lttng-runas"
 RUNAS_MATCH=".*lttng-runas.*"
 CONSUMERD_BIN="lttng-consumerd"
 CONSUMERD_MATCH=".*lttng-consumerd.*"
 RELAYD_BIN="lttng-relayd"
 RELAYD_MATCH=".*lttng-relayd.*"
-RELAYD_PIDS=""
 LTTNG_BIN="lttng"
 BABELTRACE_BIN="babeltrace"
 OUTPUT_DEST=/dev/null
@@ -48,11 +46,20 @@ export LTTNG_SESSIOND_PATH="/bin/true"
 
 source $TESTDIR/utils/tap/tap.sh
 
+if [ -z $LTTNG_TEST_TEARDOWN_TIMEOUT ]; then
+   LTTNG_TEST_TEARDOWN_TIMEOUT=60
+fi
+
 function full_cleanup ()
 {
-   if [ -n "${SESSIOND_PIDS}" ] || [ -n "${RELAYD_PIDS}" ]; then
-   kill -9 ${SESSIOND_PIDS} ${RELAYD_PIDS} > /dev/null 2>&1
-   fi
+   # Try to kill daemons gracefully
+   stop_lttng_relayd_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
+   stop_lttng_sessiond_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
+
+   # If daemons are still present, forcibly kill them
+   stop_lttng_relayd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
+   stop_lttng_sessiond_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
+   stop_lttng_consumerd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
 
# Disable trap for SIGTERM since the following kill to the
# pidgroup will be SIGTERM. Otherwise it loops.
@@ -397,8 +404,6 @@ function start_lttng_relayd_opt()
else
pass "Start lttng-relayd (opt: $opt)"
fi
-
-   RELAYD_PIDS=$(pgrep $RELAYD_MATCH)
 }
 
 function start_lttng_relayd()
@@ -414,29 +419,58 @@ function start_lttng_relayd_notap()
 function stop_lttng_relayd_opt()
 {
local withtap=$1
+   local signal=$2
+   local timeout=$3
+   local dtimeleft=
+   local fail=0
+   local pids=$(pgrep $RELAYD_MATCH)
 
-   if [ $withtap -eq "1" ]; then
-   diag "Killing lttng-relayd (pid: $RELAYD_PIDS)"
+   if [ -n "$timeout" ]; then
+   dtimeleft=$(($timeout * 2))
+   fi
+
+   if [ -z "$signal" ]; then
+   signal="SIGTERM"
fi
-   kill $RELAYD_PIDS 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
+
+   if [ -z "$pids" ]; then
+   if [ $withtap -eq "1" ]; then
+   pass "No relay daemon to kill"
+   fi
+   return 0
+   fi
+
+   diag "Killing (signal $signal) lttng-relayd (pid: $pids)"
+
+   kill -s $signal $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
retval=$?
 
if [ $? -eq 1 ]; then
+   fail=1
if [ $withtap -eq "1" ]; then
fail "Kill relay daemon"
fi
-   return 1
else
out=1
while [ -n "$out" ]; do
out=$(pgrep $RELAYD_MATCH)
+   if [ -n "$dtimeleft" ]; then
+   if [ $dtimeleft -lt 0 ]; then
+   out=
+   fail=1
+   fi
+   dtimeleft=$(($dtimeleft - 1))
+   fi
sleep 0.5
done
if [ $withtap -eq "1" ]; then
-   pass "Kill relay daemon"
+   if [ $fail -eq "0" ]; then
+   pass "Wait after kill relay daemon"
+   else
+   fail "Wait after kill relay daemon"
+   fi
fi
fi
-   RELAYD_PIDS=""
return $retval
 }
 
@@ -508,7 +542,6 @@ function start_lttng_sessiond_opt()
ok $status "Start session daemon"
fi
fi
-   SESSIOND_PID

[lttng-dev] [PATCH lttng-tools 6/9] tests: invoke full_cleanup from script trap handlers, use modprobe -r

2019-05-03 Thread Mathieu Desnoyers
Scripts implementing their own trap handlers override the generic
one provided by utils.sh (full_cleanup). Invoke it at the end of
the handlers to provide the utils cleanup as well.

Moreover, change use of "rmmod" to "modprobe -r", which is better
in trap handlers because it does not print errors if the module
was not loaded yet when the signal occurs.

Signed-off-by: Mathieu Desnoyers 
---
 tests/regression/kernel/test_clock_override| 10 +++---
 tests/regression/kernel/test_rotation_destroy_flush|  7 +++
 tests/regression/tools/crash/test_crash|  3 +--
 tests/regression/tools/notification/test_notification_kernel   |  2 +-
 .../regression/tools/notification/test_notification_multi_app  |  2 +-
 tests/regression/tools/notification/test_notification_ust  |  2 +-
 tests/regression/tools/streaming/test_high_throughput_limits   |  1 -
 .../ust/rotation-destroy-flush/test_rotation_destroy_flush |  3 +--
 tests/stress/test_multi_sessions_per_uid_10app |  5 ++---
 tests/stress/test_multi_sessions_per_uid_5app_streaming|  5 ++---
 .../test_multi_sessions_per_uid_5app_streaming_kill_relayd |  5 ++---
 11 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/tests/regression/kernel/test_clock_override 
b/tests/regression/kernel/test_clock_override
index e19b77e1..1fbba771 100755
--- a/tests/regression/kernel/test_clock_override
+++ b/tests/regression/kernel/test_clock_override
@@ -49,11 +49,9 @@ source $TESTDIR/utils/utils.sh
 function signal_cleanup()
 {
diag "*** Exiting ***"
-   rmmod lttng-test
stop_lttng_sessiond
-   rmmod lttng-clock-plugin-test
-   rmmod lttng-clock
-   exit 1
+   modprobe -r lttng-test lttng-clock-plugin-test lttng-clock
+   full_cleanup
 }
 
 function extract_clock_metadata()
@@ -93,10 +91,8 @@ function test_clock_override_metadata()
stop_lttng_tracing_ok $SESSION_NAME
destroy_lttng_session_ok $SESSION_NAME
 
-   rmmod lttng-test
stop_lttng_sessiond
-   rmmod lttng-clock-plugin-test
-   rmmod lttng-clock
+   modprobe -r lttng-test lttng-clock-plugin-test lttng-clock
 
local TRACE_METADATA_FILE_PATH="$(find "$TRACE_PATH" -name metadata 
-type f)"
local TRACE_METADATA_DIR="$(dirname "$TRACE_METADATA_FILE_PATH")"
diff --git a/tests/regression/kernel/test_rotation_destroy_flush 
b/tests/regression/kernel/test_rotation_destroy_flush
index 0b0b0ca7..03933a3a 100755
--- a/tests/regression/kernel/test_rotation_destroy_flush
+++ b/tests/regression/kernel/test_rotation_destroy_flush
@@ -39,9 +39,8 @@ source $TESTDIR/utils/utils.sh
 function signal_cleanup()
 {
diag "*** Exiting ***"
-   rmmod lttng-test
-   stop_lttng_sessiond
-   exit 1
+   modprobe -r lttng-test
+   full_cleanup
 }
 
 function enable_kernel_lttng_channel_size_limit ()
@@ -107,7 +106,7 @@ function test_rotation_destroy_flush_single()
 
rm -rf $TRACE_PATH
 
-   rmmod lttng-test
+   modprobe -r lttng-test
stop_lttng_sessiond
 }
 
diff --git a/tests/regression/tools/crash/test_crash 
b/tests/regression/tools/crash/test_crash
index 13909c1b..5bad16e5 100755
--- a/tests/regression/tools/crash/test_crash
+++ b/tests/regression/tools/crash/test_crash
@@ -392,8 +392,7 @@ function interrupt_cleanup()
 {
 diag "*** Cleaning-up test ***"
 stop_test_apps
-stop_lttng_sessiond
-exit 1
+full_cleanup
 }
 
 TESTS=(
diff --git a/tests/regression/tools/notification/test_notification_kernel 
b/tests/regression/tools/notification/test_notification_kernel
index e7368df2..cc6fc581 100755
--- a/tests/regression/tools/notification/test_notification_kernel
+++ b/tests/regression/tools/notification/test_notification_kernel
@@ -56,7 +56,7 @@ function kernel_event_generator
state_file=$1
kernel_event_generator_suspended=0
trap kernel_event_generator_toogle_state SIGUSR1
-   trap "exit" SIGTERM SIGINT EXIT
+
while (true); do
if [[ $kernel_event_generator_suspended -eq "1" ]]; then
touch $state_file
diff --git a/tests/regression/tools/notification/test_notification_multi_app 
b/tests/regression/tools/notification/test_notification_multi_app
index 7465a83f..51d94e4f 100755
--- a/tests/regression/tools/notification/test_notification_multi_app
+++ b/tests/regression/tools/notification/test_notification_multi_app
@@ -64,7 +64,7 @@ function kernel_event_generator
state_file=$1
kernel_event_generator_suspended=0
trap kernel_event_generator_toogle_state SIGUSR1
-   trap "exit" SIGTERM SIGINT
+
while (true); do
if [[ $kernel_event_generator_suspended -eq "1" ]]; then
touch $state_file
diff --git a/tests/regression/tools/notification/test_notification_ust 
b/tests/regression/tools/notification/test_notification_ust
index 8941e476..82f79a8e 100755
---

[lttng-dev] [PATCH lttng-tools 8/9] lttng-ctl: notifications: use epoll()/poll() instead of select()

2019-05-03 Thread Mathieu Desnoyers
The select(2) system call is an ancient ABI limited to processes
containing at most FD_SETSIZE file descriptors overall (typically
1024).

Those notification APIs will fail if the target file descriptor
is above FD_SETSIZE in a process containing many file descriptors.

Never use select, use the lttng epoll/poll wrapper instead.

This patch depends on "Change lttng_poll_wait behaviour of compat-poll
to match compat-epoll" posted by Yannick Lamarre.

Signed-off-by: Mathieu Desnoyers 
CC: Yannick Lamarre 
---
 src/lib/lttng-ctl/channel.c | 76 +++--
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/src/lib/lttng-ctl/channel.c b/src/lib/lttng-ctl/channel.c
index 5271aa13..bcecc65f 100644
--- a/src/lib/lttng-ctl/channel.c
+++ b/src/lib/lttng-ctl/channel.c
@@ -26,8 +26,7 @@
 #include 
 #include 
 #include "lttng-ctl-helper.h"
-#include 
-#include 
+#include 
 
 static
 int handshake(struct lttng_notification_channel *channel);
@@ -211,7 +210,7 @@ lttng_notification_channel_get_next_notification(
struct lttng_notification *notification = NULL;
enum lttng_notification_channel_status status =
LTTNG_NOTIFICATION_CHANNEL_STATUS_OK;
-   fd_set read_fds;
+   struct lttng_poll_event events;
 
if (!channel || !_notification) {
status = LTTNG_NOTIFICATION_CHANNEL_STATUS_INVALID;
@@ -241,9 +240,9 @@ lttng_notification_channel_get_next_notification(
}
 
/*
-* Block on select() instead of the message reception itself as the
-* recvmsg() wrappers always restard on EINTR. We choose to wait
-* using select() in order to:
+* Block on interruptible epoll/poll() instead of the message reception
+* itself as the recvmsg() wrappers always restart on EINTR. We choose
+* to wait using interruptible epoll/poll() in order to:
 *   1) Return if a signal occurs,
 *   2) Not deal with partially received messages.
 *
@@ -252,20 +251,28 @@ lttng_notification_channel_get_next_notification(
 * announced length, receive_message() will block on recvmsg()
 * and never return (even if a signal is received).
 */
-   FD_ZERO(&read_fds);
-   FD_SET(channel->socket, &read_fds);
-   ret = select(channel->socket + 1, &read_fds, NULL, NULL, NULL);
-   if (ret == -1) {
-   status = errno == EINTR ?
+   ret = lttng_poll_create(&events, 1, LTTNG_CLOEXEC);
+   if (ret < 0) {
+   status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
+   goto end_unlock;
+   }
+   ret = lttng_poll_add(&events, channel->socket, LPOLLIN | LPOLLERR);
+   if (ret < 0) {
+   status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
+   goto end_clean_poll;
+   }
+   ret = lttng_poll_wait_interruptible(&events, -1);
+   if (ret <= 0) {
+   status = (ret == -1 && errno == EINTR) ?
LTTNG_NOTIFICATION_CHANNEL_STATUS_INTERRUPTED :
LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-   goto end_unlock;
+   goto end_clean_poll;
}
 
ret = receive_message(channel);
if (ret) {
status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-   goto end_unlock;
+   goto end_clean_poll;
}
 
switch (get_current_message_type(channel)) {
@@ -274,7 +281,7 @@ lttng_notification_channel_get_next_notification(
channel);
if (!notification) {
status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-   goto end_unlock;
+   goto end_clean_poll;
}
break;
case LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION_DROPPED:
@@ -284,9 +291,11 @@ lttng_notification_channel_get_next_notification(
default:
/* Protocol error. */
status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-   goto end_unlock;
+   goto end_clean_poll;
}
 
+end_clean_poll:
+   lttng_poll_clean(&events);
 end_unlock:
pthread_mutex_unlock(&channel->lock);
*_notification = notification;
@@ -387,11 +396,7 @@ lttng_notification_channel_has_pending_notification(
int ret;
enum lttng_notification_channel_status status =
LTTNG_NOTIFICATION_CHANNEL_STATUS_OK;
-   fd_set read_fds;
-   struct timeval timeout;
-
-   FD_ZERO(&read_fds);
-   memset(&timeout, 0, sizeof(timeout));
+   struct lttng_poll_event events;
 
if (!channel || !_notification_pending) {
status = LTTNG_NOTIFICATION_CHANNEL_STATUS_INVALID;
@@ -426,48 +431,57 @@ lttng_notification_channel_has_pending_notification(
 * message if we see data available on the socket. If the peer does
 * not respe

[lttng-dev] [PATCH lttng-tools 4/9] Fix: test: utils.sh: exit from process on full_cleanup

2019-05-03 Thread Mathieu Desnoyers
Signed-off-by: Mathieu Desnoyers 
---
 tests/utils/utils.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index b8ac88c1..42e78d25 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -66,6 +66,7 @@ function full_cleanup ()
# The '-' before the pid number ($$) indicates 'kill' to signal the
# whole process group.
trap - SIGTERM && kill -- -$$
+   exit 1
 }
 
 function null_pipes ()
-- 
2.11.0

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test

2019-05-03 Thread Mathieu Desnoyers
Each individual call to "tc" should be checked for error, else we
may fail to catch specific tc errors caused, for instance, by a
kernel configuration that only contains some of the required
class modules.

Also, invoke the utils.sh full_cleanup function from the script-specific
interrupt_cleanup trap handler rather than try to perform stopping
of relayd and sessiond within the script.

Signed-off-by: Mathieu Desnoyers 
---
 .../tools/streaming/test_high_throughput_limits| 32 ++
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/tests/regression/tools/streaming/test_high_throughput_limits 
b/tests/regression/tools/streaming/test_high_throughput_limits
index 32c3f1f2..68973c73 100755
--- a/tests/regression/tools/streaming/test_high_throughput_limits
+++ b/tests/regression/tools/streaming/test_high_throughput_limits
@@ -51,28 +51,47 @@ function set_bw_limit
# parent qdisc (1:) will always limit us to the right max value
dataportlimit=$((9*${ctrlportlimit}))
 
+   diag "Set bandwidth limits to ${limit}kbits, ${ctrlportlimit} for 
control and ${dataportlimit} for data"
 
tc qdisc add dev $DEFAULT_IF root handle 1: htb default 15 >/dev/null 
2>&1
+   if [ $? -ne 0 ]; then
+   return 1
+   fi
 
# the total bandwidth is the limit set by the user
tc class add dev $DEFAULT_IF parent 1: classid 1:1 htb rate 
${limit}kbit ceil ${limit}kbit >/dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   return 1
+   fi
# 1/10 of the bandwidth guaranteed and traffic prioritized for the 
control port
tc class add dev $DEFAULT_IF parent 1:1 classid 1:10 htb rate 
${ctrlportlimit}kbit ceil ${limit}kbit prio 1 >/dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   return 1
+   fi
# 9/10 of the bandwidth guaranteed and can borrow up to the total 
bandwidth (if unused)
tc class add dev $DEFAULT_IF parent 1:1 classid 1:11 htb rate 
${dataportlimit}kbit ceil ${limit}kbit prio 2 >/dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   return 1
+   fi
 
# filter to assign control traffic to the 1:10 class
tc filter add dev $DEFAULT_IF parent 1: protocol ip u32 match ip dport 
$SESSIOND_CTRL_PORT 0x flowid 1:10 >/dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   return 1
+   fi
# filter to assign data traffic to the 1:11 class
tc filter add dev $DEFAULT_IF parent 1: protocol ip u32 match ip dport 
$SESSIOND_DATA_PORT 0x flowid 1:11 >/dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   return 1
+   fi
 
-   ok $? "Set bandwidth limits to ${limit}kbits, ${ctrlportlimit} for 
control and ${dataportlimit} for data"
+   return 0
 }
 
 function reset_bw_limit
 {
tc qdisc del dev $DEFAULT_IF root >/dev/null 2>&1
-   ok $? "Reset bandwith limits"
+   return $?
 }
 
 function create_lttng_session_with_uri
@@ -148,9 +167,9 @@ function validate_event_count
 function interrupt_cleanup()
 {
diag "*** Exiting ***"
-   stop_lttng_relayd
-   stop_lttng_sessiond
reset_bw_limit
+   # invoke utils cleanup
+   full_cleanup
exit 1
 }
 
@@ -168,8 +187,7 @@ skip $isroot "Root access is needed to set bandwith limits. 
Skipping all tests."
 {
 
# Catch sigint and try to cleanup limits
-   trap interrupt_cleanup SIGTERM
-   trap interrupt_cleanup SIGINT
+   trap interrupt_cleanup SIGTERM SIGINT
 
BW_LIMITS=(3200 1600 800 400 200 100 50 25)
for BW in ${BW_LIMITS[@]};
@@ -177,6 +195,7 @@ skip $isroot "Root access is needed to set bandwith limits. 
Skipping all tests."
diag "Test high-throughput with bandwidth limit set to 
${BW}kbits"
 
set_bw_limit $BW
+   ok $? "Setting bandwidth limit"
 
start_lttng_sessiond
start_lttng_relayd "-o $TRACE_PATH"
@@ -185,5 +204,6 @@ skip $isroot "Root access is needed to set bandwith limits. 
Skipping all tests."
stop_lttng_relayd
stop_lttng_sessiond
reset_bw_limit
+   ok $? "Reset bandwith limits"
done
 }
-- 
2.11.0

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH lttng-tools 7/9] epoll/poll compat: expose interruptible API

2019-05-03 Thread Mathieu Desnoyers
Some use of the epoll/poll wrapper require interruption
by signals to make the poll call return -1, errno EINTR.
Expose a new lttng_poll_wait_interruptible API for this
purpose.

Signed-off-by: Mathieu Desnoyers 
CC: Yannick Lamarre 
---
 src/common/compat/compat-epoll.c |  9 +
 src/common/compat/compat-poll.c  |  9 +
 src/common/compat/poll.h | 12 
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/common/compat/compat-epoll.c b/src/common/compat/compat-epoll.c
index 6a781c7a..7108b717 100644
--- a/src/common/compat/compat-epoll.c
+++ b/src/common/compat/compat-epoll.c
@@ -241,7 +241,7 @@ error:
 /*
  * Wait on epoll set. This is a blocking call of timeout value.
  */
-int compat_epoll_wait(struct lttng_poll_event *events, int timeout)
+int compat_epoll_wait(struct lttng_poll_event *events, int timeout, int 
interruptible)
 {
int ret;
uint32_t new_size;
@@ -273,10 +273,11 @@ int compat_epoll_wait(struct lttng_poll_event *events, 
int timeout)
 
do {
ret = epoll_wait(events->epfd, events->events, events->nb_fd, 
timeout);
-   } while (ret == -1 && errno == EINTR);
+   } while (!interruptible && ret == -1 && errno == EINTR);
if (ret < 0) {
-   /* At this point, every error is fatal */
-   PERROR("epoll_wait");
+   if (errno != EINTR) {
+   PERROR("epoll_wait");
+   }
goto error;
}
 
diff --git a/src/common/compat/compat-poll.c b/src/common/compat/compat-poll.c
index b45b39dc..cdb6f8b5 100644
--- a/src/common/compat/compat-poll.c
+++ b/src/common/compat/compat-poll.c
@@ -281,7 +281,7 @@ error:
 /*
  * Wait on poll() with timeout. Blocking call.
  */
-int compat_poll_wait(struct lttng_poll_event *events, int timeout)
+int compat_poll_wait(struct lttng_poll_event *events, int timeout, int 
interruptible)
 {
int ret;
 
@@ -308,10 +308,11 @@ int compat_poll_wait(struct lttng_poll_event *events, int 
timeout)
 
do {
ret = poll(events->wait.events, events->wait.nb_fd, timeout);
-   } while (ret == -1 && errno == EINTR);
+   } while (!interruptible && ret == -1 && errno == EINTR);
if (ret < 0) {
-   /* At this point, every error is fatal */
-   PERROR("poll wait");
+   if (errno != EINTR) {
+   PERROR("poll wait");
+   }
goto error;
}
 
diff --git a/src/common/compat/poll.h b/src/common/compat/poll.h
index d4bd87f5..5400e5b1 100644
--- a/src/common/compat/poll.h
+++ b/src/common/compat/poll.h
@@ -152,9 +152,11 @@ static inline int compat_glibc_epoll_create(int size, int 
flags)
  * Wait on epoll set with the number of fd registered to the lttng_poll_event
  * data structure (events).
  */
-extern int compat_epoll_wait(struct lttng_poll_event *events, int timeout);
+extern int compat_epoll_wait(struct lttng_poll_event *events, int timeout, int 
interruptible);
 #define lttng_poll_wait(events, timeout) \
-   compat_epoll_wait(events, timeout)
+   compat_epoll_wait(events, timeout, 0)
+#define lttng_poll_wait_interruptible(events, timeout) \
+   compat_epoll_wait(events, timeout, 1)
 
 /*
  * Add a fd to the epoll set and resize the epoll_event structure if needed.
@@ -334,9 +336,11 @@ extern int compat_poll_create(struct lttng_poll_event 
*events, int size);
  * Wait on poll(2) event with nb_fd registered to the lttng_poll_event data
  * structure.
  */
-extern int compat_poll_wait(struct lttng_poll_event *events, int timeout);
+extern int compat_poll_wait(struct lttng_poll_event *events, int timeout, int 
interruptible);
 #define lttng_poll_wait(events, timeout) \
-   compat_poll_wait(events, timeout)
+   compat_poll_wait(events, timeout, 0)
+#define lttng_poll_wait_interruptible(events, timeout) \
+   compat_poll_wait(events, timeout, 1)
 
 /*
  * Add the fd to the pollfd structure. Resize if needed.
-- 
2.11.0

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [PATCH lttng-tools 5/9] Cleanup: test: don't stop relayd twice

2019-05-03 Thread Mathieu Desnoyers
Signed-off-by: Mathieu Desnoyers 
---
 tests/regression/tools/live/test_lttng_ust | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/regression/tools/live/test_lttng_ust 
b/tests/regression/tools/live/test_lttng_ust
index 06017d01..830fc783 100755
--- a/tests/regression/tools/live/test_lttng_ust
+++ b/tests/regression/tools/live/test_lttng_ust
@@ -34,7 +34,7 @@ TRACE_PATH=$(mktemp -d)
 
 DIR=$(readlink -f $TESTDIR)
 
-NUM_TESTS=12
+NUM_TESTS=11
 
 source $TESTDIR/utils/utils.sh
 
@@ -84,5 +84,4 @@ stop_lttng_relayd
 
 test_custom_url
 
-stop_lttng_relayd
 stop_lttng_sessiond
-- 
2.11.0

___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


[lttng-dev] [RELEASE] Babeltrace 2.0.0-pre5

2019-05-03 Thread Jérémie Galarneau
Hi everyone!

We are happy to announce the release of Babeltrace 2.0.0-pre5.

What's new since pre4?
--

- New features

-- Trace merging

It is now possible to merge CTF traces that share the same UUID to a
single CTF trace. This is especially useful when combined with the
tracing session rotation feature introduced by LTTng 2.11.

For instance, it is possible to combine an arbitrary number of trace
archives and merge them back together.

Example:

A tracing session configured to automatically rotate every two seconds
will result in a large number of independent traces, as follows:

/path/to/my_session-20190501-181356
└──  archives
├──  20190501T181356-0400-20190501T181358-0400-1
│   └──  ust
│   └──  uid
│   └──  1000
│   └──  64-bit
│   ├──  channel0_0
│   ├──  channel0_1
│   ├──  channel0_2
│   ├──  channel0_3
│   ├──  index
│   └──  metadata
├──  20190501T181358-0400-20190501T181400-0400-2
│   └──  ust
│   └──  uid
│   └──  1000
│   └──  64-bit
│   ├──  channel0_0
│   ├──  channel0_1
│   ├──  channel0_2
│   ├──  channel0_3
│   ├──  index
│   └──  metadata
├──  20190501T181400-0400-20190501T181402-0400-3
│   └──  ust
│   └──  uid
│   └──  1000
│   └──  64-bit
│   ├──  channel0_0
│   ├──  channel0_1
│   ├──  channel0_2
│   ├──  channel0_3
│   ├──  index
│   └──  metadata
...

With this feature, it is now possible to combine all, or a subset, of
the archived trace chunks. The following example combines two of the
archived trace chunks into a single trace.

babeltrace [...]/archives/20190501T181358-0400-20190501T181400-0400-2 \
   [...]archives/20190501T181402-0400-20190501T181404-0400-4 \
   --output-format=ctf --output /tmp/merged-trace

This command produces the following trace:

/tmp/merged-trace
└──  myhostname
└──  my_session-20190501T181356-0400
└──  ust
└──  uid
└──  1000
└──  64-bit
├──  channel0_0
├──  channel0_1
├──  channel0_2
├──  channel0_3
└──  metadata

Note that this command uses new trace environment fields which will be
produced by the final release of LTTng 2.11. In the meantime, using
the existing LTTng 2.11-rc1 release will result in a different output
trace hierarchy.

-- LTTng live "subscribe" mode

It is now possible to configure the `src.ctf.lttng-live` component in
"subscribe" mode. In this mode, the source will attempt to consume an
LTTng relay daemon's session and retry periodically if the session
does not exist.

This is in contrast with the default behaviour which will return (with
success) immediately if the session to consume does not exist on the
LTTng relay daemon.

The following command demonstrates the use of this new mode:

babeltrace --component=src.ctf.lttng-live \
   --url=net://relayhost/host/tgthost/my-session \
   --params='session-not-found-action="continue"'

Note that we are planning the addition of a dedicated option to the CLI by the
time of the final release.

-- CLI support for array component parameters (`--params`)

Users of the CLI can now use array parameters when configuring a
component. See the following example:

babeltrace --component=src.my.class \
   --params='tokens=[23, 17, [42, "meow"]], path="/path/to/trace"'

-- CTF source can now recursively search paths (and path arrays)

Using the feature above, it is now possible to provide a list of paths
to the `src.ctf.fs` component and rely on the component to recursively
open CTF traces it finds.

See the following example:
babeltrace --component=src.ctf.fs \
   --params='paths=["trace1", "trace2", "trace3"]'


- Performance improvements

One of the primary focus areas of the pre5 release was optimization
work to bring Babeltrace 2's CTF reader back to performance parity
with Babeltrace 1.x and reduce the overhead of the graph processing
infrastructure. As of this release, our benchmarks indicate that
Babeltrace 2 is now slightly faster (~4%) than the latest Babeltrace 1
release with a dummy output.

This work (and performance comparison) was the subject of a talk by
Philippe Proulx at the latest Tracing Summit. The video [1] and slides
[2] of this presentation are available and provide the rationale
behind the biggest changes to the API.

Beyond API changes, this work resulted in the addition of a new
"developer mode". When Babeltrace 2 is built in this mode,