Hi Pieter, On Sat, Aug 25, 2018 at 10:00:04PM +0200, PiBa-NL wrote: > Hi List, Thierry, Olivier, > > Using a lua-socket with connect_ssl and haproxy running with nbthread 3.. > results in haproxy hanging with 3 threads for me. > > This while using both 1.9-7/30 version (with the 2 extra patches from > Olivier avoiding 100% on a single thread.) and also a build of today's > snapshot: HA-Proxy version 1.9-dev1-e3faf02 2018/08/25 > > Below info is at the bottom of the mail: > - haproxy -vv > - gdb backtraces > > This one is easy to reproduce after just a few calls to the lua function > with the lua code i'm writing on a test-box.. So if a 'simple' config that > makes a reproduction is desired i can likely come up with one. > Same lua code with nbthread 1 seems to work properly. > > Is below info (the stack traces) enough to come up with a fix? If not lemme > know and ill try and make a small reproduction of it. > > > root@freebsd11:~ # haproxy -vv > HA-Proxy version 1.9-dev1-e3faf02 2018/08/25 > Copyright 2000-2018 Willy Tarreau <[email protected]> > > Build options : > TARGET = freebsd > CPU = generic > CC = cc > CFLAGS = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector > -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement > -fwrapv -fno-strict-overflow -Wno-address-of-packed-member > -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS > OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1 > USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1 > > Default settings : > maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200 > > Built with network namespace support. > Built with zlib version : 1.2.11 > Running on zlib version : 1.2.11 > Compression algorithms supported : identity("identity"), deflate("deflate"), > raw-deflate("deflate"), gzip("gzip") > Built with PCRE version : 8.40 2017-01-11 > Running on PCRE version : 8.40 2017-01-11 > PCRE library supports JIT : yes > Built with multi-threading support. > Encrypted password support via crypt(3): yes > Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY > Built with Lua version : Lua 5.3.4 > Built with OpenSSL version : OpenSSL 1.0.2k-freebsd 26 Jan 2017 > Running on OpenSSL version : OpenSSL 1.0.2k-freebsd 26 Jan 2017 > OpenSSL library supports TLS extensions : yes > OpenSSL library supports SNI : yes > OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2 > > Available polling systems : > kqueue : pref=300, test result OK > poll : pref=200, test result OK > select : pref=150, test result OK > Total: 3 (3 usable), will use kqueue. > > Available multiplexer protocols : > (protocols markes as <default> cannot be specified using 'proto' keyword) > <default> : mode=TCP|HTTP side=FE|BE > h2 : mode=HTTP side=FE > > Available filters : > [TRACE] trace > [COMP] compression > [SPOE] spoe > > root@freebsd11:~ # /usr/local/bin/gdb81 --pid 39649 > GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD] > Copyright (C) 2018 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > <http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-portbld-freebsd11.1". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > <http://www.gnu.org/software/gdb/bugs/>. > Find the GDB manual and other documentation resources online at: > <http://www.gnu.org/software/gdb/documentation/>. > For help, type "help". > Type "apropos word" to search for commands related to "word". > Attaching to process 39649 > Reading symbols from /usr/local/sbin/haproxy...done. > [New LWP 101651 of process 39649] > [New LWP 101652 of process 39649] > Reading symbols from /lib/libcrypt.so.5...(no debugging symbols > found)...done. > Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done. > Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done. > Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols > found)...done. > Reading symbols from /lib/libcrypto.so.8...(no debugging symbols > found)...done. > Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging symbols > found)...done. > Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done. > Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done. > Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols > found)...done. > [Switching to LWP 101650 of process 39649] > 0x0000000801e11e3a in _kevent () from /lib/libc.so.7 > (gdb) info thread > Id Target Id Frame > * 1 LWP 101650 of process 39649 0x0000000801e11e3a in _kevent () from > /lib/libc.so.7 > 2 LWP 101651 of process 39649 0x0000000000437b92 in __spin_lock > (lbl=LUA_LOCK, l=0x8cf1d8 <hlua_global_lock>, func=0x62a781 > "hlua_ctx_resume", > file=0x62a328 "src/hlua.c", line=1070) at include/common/hathreads.h:731 > 3 LWP 101652 of process 39649 0x000000080187a70c in ?? () from > /usr/local/lib/liblua-5.3.so > (gdb) bt full > #0 0x0000000801e11e3a in _kevent () from /lib/libc.so.7 > No symbol table info available. > #1 0x0000000800f15ca2 in ?? () from /lib/libthr.so.3 > No symbol table info available. > #2 0x000000000041dc4a in _do_poll (p=0x8d0548 <cur_poller>, exp=1923638009) > at src/ev_kqueue.c:151 > status = 37937304 > count = 1923638009 > fd = 200 > delta_ms = 1000 > timeout = {tv_sec = 1, tv_nsec = 0} > updt_idx = 0 > changes = 0 > old_fd = -1 > #3 0x000000000051a975 in run_poll_loop () at src/haproxy.c:2419 > next = 1923638009 > exp = 1923638009 > #4 0x0000000000517672 in run_thread_poll_loop (data=0x8024849f0) at > src/haproxy.c:2451 > start_lock = {lock = 0, info = {owner = 0, waiters = 0, > last_location = {function = 0x0, file = 0x0, line = 0}}} > ptif = 0x8c1980 <per_thread_init_list> > ptdf = 0x800f1841e <pthread_sigmask+46> > #5 0x000000000051423e in main (argc=3, argv=0x7fffffffeb28) at > src/haproxy.c:3053 > tids = 0x8024849f0 > threads = 0x802510ac0 > i = 3 > old_sig = {__bits = {1073741824, 0, 0, 0}} > blocked_sig = {__bits = {4227856759, 4294967295, 4294967295, > 4294967295}} > err = 0 > retry = 200 > limit = {rlim_cur = 4033, rlim_max = 4033} > errmsg = > "\000\353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\063\207Z\306\357O5\347\240\211\214\000\000\000\000\000 > > \353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\300\352\377\377\377\177\000\000R\201\340\001\b\000\000\000\001\000\000" > pidfd = -1 > (gdb) thread 2 > [Switching to thread 2 (LWP 101651 of process 39649)] > #0 0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8 > <hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328 > "src/hlua.c", line=1070) > at include/common/hathreads.h:731 > 731 __SPIN_LOCK(&l->lock); > (gdb) bt full > #0 0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8 > <hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328 > "src/hlua.c", line=1070) > at include/common/hathreads.h:731 > __pl_l = 0x8cf1d8 <hlua_global_lock> > __pl_r = 4294967300 > start_time = 7797095086642562 > #1 0x0000000000438a5e in hlua_ctx_resume (lua=0x8024f4500, yield_allowed=1) > at src/hlua.c:1070 > ret = 9240032 > msg = 0x456171 <eb32sc_next_with_parent+97> "H\203\370" > trace = 0x7fffdfffdcb0 "" > #2 0x000000000044236d in hlua_process_task (task=0x80242d960, > context=0x8024f4500, state=513) at src/hlua.c:5641 > hlua = 0x8024f4500 > status = HLUA_E_OK > #3 0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384 > t = 0x80242d960 > state = 513 > ctx = 0x8024f4500 > process = 0x4422f0 <hlua_process_task> > t = 0x80242df00 > max_processed = 200 > #4 0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386 > next = 1923609561 > exp = 1923608777 > #5 0x0000000000517672 in run_thread_poll_loop (data=0x8024849f4) at > src/haproxy.c:2451 > start_lock = {lock = 0, info = {owner = 0, waiters = 0, > last_location = {function = 0x0, file = 0x0, line = 0}}} > ptif = 0x8c1980 <per_thread_init_list> > ptdf = 0x800f177cc > #6 0x0000000800f12bc5 in ?? () from /lib/libthr.so.3 > No symbol table info available. > #7 0x0000000000000000 in ?? () > No symbol table info available. > Backtrace stopped: Cannot access memory at address 0x7fffdfffe000 > (gdb) thread 3 > [Switching to thread 3 (LWP 101652 of process 39649)] > #0 0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so > (gdb) bt full > #0 0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so > No symbol table info available. > #1 0x000000080187acd7 in ?? () from /usr/local/lib/liblua-5.3.so > No symbol table info available. > #2 0x000000080187b108 in ?? () from /usr/local/lib/liblua-5.3.so > No symbol table info available. > #3 0x0000000801873e30 in lua_gc () from /usr/local/lib/liblua-5.3.so > No symbol table info available. > #4 0x0000000000438e45 in hlua_ctx_resume (lua=0x8024f4f80, yield_allowed=1) > at src/hlua.c:1186 > ret = 0 > msg = 0x5a8a24 <b_alloc_margin+324> "\351\200" > trace = 0x7fffdfdfcc60 "\240\314\337\337\377\177" > #5 0x000000000044887a in hlua_applet_http_fct (ctx=0x8024c5880) at > src/hlua.c:6716 > si = 0x802419540 > strm = 0x802419200 > res = 0x802419270 > rule = 0x80242e360 > px = 0x802512c00 > hlua = 0x8024f4f80 > blk1 = 0x7fffdfdfcca0 "" > len1 = 34397588737 > blk2 = 0x802419278 "" > len2 = 34397590136 > ret = 0 > #6 0x00000000005a78a7 in task_run_applet (t=0x80242efe0, > context=0x8024c5880, state=16385) at src/applet.c:49 > app = 0x8024c5880 > si = 0x802419540 > #7 0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384 > t = 0x80242efe0 > state = 16385 > ctx = 0x8024c5880 > process = 0x5a77f0 <task_run_applet> > t = 0x80242efe0 > max_processed = 200 > #8 0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386 > next = 1923608730 > exp = 1923608070 > #9 0x0000000000517672 in run_thread_poll_loop (data=0x8024849f8) at > src/haproxy.c:2451 > start_lock = {lock = 0, info = {owner = 0, waiters = 0, > last_location = {function = 0x0, file = 0x0, line = 0}}} > ptif = 0x8c1980 <per_thread_init_list> > ptdf = 0x800f177cc > ---Type <return> to continue, or q <return> to quit--- > #10 0x0000000800f12bc5 in ?? () from /lib/libthr.so.3 > No symbol table info available. > #11 0x0000000000000000 in ?? () > No symbol table info available. > Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000 > (gdb) > >
I found one minor problem with the LUA code (patch attached) but I kinda doubt this is your problem here. It seems one thread is stuck in lua_gc() while holding the global LUA lock, but I don't know enough about LUA to guess what is going on. As Fred said, it would be really nice to have a sample configuration to reproduce the problem. Regards, Olivier
>From a6269dc836b47daed507791e10c6feba35bbd356 Mon Sep 17 00:00:00 2001 From: Olivier Houchard <[email protected]> Date: Mon, 27 Aug 2018 12:59:14 +0200 Subject: [PATCH] MINOR: hlua: Don't call RESET_SAFE_LJMP if SET_SAFE_LJMP returns 0. If SET_SAFE_LJMP returns 0, the spinlock is already unlocked, and lua_atpanic is already set back to hlua_panic_safe, so there's no need to call RESET_SAFE_LJMP. --- src/hlua.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/hlua.c b/src/hlua.c index 65986473a..d516ee6d7 100644 --- a/src/hlua.c +++ b/src/hlua.c @@ -6391,7 +6391,6 @@ static int hlua_applet_tcp_init(struct appctx *ctx, struct proxy *px, struct str error = "critical error"; SEND_ERR(px, "Lua applet tcp '%s': %s.\n", ctx->rule->arg.hlua_rule->fcn.name, error); - RESET_SAFE_LJMP(hlua->T); return 0; } -- 2.14.3

