Hi Pieter,

On Sat, Aug 25, 2018 at 10:00:04PM +0200, PiBa-NL wrote:
> Hi List, Thierry, Olivier,
> 
> Using a lua-socket with connect_ssl and haproxy running with nbthread 3..
> results in haproxy hanging with 3 threads for me.
> 
> This while using both 1.9-7/30 version (with the 2 extra patches from
> Olivier avoiding 100% on a single thread.) and also a build of today's
> snapshot: HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
> 
> Below info is at the bottom of the mail:
> - haproxy -vv
> - gdb backtraces
> 
> This one is easy to reproduce after just a few calls to the lua function
> with the lua code i'm writing on a test-box.. So if a 'simple' config that
> makes a reproduction is desired i can likely come up with one.
> Same lua code with nbthread 1 seems to work properly.
> 
> Is below info (the stack traces) enough to come up with a fix? If not lemme
> know and ill try and make a small reproduction of it.
> 
> 
> root@freebsd11:~ # haproxy -vv
> HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
> Copyright 2000-2018 Willy Tarreau <[email protected]>
> 
> Build options :
>   TARGET  = freebsd
>   CPU     = generic
>   CC      = cc
>   CFLAGS  = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector
> -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement
> -fwrapv -fno-strict-overflow -Wno-address-of-packed-member
> -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS
>   OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1
> USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1
> 
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
> 
> Built with network namespace support.
> Built with zlib version : 1.2.11
> Running on zlib version : 1.2.11
> Compression algorithms supported : identity("identity"), deflate("deflate"),
> raw-deflate("deflate"), gzip("gzip")
> Built with PCRE version : 8.40 2017-01-11
> Running on PCRE version : 8.40 2017-01-11
> PCRE library supports JIT : yes
> Built with multi-threading support.
> Encrypted password support via crypt(3): yes
> Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
> Built with Lua version : Lua 5.3.4
> Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
> Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
> 
> Available polling systems :
>      kqueue : pref=300,  test result OK
>        poll : pref=200,  test result OK
>      select : pref=150,  test result OK
> Total: 3 (3 usable), will use kqueue.
> 
> Available multiplexer protocols :
> (protocols markes as <default> cannot be specified using 'proto' keyword)
>        <default> : mode=TCP|HTTP   side=FE|BE
>               h2 : mode=HTTP       side=FE
> 
> Available filters :
>         [TRACE] trace
>         [COMP] compression
>         [SPOE] spoe
> 
> root@freebsd11:~ # /usr/local/bin/gdb81 --pid 39649
> GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-portbld-freebsd11.1".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> Attaching to process 39649
> Reading symbols from /usr/local/sbin/haproxy...done.
> [New LWP 101651 of process 39649]
> [New LWP 101652 of process 39649]
> Reading symbols from /lib/libcrypt.so.5...(no debugging symbols
> found)...done.
> Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
> Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
> Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols
> found)...done.
> Reading symbols from /lib/libcrypto.so.8...(no debugging symbols
> found)...done.
> Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging symbols
> found)...done.
> Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
> Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
> Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols
> found)...done.
> [Switching to LWP 101650 of process 39649]
> 0x0000000801e11e3a in _kevent () from /lib/libc.so.7
> (gdb) info thread
>   Id   Target Id         Frame
> * 1    LWP 101650 of process 39649 0x0000000801e11e3a in _kevent () from
> /lib/libc.so.7
>   2    LWP 101651 of process 39649 0x0000000000437b92 in __spin_lock
> (lbl=LUA_LOCK, l=0x8cf1d8 <hlua_global_lock>, func=0x62a781
> "hlua_ctx_resume",
>     file=0x62a328 "src/hlua.c", line=1070) at include/common/hathreads.h:731
>   3    LWP 101652 of process 39649 0x000000080187a70c in ?? () from
> /usr/local/lib/liblua-5.3.so
> (gdb) bt full
> #0  0x0000000801e11e3a in _kevent () from /lib/libc.so.7
> No symbol table info available.
> #1  0x0000000800f15ca2 in ?? () from /lib/libthr.so.3
> No symbol table info available.
> #2  0x000000000041dc4a in _do_poll (p=0x8d0548 <cur_poller>, exp=1923638009)
> at src/ev_kqueue.c:151
>         status = 37937304
>         count = 1923638009
>         fd = 200
>         delta_ms = 1000
>         timeout = {tv_sec = 1, tv_nsec = 0}
>         updt_idx = 0
>         changes = 0
>         old_fd = -1
> #3  0x000000000051a975 in run_poll_loop () at src/haproxy.c:2419
>         next = 1923638009
>         exp = 1923638009
> #4  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f0) at
> src/haproxy.c:2451
>         start_lock = {lock = 0, info = {owner = 0, waiters = 0,
> last_location = {function = 0x0, file = 0x0, line = 0}}}
>         ptif = 0x8c1980 <per_thread_init_list>
>         ptdf = 0x800f1841e <pthread_sigmask+46>
> #5  0x000000000051423e in main (argc=3, argv=0x7fffffffeb28) at
> src/haproxy.c:3053
>         tids = 0x8024849f0
>         threads = 0x802510ac0
>         i = 3
>         old_sig = {__bits = {1073741824, 0, 0, 0}}
>         blocked_sig = {__bits = {4227856759, 4294967295, 4294967295,
> 4294967295}}
>         err = 0
>         retry = 200
>         limit = {rlim_cur = 4033, rlim_max = 4033}
>         errmsg = 
> "\000\353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\063\207Z\306\357O5\347\240\211\214\000\000\000\000\000
>  
> \353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\300\352\377\377\377\177\000\000R\201\340\001\b\000\000\000\001\000\000"
>         pidfd = -1
> (gdb) thread 2
> [Switching to thread 2 (LWP 101651 of process 39649)]
> #0  0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8
> <hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328
> "src/hlua.c", line=1070)
>     at include/common/hathreads.h:731
> 731             __SPIN_LOCK(&l->lock);
> (gdb) bt full
> #0  0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8
> <hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328
> "src/hlua.c", line=1070)
>     at include/common/hathreads.h:731
>         __pl_l = 0x8cf1d8 <hlua_global_lock>
>         __pl_r = 4294967300
>         start_time = 7797095086642562
> #1  0x0000000000438a5e in hlua_ctx_resume (lua=0x8024f4500, yield_allowed=1)
> at src/hlua.c:1070
>         ret = 9240032
>         msg = 0x456171 <eb32sc_next_with_parent+97> "H\203\370"
>         trace = 0x7fffdfffdcb0 ""
> #2  0x000000000044236d in hlua_process_task (task=0x80242d960,
> context=0x8024f4500, state=513) at src/hlua.c:5641
>         hlua = 0x8024f4500
>         status = HLUA_E_OK
> #3  0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384
>         t = 0x80242d960
>         state = 513
>         ctx = 0x8024f4500
>         process = 0x4422f0 <hlua_process_task>
>         t = 0x80242df00
>         max_processed = 200
> #4  0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386
>         next = 1923609561
>         exp = 1923608777
> #5  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f4) at
> src/haproxy.c:2451
>         start_lock = {lock = 0, info = {owner = 0, waiters = 0,
> last_location = {function = 0x0, file = 0x0, line = 0}}}
>         ptif = 0x8c1980 <per_thread_init_list>
>         ptdf = 0x800f177cc
> #6  0x0000000800f12bc5 in ?? () from /lib/libthr.so.3
> No symbol table info available.
> #7  0x0000000000000000 in ?? ()
> No symbol table info available.
> Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
> (gdb) thread 3
> [Switching to thread 3 (LWP 101652 of process 39649)]
> #0  0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so
> (gdb) bt full
> #0  0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so
> No symbol table info available.
> #1  0x000000080187acd7 in ?? () from /usr/local/lib/liblua-5.3.so
> No symbol table info available.
> #2  0x000000080187b108 in ?? () from /usr/local/lib/liblua-5.3.so
> No symbol table info available.
> #3  0x0000000801873e30 in lua_gc () from /usr/local/lib/liblua-5.3.so
> No symbol table info available.
> #4  0x0000000000438e45 in hlua_ctx_resume (lua=0x8024f4f80, yield_allowed=1)
> at src/hlua.c:1186
>         ret = 0
>         msg = 0x5a8a24 <b_alloc_margin+324> "\351\200"
>         trace = 0x7fffdfdfcc60 "\240\314\337\337\377\177"
> #5  0x000000000044887a in hlua_applet_http_fct (ctx=0x8024c5880) at
> src/hlua.c:6716
>         si = 0x802419540
>         strm = 0x802419200
>         res = 0x802419270
>         rule = 0x80242e360
>         px = 0x802512c00
>         hlua = 0x8024f4f80
>         blk1 = 0x7fffdfdfcca0 ""
>         len1 = 34397588737
>         blk2 = 0x802419278 ""
>         len2 = 34397590136
>         ret = 0
> #6  0x00000000005a78a7 in task_run_applet (t=0x80242efe0,
> context=0x8024c5880, state=16385) at src/applet.c:49
>         app = 0x8024c5880
>         si = 0x802419540
> #7  0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384
>         t = 0x80242efe0
>         state = 16385
>         ctx = 0x8024c5880
>         process = 0x5a77f0 <task_run_applet>
>         t = 0x80242efe0
>         max_processed = 200
> #8  0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386
>         next = 1923608730
>         exp = 1923608070
> #9  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f8) at
> src/haproxy.c:2451
>         start_lock = {lock = 0, info = {owner = 0, waiters = 0,
> last_location = {function = 0x0, file = 0x0, line = 0}}}
>         ptif = 0x8c1980 <per_thread_init_list>
>         ptdf = 0x800f177cc
> ---Type <return> to continue, or q <return> to quit---
> #10 0x0000000800f12bc5 in ?? () from /lib/libthr.so.3
> No symbol table info available.
> #11 0x0000000000000000 in ?? ()
> No symbol table info available.
> Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000
> (gdb)
> 
> 

I found one minor problem with the LUA code (patch attached) but I kinda
doubt this is your problem here.
It seems one thread is stuck in lua_gc() while holding the global LUA lock,
but I don't know enough about LUA to guess what is going on.
As Fred said, it would be really nice to have a sample configuration to
reproduce the problem.

Regards,

Olivier
>From a6269dc836b47daed507791e10c6feba35bbd356 Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Mon, 27 Aug 2018 12:59:14 +0200
Subject: [PATCH] MINOR: hlua: Don't call RESET_SAFE_LJMP if SET_SAFE_LJMP
 returns 0.

If SET_SAFE_LJMP returns 0, the spinlock is already unlocked, and lua_atpanic
is already set back to hlua_panic_safe, so there's no need to call
RESET_SAFE_LJMP.
---
 src/hlua.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/hlua.c b/src/hlua.c
index 65986473a..d516ee6d7 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -6391,7 +6391,6 @@ static int hlua_applet_tcp_init(struct appctx *ctx, 
struct proxy *px, struct str
                        error = "critical error";
                SEND_ERR(px, "Lua applet tcp '%s': %s.\n",
                         ctx->rule->arg.hlua_rule->fcn.name, error);
-               RESET_SAFE_LJMP(hlua->T);
                return 0;
        }
 
-- 
2.14.3

Reply via email to