> On 24 Dec 2017, at 19:07, Claudio Jeker <cje...@diehard.n-r-g.com> wrote: > On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote: >>> On 23 Dec 2017, at 13:08, Claudio Jeker <cje...@diehard.n-r-g.com> wrote: >>>> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote: >>>> Hi All, >>>> >>>> Since OpenBSD 6.2, just confirmed this in the latest snapshot >>>> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic >>>> and starts flooding the log file with the following message: >>>> >>>> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout >>>> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout >>>> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout >>>> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout >>>> [snip] >>>> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout >>>> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout >>>> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout >>>> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout >>>> ...etc... >>>> >>>> Restarting the daemon "fixes" the problem. >>>> Not sure how to trouble shoot this but I am able to reproduce this >>>> consistently by pointing SSLLabs towards relayd. >>>> Would be great to get some pointers. >>>> >>> >>> I have seen this as well on our production systems. This is a problem in >>> the privsep part of the TLS code. I could not do more testing yet but my >>> assumption is that a new option / feature is freaking this code out. >> >> Anything I can do or collect to give you more information? > > So, I think I found the problem. The ca process did not handle errors from > RSA_private_encrypt correctly. So once you got a bad signature in the > system chocked and stopped. This diff seems to work for me (against > SSLlabs).
Awesome! Can confirm that it continues processing traffic when hitting it with sslabs. Will also move it to a more bussier machine to see how that handles. I am seeing the following messages now: Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02 Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing Mischa > > Cheers > -- > :wq Claudio > > Index: ca.c > =================================================================== > RCS file: /cvs/src/usr.sbin/relayd/ca.c,v > retrieving revision 1.31 > diff -u -p -r1.31 ca.c > --- ca.c 28 Nov 2017 00:20:23 -0000 1.31 > +++ ca.c 24 Dec 2017 18:01:20 -0000 > @@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep > break; > } > > + if (cko.cko_tlen == -1) { > + char buf[256]; > + log_warnx("%s: %s", __func__, > + ERR_error_string(ERR_get_error(), buf)); > + } > + > iov[c].iov_base = &cko; > iov[c++].iov_len = sizeof(cko); > - if (cko.cko_tlen) { > + if (cko.cko_tlen > 0) { > iov[c].iov_base = to; > iov[c++].iov_len = cko.cko_tlen; > } > @@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f > > IMSG_SIZE_CHECK(&imsg, (&cko)); > memcpy(&cko, imsg.data, sizeof(cko)); > - if (IMSG_DATA_SIZE(&imsg) != > - (sizeof(cko) + cko.cko_tlen)) > - fatalx("data size"); > > ret = cko.cko_tlen; > - if (ret) { > + if (ret > 0) { > + if (IMSG_DATA_SIZE(&imsg) != > + (sizeof(cko) + ret)) > + fatalx("data size"); > toptr = (u_char *)imsg.data + sizeof(cko); > memcpy(to, toptr, ret); > } >