On 31.10.20 00:18, Scott Q. wrote:
I have implemented this change and the core dumps went away.

It appears so far that it's indeed the right fix.

Has this change been introduced in 2.3.11.3 ?

Thanks!

On Tuesday, 27/10/2020 at 08:20 John Fawcett wrote:

    On 22/10/2020 10:23, John Fawcett wrote:
    On 21/10/2020 19:00, John Fawcett wrote:
    On 21/10/2020 16:44, Patrik Peng wrote:
    On 16.10.20 18:34, Patrik Peng wrote:
    On 16.10.20 18:00, Scott Q. wrote:
    This reminds me, the way I was able to reproduce this
    consistently was by having large headers ( 100+ lines ).


    On Friday, 16/10/2020 at 11:49 Patrik Peng wrote:

        On 19.08.20 17:37, Josef 'Jeff' Sipek wrote:

        On Wed, Aug 19, 2020 at 17:03:57 +0200, Alessio Cecchi wrote:
        Hi,
        after the upgrade to Dovecot 2.3.11.3, from 2.3.10.1, I see frequently
        these errors from different users:
        It looks like this has been around for a while and you just got unlucky 
and
        started seeing this now.  Here's a quick & dirty patch that should fix 
this.
        If you can try it, let us know how it went.
        Jeff.
        diff --git a/src/plugins/fts-solr/solr-connection.c 
b/src/plugins/fts-solr/solr-connection.c
        index 
ae720b5e2870a852c1b6c440939e3c7c0fa72b5c..9d364f93e2cd1b716b9ab61bd39656a6c5b1ea04
 100644
        --- a/src/plugins/fts-solr/solr-connection.c
        +++ b/src/plugins/fts-solr/solr-connection.c
        @@ -103,7 +103,7 @@ int solr_connection_init(const struct 
fts_solr_settings *solr_set,
                        http_set.ssl = ssl_client_set;
                        http_set.debug = solr_set->debug;
                        http_set.rawlog_dir = solr_set->rawlog_dir;
        -               solr_http_client = http_client_init(&http_set);
        +               solr_http_client = http_client_init_private(&http_set);
                }
                *conn_r = conn;
        diff --git a/src/plugins/fts/fts-parser-tika.c 
b/src/plugins/fts/fts-parser-tika.c
        index 
a4b8b5c3034f57e22e77caa759c090da6b62f8ba..b8b57a350b9a710d101ac7ccbcc14560d415d905
 100644
        --- a/src/plugins/fts/fts-parser-tika.c
        +++ b/src/plugins/fts/fts-parser-tika.c
        @@ -77,7 +77,7 @@ tika_get_http_client_url(struct mail_user *user, 
struct http_url **http_url_r)
                        http_set.request_timeout_msecs = 60*1000;
                        http_set.ssl = &ssl_set;
                        http_set.debug = user->mail_debug;
        -               tika_http_client = http_client_init(&http_set);
        +               tika_http_client = http_client_init_private(&http_set);
                }
                *http_url_r = tuser->http_url;
                return 0;

        Greetings

        I'm also experiencing these issues while running Dovecot
        2.3.11.3 with Solr 8.6.3 on FreeBSD 11.4. As mentioned in
        a previous mail, the above patch is already applied to
        Dovecot's FreeBSD Port, confirmed by the patches being
        present in the portstree
        (https://svnweb.freebsd.org/ports/branches/2020Q3/mail/dovecot/files/).

        In a FreeBSD VM with the official image
        
(https://download.freebsd.org/ftp/releases/VM-IMAGES/12.1-RELEASE/amd64/Latest/)
        I compiled dovecot from git and was able to reproduce the
        error with the patch mentioned above applied and also
        without any patches at all. From these results i
        conclude, that neither the patches applied in FreeBSDs
        portstree or the patch above have any influence.

        I also managed to reproduce the same results on a Debian
        10 machine (also with and without the patch):

        doveadm(some.u...@example.com): Panic: file http-client-request.c: line 
1232 (http_client_request_send_more): assertion failed: (req->payload_input != 
NULL)
        doveadm(some.u...@example.com): Error: Raw backtrace: 
/usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(backtrace_append+0x42) [0x7f093f7fc3c2]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(backtrace_get+0x1e) [0x7f093f7fc4ce] -> 
/usr/local/lib/dovecot/libdovecot.so  <http://libdovecot.so>.0(+0xea341) [0x7f093f807341]
        -> /usr/local/lib/dovecot/libdovecot.so  <http://libdovecot.so>.0(+0xea381) 
[0x7f093f807381] -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(i_fatal+0) [0x7f093f75c074]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(http_client_request_send_more+0x378) [0x7f093f7a47a8]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(http_client_connection_output+0xe4) [0x7f093f7a90f4]
        -> /usr/local/lib/dovecot/libssl_iostream_openssl.so  
<http://openssl.so>(+0x8bff) [0x7f093ec71bff]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(+0x1148b0) [0x7f093f8318b0]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(io_loop_call_io+0x69) [0x7f093f820259]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(io_loop_handler_run_internal+0x11b) [0x7f093f821b6b]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(io_loop_handler_run+0x59) [0x7f093f820369]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(io_loop_run+0x38) [0x7f093f820598]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(+0x86d1e) [0x7f093f7a3d1e]
        -> /usr/local/lib/dovecot/libdovecot.so  
<http://libdovecot.so>.0(http_client_request_finish_payload+0x2e) [0x7f093f7a407e]
        -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so  
<http://plugin.so>(solr_connection_post_end+0x32) [0x7f093b8492c2]
        -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so  
<http://plugin.so>(+0x3a45) [0x7f093b844a45]
        -> /usr/local/lib/dovecot/lib20_fts_plugin.so  
<http://plugin.so>(+0x94cc) [0x7f093e1104cc]
        -> /usr/local/lib/dovecot/lib20_fts_plugin.so  
<http://plugin.so>(fts_backend_update_deinit+0x23) [0x7f093e110503]
        -> /usr/local/lib/dovecot/lib20_fts_plugin.so  
<http://plugin.so>(+0x10a9b) [0x7f093e117a9b]
        -> /usr/local/lib/dovecot/lib20_fts_plugin.so  
<http://plugin.so>(+0x119ca) [0x7f093e1189ca]
        -> /usr/local/lib/dovecot/libdovecot-storage.so  
<http://libdovecot-storage.so>.0(mailbox_transaction_commit_get_changes+0x56) 
[0x7f093fb16076]
        -> /usr/local/lib/dovecot/libdovecot-storage.so  
<http://libdovecot-storage.so>.0(mailbox_transaction_commit+0x1e) [0x7f093fb1615e]
        -> doveadm(+0x31370) [0x5607cfa1f370] -> doveadm(+0x2b2a8) 
[0x5607cfa192a8]
        -> doveadm(+0x2bfb2) [0x5607cfa19fb2] -> 
doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x215) [0x5607cfa1ae05]
        -> doveadm(doveadm_cmd_run_ver2+0x57c) [0x5607cfa2bbec] -> 
doveadm(doveadm_cmd_try_run_ver2+0x37) [0x5607cfa2bc37]
        -> doveadm(main+0x1d2) [0x5607cfa09492]
        Aborted

        During my tests I also did notice, that the error appears
        more often depending of mail size and amount of mails in
        a folder:

        Tested with: doveadm -v fts rescan -usome.u...@example.com  && doveadm 
-v index -usome.u...@example.com  '*'
        1 Mail in INBOX with 9KB -> Error appeared 0 out of 20 times
        1 Mail in INBOX with 136KB -> Error appeared 17 out of 20 times
        3 Mails in INBOX with 408KB -> Error appeared 12 out of 20 times
        20 Mails in INBOX with ~2MB -> Error appeared 0 out of 20 times

        Maybe this info helps anyone.

        Patrik

    Yeah, I read your mail and that's why I tested with different
    mail sizes.
    I did some more tests, one with large headers (around 700
    lines of a long header line) but small body:

    1 Mail in INBOX with 583KB -> Error appeared 5 out of 20 times

    and another with a large mail body but normal headers:

    1 Mail in INBOX with 585KB -> Error appeared 0 out of 20 times
    I guess this kinda confirms your guess, but interestingly
    shows less errors than my previous test with a large header
    136KB Mail.


    Patrik

    I did notice that changing the batch_size in

    fts_solr = url=http://solr.example.org:8983/solr/  soft_commit=yes 
batch_size=1000

    does have an influence in how often the error occurs. Setting
    it to 1 or some huge number like 10000 reduces the chances
    quite a bit but not completely and also causes lots of small ,
    or few but quite large requests to Solr, so its not a practical
    workaround.

    Whats the current state of this bug? A fix for it would be very
    welcome, as it causes some trouble in our setup.


    Regards

    Hi

    would someone care to post (preferably a link) to a test email
    with which the problem can be reproduced?

    thanks

    Joh

    I can confirm that this is an error that was introduced in
    2.3.11.3. It did not happen on 2.3.10. I see it occurring on a
    Centos 7 installation compiled from source. Between these two
    releases I didn't see changes on the fts-solr code, but I do see
    some changes on the lib-http code. I suspect that the changes
    done on lib-http have introduced this issue.

    One temporary solution might be to roll back the relevant commits
    that impacted lib-http and see if the problem resolves. Given the
    info in this thread about the records arriving on solr, I'm
    wondering if this is just a problem where the data has been split
    into more than one request, but the last one has no data left in it.

    John

    Hi

    I did a bit of tracing. From the albeit limited examples that I
    saw the panic and assert segfault always happens when 
    req->payload_finished is true. If that observation is correct in
    general, then the patch is simple, just put the test for
    payload_finished before the assert for payload_input and
    payload_output. Would anyone with the problem care to try it and
    give feedback?

    diff -ur dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c
    dovecot-2.3.11.3/src/lib-http/http-client-request.c
    --- dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c
    2020-08-12 14:20:41.000000000 +0200
    +++ dovecot-2.3.11.3/src/lib-http/http-client-request.c 2020-10-27
    13:06:09.352973130 +0100
    @@ -1229,12 +1229,12 @@
            const char *error;
            uoff_t offset;

    -       i_assert(req->payload_input != NULL);
    -       i_assert(req->payload_output != NULL);
    -
            if (req->payload_finished)
                    return http_client_request_finish_payload_out(req);

    +       i_assert(req->payload_input != NULL);
    +       i_assert(req->payload_output != NULL);
    +
            io_remove(&conn->io_req_payload);

            /* chunked ostream needs to write to the parent stream's
    buffer */

I can confirm the same behavior after applying these changes. No more core dumps so far.

Thanks as well :)

Reply via email to