Hi,

I can give you some background context: our CouchDB instance is running
on an embedded device (with a minimal attack surface, so we have no
pressure to mitigate CVEs). CouchDB was chosen because of its
append-only writes and power-fail safety (and because of the easily
scriptable curl/JSON interface).

Currently there is a production system running on an SMB1 share (mounted
on a Linux host), which works well (at least for our use cases). SMB1 is
no longer the default on the Windows remote side, and SMB2/3 has an
issue with opening a renamed but not yet closed file descriptor. The
question is whether we can solve this issue with minimal changes.
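
To make the intended sequence concrete, here is a minimal sketch in
Python (not CouchDB code) of what the fd-swap is meant to do at the file
level: rename the compacted file over the database file, close the
descriptor that was opened before the rename, and reopen the file under
its final name. The function names and file names are made up for
illustration, and os.replace stands in for the delete-then-rename pair
in couch_db_updater.erl. It only shows the file-level pattern and says
nothing about the gen_server side.

```python
import os
import tempfile


def swap_and_reopen(db_path, compact_path):
    """Rename compact_path over db_path, then reopen it by its final name.

    Hypothetical helper for illustration only; not part of CouchDB.
    """
    # Descriptor opened on the compact file *before* the rename.
    old_fd = os.open(compact_path, os.O_RDONLY)
    # Replace the database file with the compacted one.
    os.replace(compact_path, db_path)
    # Drop the pre-rename descriptor instead of continuing to use it.
    os.close(old_fd)
    # Reopen under the final name and read the start of the file.
    new_fd = os.open(db_path, os.O_RDONLY)
    try:
        return os.read(new_fd, 64)
    finally:
        os.close(new_fd)


def demo():
    """Run the swap in a temporary directory (assumed stand-in for the mount)."""
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "asdf.couch")
        compact = db + ".compact"
        with open(db, "wb") as f:
            f.write(b"old")
        with open(compact, "wb") as f:
            f.write(b"compacted")
        return swap_and_reopen(db, compact)


if __name__ == "__main__":
    print(demo())
```

Pointing the temporary directory at the mounted share instead would
exercise the same sequence against SMB2/3.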

> 1. How did you verify that the gen_server:call/3 call never returns?
> 2. Do you get any pertinent lines (especially crashes) in your
>    couch.log?

by adding:

> +        ?LOG_DEBUG("before gen_server:call", []),
>          ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, infinity),
> +        ?LOG_DEBUG("after gen_server:call", []),

the log gives:

> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.391.0>] Compaction process spawned for db "asdf"
> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.84.0>] New task status for <0.391.0>: [{changes_done,1},
>                                                    {database,<<"asdf">>},
>                                                    {progress,100},
>                                                    {started_on,1677753384},
>                                                    {total_changes,1},
>                                                    {type,database_compaction},
>                                                    {updated_on,1677753384}]
> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] CouchDB swapping files .../asdf.couch and .../asdf.couch.compact.
> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] before gen_server:call

Then nothing for a long time...

Refreshing the db in the Futon web GUI gives: no response

and the log continues with:

> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] ** Generic server couch_compaction_daemon terminating
> ** Last message in was {'EXIT',<0.145.0>,
>                            {timeout,
>                                {gen_server,call,[couch_server,get_server]}}}
> ** When Server state == {state,<0.145.0>}
> ** Reason for termination ==
> ** {compaction_loop_died,
>        {timeout,{gen_server,call,[couch_server,get_server]}}}
> 
> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] {error_report,<0.31.0>,
>                      {<0.144.0>,crash_report,
>                       [[{initial_call,
>                          {couch_compaction_daemon,init,['Argument__1']}},
>                         {pid,<0.144.0>},
>                         {registered_name,couch_compaction_daemon},
>                         {error_info,
>                          {exit,
>                           {compaction_loop_died,
>                            {timeout,
>                             {gen_server,call,[couch_server,get_server]}}},
>                           [{gen_server,terminate,7,
>                             [{file,"gen_server.erl"},{line,804}]},
>                            {proc_lib,init_p_do_apply,3,
>                             [{file,"proc_lib.erl"},{line,237}]}]}},
...


> 3. Can you share your environment where you get to compile 1.6.1
>    successfully, so we can try and reproduce this?

I could prepare a Yocto setup for you to build a toolchain and packages
for a QEMU/Docker image, if you are familiar with that build system...

> 4. Could it be that your SMB implementation doesn’t allow for opening
> and closing files in this quick succession (with or without a rename
> in the mix)?

For testing, it doesn't need to run on an SMB share; the timeout issue
occurs with the given fd-swap patch on a default (Linux) setup.

And a strace log does not show any underlying FS issues.


Best,
Stefan

On 28.02.23 at 16:47, Jan Lehnardt wrote:
> first off, CouchDB 1.6.1 is no longer supported by this project AND it
> has a long list of CVEs[1] against it. You REALLY should be operating
> on a newer version.
> 
> Secondly, just to understand your motivation: you think closing and
> opening the fds after the file:rename/2 call will make things work
> for your SMB operation?
> 
> If yes, the only thing I could spot that is substantially different is
> that the NewFd position is advanced implicitly by the underlying
> file:pread/3 in [2] and your SwappedFd doesn’t get the same treatment,
> but I don’t know why that should block the gen server call, as that only
> does some refcounting updates[3]. While this includes stopping the
> gen_server[4], I don’t see how the Pid this operates on should be any
> different under your patch.
> 
> So:
> 
> 1. How did you verify that the gen_server:call/3 call never returns?
> 2. Do you get any pertinent lines (especially crashes) in your couch.log?
> 3. Can you share your environment where you get to compile 1.6.1
>    successfully, so we can try and reproduce this?
> 4. Could it be that your SMB implementation doesn’t allow for opening and
>    closing files in this quick succession (with or without a rename in
>    the mix)?
> 
> 
> [1]: https://docs.couchdb.org/en/stable/cve/index.html
> [2]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L179
> [3]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db.erl#L1122-L1130
> [4]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_ref_counter.erl#L84
> 
> 
> Best
> Jan
> — 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 
> 24/7 Observation for your CouchDB Instances:
> https://opservatory.app
> 
> 
>> On 28. Feb 2023, at 10:19, Stefan Kral <stefan.k...@emlix.com> wrote:
>>
>> Hi,
>>
>> I'm experimenting with a CouchDB setup on a SMB mount point. I know this
>> is not supported, but I ran into a (maybe simple) problem I don't
>> understand. Maybe one of you can easily give a hint (that would be
>> amazing).
>>
>> Given the following patch (I need to close/reopen the file descriptors
>> after renaming) for the function
>> https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L176
>>
>>>  1 --- a/src/couchdb/couch_db_updater.erl
>>>  2 +++ b/src/couchdb/couch_db_updater.erl
>>>  3 @@ -202,8 +202,18 @@ handle_call({compact_done, CompactFilepath}, _From, 
>>> #db{filepath=Path}=Db) ->
>>>  4          RootDir = couch_config:get("couchdb", "database_dir", "."),
>>>  5          couch_file:delete(RootDir, Filepath),
>>>  6          ok = file:rename(CompactFilepath, Filepath),
>>>  7 +
>>>  8 +        ok = couch_file:close(NewDb#db.updater_fd),
>>>  9 +        ok = couch_file:close(NewDb#db.fd),
>>> 10 +        {ok, SwappedFd} = couch_file:open(Filepath),
>>> 11 +        SwappedReaderFd = open_reader_fd(Filepath, Db#db.options),
>>> 12 +        SwappedDb = NewDb2#db{
>>> 13 +            fd = SwappedReaderFd,
>>> 14 +            updater_fd = SwappedFd
>>> 15 +        },
>>> 16 +        unlink(SwappedFd),
>>> 17          close_db(Db),
>>> 18 -        NewDb3 = refresh_validate_doc_funs(NewDb2),
>>> 19 +        NewDb3 = refresh_validate_doc_funs(SwappedDb),
>>> 20          ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, 
>>> infinity),
>>> 21          couch_db_update_notifier:notify({compacted, NewDb3#db.name}),
>>> 22          ?LOG_INFO("Compaction for db \"~s\" completed.", [Db#db.name]),
>>
>> then the gen_server:call() of line 20 never returns.
>>
>> Is there a major issue with this approach or just a minor mistake in my
>> implementation?
>>
>>
>> Thank you for having a look,
>> Stefan
> 
> 
