Hi Jan,

here you go: https://github.com/emlix/couchdb-yocto

the mentioned patch is here
https://github.com/emlix/couchdb-yocto/blob/main/meta-couchdb/recipes-core/couchdb/files/0001-swap-fds.patch

when you run the compaction test (see the README for how to get there)
/usr/lib/test-couchdb/test-compaction.sh

you will find this as the last line of the log (/var/log/couchdb/couch.log):
[debug] [<0.173.0>] before gen_server:call
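
A quick way to check for this could look like the following. It is only a minimal sketch: the helper name compaction_hung is made up for illustration and is not part of the repository, and the default path is simply the log location mentioned above.

```shell
# Hypothetical helper (not part of couchdb-yocto): succeeds when the last
# line of the given log is the "before gen_server:call" debug marker,
# i.e. the matching "after gen_server:call" line was never written.
compaction_hung() {
    log="${1:-/var/log/couchdb/couch.log}"
    tail -n 1 "$log" | grep -q 'before gen_server:call'
}
```

After running the test script, `compaction_hung && echo "call did not return"` should report the hang, assuming nothing else has written to the log in the meantime.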

Thanks,
Stefan

On 02.03.23 at 13:45, Jan Lehnardt wrote:
> Hi Stefan,
> 
> Thanks for the additional info. I’m happy to try a yocto build here.
> 
> Best
> Jan
> —
> 
>> On 2. Mar 2023, at 12:24, Stefan Kral <stefan.k...@emlix.com> wrote:
>>
>> Hi,
>>
>> I can give you some background context: our CouchDB instance is running
>> on an embedded device (with a minimal attack surface, so we have no pressure
>> to mitigate CVEs). CouchDB was chosen because of its append-only writes
>> and power-fail-safe property (and because of the easily scriptable
>> curl/JSON interface).
>>
>> Currently there is a production system running on an SMB1 share (mounted
>> on a Linux host) which works well (at least for our use cases). SMB1 is
>> no longer the default on the Windows remote side, and SMB2/3 has an
>> issue with opening a renamed but not yet closed file descriptor. The
>> question is whether we can solve this issue with minimal changes.
>>
>>> 1. How did you verify that the gen_server:call/3 call never returns?
>>> 2. Do you get any pertinent lines (especially crashes) in your
>>>   couch.log?
>>
>> by adding:
>>
>>> +        ?LOG_DEBUG("before gen_server:call", []),
>>>         ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, 
>>> infinity),
>>> +        ?LOG_DEBUG("after gen_server:call", []),
>>
>> the log gives:
>>
>>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.391.0>] Compaction process 
>>> spawned for db "asdf"
>>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.84.0>] New task status for 
>>> <0.391.0>: [{changes_done,1},
>>>                                                   {database,<<"asdf">>},
>>>                                                   {progress,100},
>>>                                                   {started_on,1677753384},
>>>                                                   {total_changes,1},
>>>                                                   
>>> {type,database_compaction},
>>>                                                   {updated_on,1677753384}]
>>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] CouchDB swapping files 
>>> .../asdf.couch and .../asdf.couch.compact.
>>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] before gen_server:call
>>
>> then nothing for a long time...
>>
>> refreshing the db in the Futon web GUI gives: no response
>>
>> and the log continues with:
>>
>>> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] ** Generic server 
>>> couch_compaction_daemon terminating
>>> ** Last message in was {'EXIT',<0.145.0>,
>>>                           {timeout,
>>>                               {gen_server,call,[couch_server,get_server]}}}
>>> ** When Server state == {state,<0.145.0>}
>>> ** Reason for termination ==
>>> ** {compaction_loop_died,
>>>       {timeout,{gen_server,call,[couch_server,get_server]}}}
>>>
>>> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] {error_report,<0.31.0>,
>>>                     {<0.144.0>,crash_report,
>>>                      [[{initial_call,
>>>                         {couch_compaction_daemon,init,['Argument__1']}},
>>>                        {pid,<0.144.0>},
>>>                        {registered_name,couch_compaction_daemon},
>>>                        {error_info,
>>>                         {exit,
>>>                          {compaction_loop_died,
>>>                           {timeout,
>>>                            {gen_server,call,[couch_server,get_server]}}},
>>>                          [{gen_server,terminate,7,
>>>                            [{file,"gen_server.erl"},{line,804}]},
>>>                           {proc_lib,init_p_do_apply,3,
>>>                            [{file,"proc_lib.erl"},{line,237}]}]}},
>> ...
>>
>>
>>> 3. Can you share your environment where you get to compile 1.6.1
>>>   successfully, so we can try and reproduce this?
>>
>> I could prepare a Yocto setup for you to build a toolchain and packages for
>> a QEMU/Docker image, if you are familiar with that build system...
>>
>>> 4. Could it be that your SMB implementation doesn’t allow for opening
>>> and closing files in this quick succession (with or without a rename
>>> in the mix)?
>>
>> For testing, it doesn't need to run on an SMB share; the timeout issue
>> occurs with the given fd-swap patch on a default (Linux) setup.
>>
>> And a strace log does not show any underlying FS issues.
>>
>>
>> Best,
>> Stefan
>>
>> On 28.02.23 at 16:47, Jan Lehnardt wrote:
>>> first off, CouchDB 1.6.1 is no longer supported by this project AND it
>>> has a long list of CVEs[1] against it. You REALLY should be operating
>>> on a newer version.
>>>
>>> Secondly, just to understand your motivation: you think closing and
>>> opening the fds after the file:rename/2 call will make things work
>>> for your SMB operation?
>>>
>>> If yes, the only thing I could spot that is substantially different is
>>> that the NewFd position is advanced implicitly by the underlying
>>> file:pread/3 in [2] and your SwappedFd doesn’t get the same treatment,
>>> but I don’t know why that should block the gen server call, as that only
>>> does some refcounting updates[3]. While this includes stopping the
>>> gen_server[4], I don’t see how the Pid this operates on should be any
>>> different under your patch.
>>>
>>> So:
>>>
>>> 1. How did you verify that the gen_server:call/3 call never returns?
>>> 2. Do you get any pertinent lines (especially crashes) in your couch.log?
>>> 3. Can you share your environment where you get to compile 1.6.1
>>>   successfully, so we can try and reproduce this?
>>> 4. Could it be that your SMB implementation doesn’t allow for opening and
>>>   closing files in this quick succession (with or without a rename in
>>>   the mix)?
>>>
>>>
>>> [1]: https://docs.couchdb.org/en/stable/cve/index.html
>>> [2]: 
>>> https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L179
>>> [3]: 
>>> https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db.erl#L1122-L1130
>>> [4]: 
>>> https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_ref_counter.erl#L84
>>>
>>>
>>> Best
>>> Jan
>>> — 
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>>
>>> 24/7 Observation for your CouchDB Instances:
>>> https://opservatory.app
>>>
>>>
>>>> On 28. Feb 2023, at 10:19, Stefan Kral <stefan.k...@emlix.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm experimenting with a CouchDB setup on an SMB mount point. I know this
>>>> is not supported, but I ran into a (maybe simple) problem I don't
>>>> understand. Maybe one of you can easily give a hint (that would be
>>>> amazing).
>>>>
>>>> Given the following patch (I need to close/reopen the file descriptors
>>>> after renaming) for the function
>>>> https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L176
>>>>
>>>>> 1 --- a/src/couchdb/couch_db_updater.erl
>>>>> 2 +++ b/src/couchdb/couch_db_updater.erl
>>>>> 3 @@ -202,8 +202,18 @@ handle_call({compact_done, CompactFilepath}, 
>>>>> _From, #db{filepath=Path}=Db) ->
>>>>> 4          RootDir = couch_config:get("couchdb", "database_dir", "."),
>>>>> 5          couch_file:delete(RootDir, Filepath),
>>>>> 6          ok = file:rename(CompactFilepath, Filepath),
>>>>> 7 +
>>>>> 8 +        ok = couch_file:close(NewDb#db.updater_fd),
>>>>> 9 +        ok = couch_file:close(NewDb#db.fd),
>>>>> 10 +        {ok, SwappedFd} = couch_file:open(Filepath),
>>>>> 11 +        SwappedReaderFd = open_reader_fd(Filepath, Db#db.options),
>>>>> 12 +        SwappedDb = NewDb2#db{
>>>>> 13 +            fd = SwappedReaderFd,
>>>>> 14 +            updater_fd = SwappedFd
>>>>> 15 +        },
>>>>> 16 +        unlink(SwappedFd),
>>>>> 17          close_db(Db),
>>>>> 18 -        NewDb3 = refresh_validate_doc_funs(NewDb2),
>>>>> 19 +        NewDb3 = refresh_validate_doc_funs(SwappedDb),
>>>>> 20          ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, 
>>>>> infinity),
>>>>> 21          couch_db_update_notifier:notify({compacted, NewDb3#db.name}),
>>>>> 22          ?LOG_INFO("Compaction for db \"~s\" completed.", 
>>>>> [Db#db.name]),
>>>>
>>>> then the gen_server:call() of line 20 never returns.
>>>>
>>>> Is there a major issue with this approach or just a minor mistake in my
>>>> implementation?
>>>>
>>>>
>>>> Thank you for having a look,
>>>> Stefan
>>>
>>>
> 

-- 
Visit us at Embedded World 2023
14 to 16 March 2023 | Messe Nürnberg
You will find us in Hall 4, Stand 336

Dipl.-Ing. Stefan Kral, emlix GmbH, http://www.emlix.com
Phone +49 30 275911-00, Fax -33
Panoramastraße 1, 10178 Berlin, Germany
Registered office: Göttingen, Amtsgericht Göttingen HR B 3160
Management: Heike Jordan, Dr. Uwe Kracke
VAT ID: DE 205 198 055

emlix - smart embedded open source
