Hi Stefan,

we had a discussion on Slack [1] (found by Jan at [2]) about the
atomicity of "rename". Could this be a similar problem here
(Linux/qemu/fs stack)?

In [2], it seems they could work around their problem by waiting some
time after renaming.

@Stefan, maybe you could try to wait some time after renaming/closing the db?
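
To make the suggestion concrete, here is a minimal standalone sketch in Python (not CouchDB code) that renames a file, waits, and only then reopens it under the new name. The helper name rename_then_reopen, the file names, and the delay value are made up for illustration; whether any delay helps on this stack is exactly what would need testing.

```python
import os
import tempfile
import time


def rename_then_reopen(src, dst, delay_s=0.5):
    """Rename src to dst, pause, then reopen dst for reading and writing.

    delay_s is a guess for the experiment, not a known-good value.
    """
    os.replace(src, dst)     # rename; replaces dst if it already exists
    time.sleep(delay_s)      # assumption: the fs stack may need time to settle
    return open(dst, "r+b")  # reopen under the new name without truncating


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        compact = os.path.join(d, "asdf.couch.compact")
        target = os.path.join(d, "asdf.couch")
        with open(compact, "wb") as f:
            f.write(b"compacted")
        with rename_then_reopen(compact, target, delay_s=0.1) as f:
            print(f.read())
```

In the fd-swap patch, the corresponding experiment would be a short pause between file:rename/2 and the reopen of the file, which is untested here.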

Cheers,
-Ronny

[1] https://couchdb.slack.com/archives/C01TBE2J197/p1678355980122119
[2] https://toot.cat/@zkat/109973167110793372

> On 13.03.2023 at 09:50, Stefan Kral <stefan.k...@emlix.com> wrote:
> 
> Hi Jan,
> 
> here you go: https://github.com/emlix/couchdb-yocto
> 
> the mentioned patch is here
> https://github.com/emlix/couchdb-yocto/blob/main/meta-couchdb/recipes-core/couchdb/files/0001-swap-fds.patch
> 
> when you run the compaction test (see the README for how to get there)
> /usr/lib/test-couchdb/test-compaction.sh
> 
> you will find this as the last line of the log (/var/log/couchdb/couch.log):
> [debug] [<0.173.0>] before gen_server:call
> 
> Thanks,
> Stefan
> 
> On 02.03.23 at 13:45, Jan Lehnardt wrote:
>> Hi Stefan,
>> 
>> Thanks for the additional info. I’m happy to try a yocto build here.
>> 
>> Best
>> Jan
>> —
>> 
>>> On 2. Mar 2023, at 12:24, Stefan Kral <stefan.k...@emlix.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I can give you some background context: our CouchDB instance is running
>>> on an embedded device (with a minimal attack surface, so we have no
>>> pressure to mitigate CVEs). CouchDB was chosen because of its append-only
>>> writes and power-fail-safe properties (and because of the easily
>>> scriptable curl/JSON interface).
>>> 
>>> Currently there is a production system running on an SMB1 share (mounted
>>> on a Linux host), which works well (at least for our use cases). SMB1 is
>>> no longer the default on the Windows remote side, and SMB2/3 has an
>>> issue with opening a renamed but not yet closed file descriptor. The
>>> question is whether we can solve this issue with minimal changes.
>>> 
>>>> 1. How did you verify that the gen_server:call/3 call never returns?
>>>> 2. Do you get any pertinent lines (especially crashes) in your
>>>>  couch.log?
>>> 
>>> by adding:
>>> 
>>>> +        ?LOG_DEBUG("before gen_server:call", []),
>>>>        ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, infinity),
>>>> +        ?LOG_DEBUG("after gen_server:call", []),
>>> 
>>> the log gives:
>>> 
>>>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.391.0>] Compaction process spawned for db "asdf"
>>>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.84.0>] New task status for <0.391.0>: [{changes_done,1},
>>>>                                                  {database,<<"asdf">>},
>>>>                                                  {progress,100},
>>>>                                                  {started_on,1677753384},
>>>>                                                  {total_changes,1},
>>>>                                                  {type,database_compaction},
>>>>                                                  {updated_on,1677753384}]
>>>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] CouchDB swapping files .../asdf.couch and .../asdf.couch.compact.
>>>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] before gen_server:call
>>> 
>>> then nothing for a long time...
>>> 
>>> refreshing the db in the Futon web GUI gives no response
>>> 
>>> and the log continues with:
>>> 
>>>> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] ** Generic server couch_compaction_daemon terminating
>>>> ** Last message in was {'EXIT',<0.145.0>,
>>>>                          {timeout,
>>>>                              {gen_server,call,[couch_server,get_server]}}}
>>>> ** When Server state == {state,<0.145.0>}
>>>> ** Reason for termination ==
>>>> ** {compaction_loop_died,
>>>>      {timeout,{gen_server,call,[couch_server,get_server]}}}
>>>> 
>>>> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] {error_report,<0.31.0>,
>>>>                    {<0.144.0>,crash_report,
>>>>                     [[{initial_call,
>>>>                        {couch_compaction_daemon,init,['Argument__1']}},
>>>>                       {pid,<0.144.0>},
>>>>                       {registered_name,couch_compaction_daemon},
>>>>                       {error_info,
>>>>                        {exit,
>>>>                         {compaction_loop_died,
>>>>                          {timeout,
>>>>                           {gen_server,call,[couch_server,get_server]}}},
>>>>                         [{gen_server,terminate,7,
>>>>                           [{file,"gen_server.erl"},{line,804}]},
>>>>                          {proc_lib,init_p_do_apply,3,
>>>>                           [{file,"proc_lib.erl"},{line,237}]}]}},
>>> ...
>>> 
>>> 
>>>> 3. Can you share your environment where you get to compile 1.6.1
>>>>  successfully, so we can try and reproduce this?
>>> 
>>> I could prepare a yocto setup for you to build a toolchain and packages
>>> for a qemu/docker image, if you are familiar with that build system...
>>> 
>>>> 4. Could it be that your SMB implementation doesn’t allow for opening
>>>> and closing files in this quick succession (with or without a rename
>>>> in the mix)?
>>> 
>>> For testing, it doesn't need to run on an SMB share; the timeout issue
>>> occurs with the given fd-swap patch on a default (Linux) setup.
>>> 
>>> And an strace log does not show any underlying FS issues.
>>> 
>>> 
>>> Best,
>>> Stefan
>>> 
>>> On 28.02.23 at 16:47, Jan Lehnardt wrote:
>>>> first off, CouchDB 1.6.1 is no longer supported by this project AND it
>>>> has a long list of CVEs[1] against it. You REALLY should be operating
>>>> on a newer version.
>>>> 
>>>> Secondly, just to understand your motivation: you think closing and
>>>> opening the fds after the file:rename/2 call will make things work
>>>> for your SMB operation?
>>>> 
>>>> If yes, the only thing I could spot that is substantially different is
>>>> that the NewFd position is advanced implicitly by the underlying
>>>> file:pread/3 in [2] and your SwappedFd doesn’t get the same treatment,
>>>> but I don’t know why that should block the gen server call, as that only
>>>> does some refcounting updates[3]. While this includes stopping the
>>>> gen_server[4], I don’t see how the Pid this operates on should be any
>>>> different under your patch.
>>>> 
>>>> So:
>>>> 
>>>> 1. How did you verify that the gen_server:call/3 call never returns?
>>>> 2. Do you get any pertinent lines (especially crashes) in your couch.log?
>>>> 3. Can you share your environment where you get to compile 1.6.1
>>>>  successfully, so we can try and reproduce this?
>>>> 4. Could it be that your SMB implementation doesn’t allow for opening and
>>>>  closing files in this quick succession (with or without a rename in
>>>>  the mix)?
>>>> 
>>>> 
>>>> [1]: https://docs.couchdb.org/en/stable/cve/index.html
>>>> [2]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L179
>>>> [3]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db.erl#L1122-L1130
>>>> [4]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_ref_counter.erl#L84
>>>> 
>>>> 
>>>> Best
>>>> Jan
>>>> — 
>>>> Professional Support for Apache CouchDB:
>>>> https://neighbourhood.ie/couchdb-support/
>>>> 
>>>> 24/7 Observation for your CouchDB Instances:
>>>> https://opservatory.app
>>>> 
>>>> 
>>>>> On 28. Feb 2023, at 10:19, Stefan Kral <stefan.k...@emlix.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I'm experimenting with a CouchDB setup on an SMB mount point. I know this
>>>>> is not supported, but I ran into a (maybe simple) problem I don't
>>>>> understand. Maybe one of you can easily give me a hint (that would be
>>>>> amazing).
>>>>> 
>>>>> Given the following patch (I need to close/reopen the file descriptors
>>>>> after renaming) for the function
>>>>> https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L176
>>>>> 
>>>>>> 1 --- a/src/couchdb/couch_db_updater.erl
>>>>>> 2 +++ b/src/couchdb/couch_db_updater.erl
>>>>>> 3 @@ -202,8 +202,18 @@ handle_call({compact_done, CompactFilepath}, _From, #db{filepath=Path}=Db) ->
>>>>>> 4          RootDir = couch_config:get("couchdb", "database_dir", "."),
>>>>>> 5          couch_file:delete(RootDir, Filepath),
>>>>>> 6          ok = file:rename(CompactFilepath, Filepath),
>>>>>> 7 +
>>>>>> 8 +        ok = couch_file:close(NewDb#db.updater_fd),
>>>>>> 9 +        ok = couch_file:close(NewDb#db.fd),
>>>>>> 10 +        {ok, SwappedFd} = couch_file:open(Filepath),
>>>>>> 11 +        SwappedReaderFd = open_reader_fd(Filepath, Db#db.options),
>>>>>> 12 +        SwappedDb = NewDb2#db{
>>>>>> 13 +            fd = SwappedReaderFd,
>>>>>> 14 +            updater_fd = SwappedFd
>>>>>> 15 +        },
>>>>>> 16 +        unlink(SwappedFd),
>>>>>> 17          close_db(Db),
>>>>>> 18 -        NewDb3 = refresh_validate_doc_funs(NewDb2),
>>>>>> 19 +        NewDb3 = refresh_validate_doc_funs(SwappedDb),
>>>>>> 20          ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, infinity),
>>>>>> 21          couch_db_update_notifier:notify({compacted, NewDb3#db.name}),
>>>>>> 22          ?LOG_INFO("Compaction for db \"~s\" completed.", [Db#db.name]),
>>>>> 
>>>>> then the gen_server:call() of line 20 never returns.
>>>>> 
>>>>> Is there a major issue with this approach or just a minor mistake in my
>>>>> implementation?
>>>>> 
>>>>> 
>>>>> Thank you for having a look,
>>>>> Stefan
>>>> 
>>>> 
>> 
> 
> -- 
> Visit us at Embedded World 2023
> 14 to 16 March 2023 | Messe Nürnberg
> You can find us in Hall 4, Booth 336
> 
> Dipl.-Ing. Stefan Kral, emlix GmbH, http://www.emlix.com
> Phone +49 30 275911-00, Fax -33
> Panoramastraße 1, 10178 Berlin, Germany
> Registered office: Göttingen, Amtsgericht Göttingen HR B 3160
> Managing directors: Heike Jordan, Dr. Uwe Kracke
> VAT ID: DE 205 198 055
> 
> emlix - smart embedded open source
