This is interesting work. I notice some substantial changes to couch_btree, a 
new query_modify_raw, etc.

I'm wondering, though, if we'd be better off basing these changes on the 
refactored version of couch_btree that davisp has [1]. I haven't looked at it 
too closely or tested with it, but if I recall correctly the goal was first to 
achieve a more readable version with identical semantics, so that we could then 
move forward with improvements.


[1] 
https://github.com/davisp/couchdb/commit/37c1c9b4b90f6c0f3c22b75dfb2ae55c8b708ab1




On Jun 24, 2011, at 6:06 AM, Filipe David Manana wrote:

> Thanks Adam.
> 
> Don't get too scared :) Ignore the commit history and just look at
> github's "Files changed" tab, the modification summary is:
> 
> "Showing 19 changed files with 730 additions and 402 deletions."
> 
> More than half of those commits were merges with trunk, many snappy
> refactorings (before it was added to trunk) and other experiments that
> were reverted after.
> We'll try to break this into 2 or 3 patches.
> 
> So the single patch is something relatively small:
> https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test.diff
> 
> On Fri, Jun 24, 2011 at 4:05 AM, Adam Kocoloski <[email protected]> wrote:
>> Hi Damien, I'd like to see these 220 commits rebased into a set of logical 
>> patches against trunk.  It'll make the review easier and will help future 
>> devs track down any bugs that are introduced.  Best,
>> 
>> Adam
>> 
>> On Jun 23, 2011, at 6:49 PM, Damien Katz wrote:
>> 
>>> Hi everyone,
>>> 
>>> As many of you know, Filipe and I have been working on improving 
>>> performance, especially write performance [1]. This work has been public in 
>>> the Couchbase github account since the beginning, and the non-Couchbase-specific 
>>> changes are now isolated in [2] and [3].
>>> In [3] there’s an Erlang module used to test the performance of writing 
>>> and updating batches of documents concurrently, which was used, amongst 
>>> other tools, to measure the performance gains. This module bypasses the 
>>> network stack and the JSON parsing, so it lets us see more easily how 
>>> significant the changes in couch_file, couch_db and couch_db_updater are.
>>> 
>>> The main and most important change is asynchronous writes. The file module 
>>> no longer blocks callers until the write calls complete. Instead it 
>>> immediately replies to the caller with the position in the file where the 
>>> data is going to be written. The data is then sent to a dedicated loop 
>>> process that continuously writes the data it receives, from the 
>>> couch_file gen_server, to disk (batching when possible). This allows 
>>> callers (such as the db updater, for example) to issue write calls and keep 
>>> doing other work (preparing documents, etc.) while the writes are done in 
>>> parallel. After issuing all the writes, callers simply call the new 
>>> ‘flush’ function in the couch_file gen_server, which blocks the caller 
>>> until everything has effectively been written to disk - normally this flush 
>>> call ends up not blocking the caller at all, or blocking it only for a very 
>>> short period.
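The pattern described above can be sketched as follows - in Python rather than the actual Erlang couch_file/gen_server code, so the class and method names here are illustrative assumptions, not the real implementation:

```python
import threading
import queue

class AsyncFileWriter:
    """Sketch of the async-write idea: append() replies with the file
    position immediately, while a dedicated writer thread drains a queue
    and performs the actual disk writes (it could also batch them)."""

    def __init__(self, path):
        self._f = open(path, "wb")
        self._pos = 0                 # next write offset, tracked up front
        self._q = queue.Queue()
        self._t = threading.Thread(target=self._writer_loop, daemon=True)
        self._t.start()

    def append(self, data: bytes) -> int:
        """Return the position the data WILL be written to, without
        waiting for the write itself to happen."""
        pos = self._pos
        self._pos += len(data)
        self._q.put(data)
        return pos

    def flush(self):
        """Block only until every previously queued write has been done,
        mirroring the new 'flush' call in the couch_file gen_server."""
        done = threading.Event()
        self._q.put(done)             # sentinel: writer sets it when reached
        done.wait()
        self._f.flush()

    def _writer_loop(self):
        # Dedicated loop "process": continuously write whatever arrives.
        while True:
            item = self._q.get()
            if isinstance(item, threading.Event):
                item.set()
            else:
                self._f.write(item)
```

The key property is that callers keep preparing the next documents while earlier writes are still in flight, and flush usually finds the queue already drained.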
>>> 
>>> There are other changes, such as avoiding 2 btree lookups per document ID 
>>> (COUCHDB-1084 [4]), faster sorting in the updater (O(n log n) vs O(n^2)), 
>>> and avoiding sorting already-sorted lists in the updater.
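The "avoid sorting already-sorted lists" part amounts to a single O(n) sortedness check before paying for the sort. A minimal sketch (Python, illustrative only - the real updater code is Erlang):

```python
def sort_if_needed(items, key=lambda x: x):
    # One O(n) pass: if the list is already in order, return it untouched
    # instead of paying for an O(n log n) sort (let alone an O(n^2) one).
    if all(key(a) <= key(b) for a, b in zip(items, items[1:])):
        return items
    return sorted(items, key=key)  # merge sort / Timsort: O(n log n)
```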
>>> 
>>> Checking whether attachments are compressible was also moved into a new 
>>> module/process. We verified this was taking a lot of CPU time when all or 
>>> most of the documents to write/update have attachments - building the 
>>> regexps and matching against them for every single attachment is 
>>> surprisingly expensive.
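Since building the regexps per attachment was the expensive part, the fix boils down to compiling them once and reusing the compiled form. A minimal sketch, assuming a configurable list of compressible MIME-type patterns (the pattern list and names below are illustrative, not CouchDB's actual config):

```python
import re

# Hypothetical compressible-MIME-type patterns; CouchDB reads its list from
# configuration, so these three entries are just for illustration.
COMPRESSIBLE_PATTERNS = ["text/.*", "application/json", "application/xml"]

# Compile once at module load, not once per attachment: rebuilding the
# regexps for every single attachment is what made the check expensive.
_COMPRESSIBLE_RE = re.compile("^(?:%s)$" % "|".join(COMPRESSIBLE_PATTERNS))

def is_compressible(mime_type: str) -> bool:
    """Match an attachment's MIME type against the precompiled patterns."""
    return _COMPRESSIBLE_RE.match(mime_type) is not None
```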
>>> 
>>> There’s also a new couch_db:update_doc/s flag named ‘optimistic’, which 
>>> basically changes the behaviour to write the document bodies before 
>>> entering the updater and to skip some attachment-related checks (duplicated 
>>> names, for example). This flag is not yet exposed to the HTTP API, but it 
>>> could be, via an X-Optimistic-Write header in the doc PUT/POST requests and 
>>> _bulk_docs, for example. We’ve found this useful when the client knows that 
>>> the documents to write don’t exist yet in the database and we aren’t 
>>> already IO bound, such as when SSDs are used.
>>> 
>>> We used relaximation, Filipe’s basho_bench based tests [5] and the Erlang 
>>> test module mentioned before [6, 7], exposed via HTTP. Here follow some 
>>> benchmark results.
>>> 
>>> 
>>> # Using the Erlang test module (test output)
>>> 
>>> ## 1Kb documents, 10 concurrent writers, batches of 500 docs
>>> 
>>> trunk before snappy was added:
>>> 
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":270071}
>>> 
>>> trunk:
>>> 
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":157328}
>>> 
>>> trunk + async writes (and snappy):
>>> 
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":121518}
>>> 
>>> ## 2.5Kb documents, 10 concurrent writers, batches of 500 docs
>>> 
>>> trunk before snappy was added:
>>> 
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":507098}
>>> 
>>> trunk:
>>> 
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":230391}
>>> 
>>> trunk + async writes (and snappy):
>>> 
>>> {"db":"load_test","total":100000,"batch":500,"concurrency":10,"rounds":10,"delayed_commits":false,"optimistic":false,"total_time_ms":190151}
>>> 
>>> 
>>> # basho_bench tests, via the public HTTP APIs
>>> 
>>> ## batches of 1 1Kb docs, 50 writers, 5 minutes run
>>> 
>>> trunk:     147 702 docs written
>>> branch:  149 534 docs written
>>> 
>>> ## batches of 10 1Kb docs, 50 writers, 5 minutes run
>>> 
>>> trunk:     878 520 docs written
>>> branch:  991 330 docs written
>>> 
>>> ## batches of 100 1Kb docs, 50 writers, 5 minutes run
>>> 
>>> trunk:    1 627 600 docs written
>>> branch: 1 865 800 docs written
>>> 
>>> ## batches of 1 2.5Kb docs, 50 writers, 5 minutes run
>>> 
>>> trunk:    142 531 docs written
>>> branch: 143 012 docs written
>>> 
>>> ## batches of 10 2.5Kb docs, 50 writers, 5 minutes run
>>> 
>>> trunk:     724 880 docs written
>>> branch:   780 690 docs written
>>> 
>>> ## batches of 100 2.5Kb docs, 50 writers, 5 minutes run
>>> 
>>> trunk:      1 028 600 docs written
>>> branch:   1 152 800 docs written
>>> 
>>> 
>>> # basho_bench tests, via the internal Erlang APIs
>>> ## batches of 100 2.5Kb docs, 50 writers, 5 minutes run
>>> 
>>> trunk:    3 170 100 docs written
>>> branch: 3 359 900 docs written
>>> 
>>> 
>>> # Relaximation tests
>>> 
>>> 1Kb docs:
>>> 
>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b83002a1a
>>> 
>>> 2.5Kb docs:
>>> 
>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b830022c0
>>> 
>>> 4Kb docs:
>>> 
>>> http://graphs.mikeal.couchone.com/#/graph/4843dbdf8fa104783870094b8300330d
>>> 
>>> 
>>> All the documents used for these tests can be found at:  
>>> https://github.com/fdmanana/basho_bench_couch/tree/master/couch_docs
>>> 
>>> 
>>> Now some view indexing tests.
>>> 
>>> # indexer_test_2 database 
>>> (http://fdmanana.couchone.com/_utils/database.html?indexer_test_2)
>>> 
>>> ## trunk
>>> 
>>> $ time curl 
>>> http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>>> {"total_rows":1102400,"offset":0,"rows":[
>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>> ]}
>>> 
>>> real  20m51.388s
>>> user  0m0.040s
>>> sys   0m0.000s
>>> 
>>> 
>>> ## branch async writes
>>> 
>>> $ time curl 
>>> http://localhost:5984/indexer_test_2/_design/test/_view/view1?limit=1
>>> {"total_rows":1102400,"offset":0,"rows":[
>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>> ]}
>>> 
>>> real  15m17.908s
>>> user  0m0.008s
>>> sys   0m0.020s
>>> 
>>> 
>>> # indexer_test_3 database 
>>> (http://fdmanana.couchone.com/_utils/database.html?indexer_test_3)
>>> 
>>> ## trunk
>>> 
>>> $ time curl 
>>> http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>>> {"total_rows":1102400,"offset":0,"rows":[
>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>> ]}
>>> 
>>> real  21m17.346s
>>> user  0m0.012s
>>> sys   0m0.028s
>>> 
>>> ## branch async writes
>>> 
>>> $ time curl 
>>> http://localhost:5984/indexer_test_3/_design/test/_view/view1?limit=1
>>> {"total_rows":1102400,"offset":0,"rows":[
>>> {"id":"00d49881-7bcf-4c3d-a65d-e44435eeb513","key":["dwarf","assassin",2,1.1],"value":[{"x":174347.18,"y":127272.8},{"x":35179.93,"y":41550.55},{"x":157014.38,"y":172052.63},{"x":116185.83,"y":69871.73},{"x":153746.28,"y":190006.59}]}
>>> ]}
>>> 
>>> real  16m28.558s
>>> user  0m0.012s
>>> sys   0m0.020s
>>> 
>>> We don’t see nearly as big improvements for single-write-per-request 
>>> benchmarks as we do with bulk writes. This is due to the HTTP request 
>>> overhead and our own inefficiencies at that layer. We still have lots of 
>>> room for optimizations at the networking layer.
>>> 
>>> We'd like to merge this code into trunk by next Wednesday. Please 
>>> respond with any improvements, objections or comments by then. Thanks!
>>> 
>>> -Damien
>>> 
>>> 
>>> [1] - 
>>> http://blog.couchbase.com/driving-performance-improvements-couchbase-single-server-two-dot-zero
>>> [2] - https://github.com/fdmanana/couchdb/compare/async_file_writes_no_test
>>> [3] - https://github.com/fdmanana/couchdb/compare/async_file_writes
>>> [4] - https://issues.apache.org/jira/browse/COUCHDB-1084
>>> [5] - https://github.com/fdmanana/basho_bench_couch
>>> [6] - https://github.com/fdmanana/couchdb/blob/async_file_writes/gen_load.sh
>>> [7] - 
>>> https://github.com/fdmanana/couchdb/blob/async_file_writes/src/couchdb/couch_internal_load_gen.erl
>> 
>> 
> 
> 
> 
> -- 
> Filipe David Manana,
> [email protected], [email protected]
> 
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
