[
https://issues.apache.org/jira/browse/COUCHDB-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925259#comment-13925259
]
Isaac Z. Schlueter commented on COUCHDB-2102:
---------------------------------------------
[~rnewson] I don't think attachments are the issue here. The attachment-free
`_users` db was 30MB on the host machine, and it grew to 300MB on one
downstream replica, and 2GB on another.
I cannot give you the database file in that case, for obvious reasons. This
week, I will try to write up a standalone test case. My plan is this:
1. Start two couches, src and dest.
2. Set up continuous replication from src to dest.
3. Do a bunch of PUTs into src, using docs that match the data in our _users db
(sans password info).
4. Look at the resulting file sizes.
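A rough sketch of what that test could look like in Python, using only the standard library. Nearly everything specific here is an assumption and not something already written: the two ports, the `bloat_test` database name, the shape of the fake user docs, the helper names, and the sleep used as a crude stand-in for "replication has caught up". It also assumes both couches accept unauthenticated requests. The size lookup reads `disk_size` from the database info and falls back to `sizes.file` if the server reports sizes that way instead.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed locations of the two test couches; adjust to match the real setup.
SRC = "http://127.0.0.1:5984"
DEST = "http://127.0.0.1:5985"
DB = "bloat_test"  # hypothetical scratch database, not the real _users db


def call(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def user_doc(i):
    """Build a made-up doc shaped like a _users entry, without password fields."""
    name = "user%d" % i
    doc = {"name": name, "type": "user", "roles": [],
           "email": name + "@example.com"}
    return "org.couchdb.user:" + name, doc


def replication_body(source, target):
    """Request body for a continuous replication via POST /_replicate."""
    return {"source": source, "target": target,
            "continuous": True, "create_target": True}


def file_size(info):
    """Pull the on-disk file size out of a database info response."""
    if "sizes" in info:
        return info["sizes"]["file"]
    return info["disk_size"]


def main(n_docs=1000, rounds=5, settle_seconds=30):
    src_db = SRC + "/" + DB
    dest_db = DEST + "/" + DB
    call("PUT", src_db)  # step 1 assumes both couches are already running
    call("POST", SRC + "/_replicate", replication_body(src_db, dest_db))
    revs = {}
    for r in range(rounds):
        for i in range(n_docs):
            doc_id, doc = user_doc(i)
            doc["round"] = r  # change the body so each PUT is a real update
            if doc_id in revs:
                doc["_rev"] = revs[doc_id]
            url = src_db + "/" + urllib.parse.quote(doc_id, safe="")
            revs[doc_id] = call("PUT", url, doc)["rev"]
    time.sleep(settle_seconds)  # crude wait for replication to catch up
    print("src bytes: ", file_size(call("GET", src_db)))
    print("dest bytes:", file_size(call("GET", dest_db)))
```

Calling `main()` with both couches up should print the two file sizes for comparison. The PUT that creates the source database will fail if `bloat_test` already exists, so it needs to be deleted between runs.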
> Downstream replicator database bloat
> ------------------------------------
>
> Key: COUCHDB-2102
> URL: https://issues.apache.org/jira/browse/COUCHDB-2102
> Project: CouchDB
> Issue Type: Bug
> Security Level: public(Regular issues)
> Components: Replication
> Reporter: Isaac Z. Schlueter
>
> When I do continuous replication from one db to another, I get a lot of bloat
> over time.
> For example, replicating a _users db with a relatively low level of writes,
> and around 30,000 documents, the size on disk of the downstream replica was
> over 300MB after 2 weeks. I compacted the DB, and the size dropped to about
> 20MB (slightly smaller than the source database).
> Of course, I realize that I can configure compaction to happen regularly.
> But this still seems like a rather excessive tax. It is especially shocking
> to users who are replicating a 100GB database full of attachments, and find
> it grow to 400GB if they're not careful! You can easily end up in a
> situation where you don't have enough disk space to successfully compact.
> Is there a fundamental reason why this happens? Or has it simply never been
> a priority? It'd be awesome if replication were more efficient with disk
> space.