[ https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843018#comment-13843018 ]
Dave Cottlehuber commented on COUCHDB-1946:
-------------------------------------------

[~stelcheck] agreed

[~thor.lange] There's something about replicating this specific doc that seems to trigger issues. Here's what I used to identify it (query the source db's _changes feed with since=<checkpoint - 1>):

http://isaacs.iriscouch.com/registry/_changes\?limit\=2\&since\=701251

Here are some things you can try:

# option 1

- delete all existing replications
- compact your DB if there's a big difference between data size and on-disk size. jq is awesome for checking this:

curl -s http://localhost:5984/registry | jq ' (.disk_size| tonumber) - (.data_size |tonumber)'

http://stedolan.github.io/jq/

This is a good spot to copy the registry.couch file if you have space, in case you need to revert back to it.

- replicate the single failing document by POSTing this to _replicator. This could take a *while*.

{{code}}
{
  "source": "http://isaacs.iriscouch.com/registry",
  "target": "registry",
  "doc_ids": [ "as-stream" ],
  "owner": "admin"
}
{{code}}

- this is simply replicating the single stuck document. If you do this, I would love an ngrep or tcpdump of the traffic to see what happens on the wire during these stuck transfers
- once this is completed, you can then run the normal replication again.

# option 2

Install an older release of CouchDB and see whether it gets stuck as well. Downloads are here:

https://archive.apache.org/dist/couchdb/binary/win/1.2.2/

If you *can*, please try the R15B03-1 release first, report back, and then the R14B04 one. It's not yet clear to me if the issue we are seeing is also related to garbage collection differences in Erlang/OTP between releases, or solely within CouchDB.

# option 3

Sometime later (hopefully today), I should have a BitTorrent-accessible version of npm. I need to update & compact first, and this is pretty much IO limited :-).
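
For anyone without jq, the compaction check and the single-document replication body from option 1 can be sketched in Python. This is only a rough sketch under assumptions: the helper names (`reclaimable_bytes`, `single_doc_replication`, `fetch_db_info`) are made up for illustration and are not part of CouchDB, and it assumes the database info document exposes `disk_size` and `data_size` as the jq one-liner does.

```python
import json
import urllib.request


def reclaimable_bytes(db_info):
    """Return disk_size - data_size, mirroring the jq expression.

    int() accepts both numbers and numeric strings, much as
    jq's `tonumber` does.
    """
    return int(db_info["disk_size"]) - int(db_info["data_size"])


def single_doc_replication(source, target, doc_ids, owner=None):
    """Build a _replicator document limited to the given doc ids.

    Hypothetical helper: it only assembles the JSON body; it does
    not POST anything.
    """
    doc = {"source": source, "target": target, "doc_ids": list(doc_ids)}
    if owner is not None:
        doc["owner"] = owner
    return doc


def fetch_db_info(url):
    """GET the database info document (e.g. http://localhost:5984/registry)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Something like `reclaimable_bytes(fetch_db_info("http://localhost:5984/registry"))` should then give the same number as the jq pipeline, and `json.dumps(single_doc_replication("http://isaacs.iriscouch.com/registry", "registry", ["as-stream"], owner="admin"))` produces a body suitable for POSTing to _replicator.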
> Trying to replicate NPM grinds to a halt after 40GB
> ---------------------------------------------------
>
>                 Key: COUCHDB-1946
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1946
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Marc Trudel
>         Attachments: couch.log
>
>
> I have been able to replicate the Node.js NPM database until 40G or so, then I get this:
> https://gist.github.com/stelcheck/7723362
> In one case I have gotten a flat-out OOM error, but I didn't take a dump of the log output at the time.
> CentOS6.4 with CouchDB 1.5 (also tried 1.3.1, but to no avail). Also tried to restart replication from scratch - twice - both cases stalling at 40GB.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)