[ 
https://issues.apache.org/jira/browse/COUCHDB-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843018#comment-13843018
 ] 

Dave Cottlehuber commented on COUCHDB-1946:
-------------------------------------------

[~stelcheck] agreed
[~thor.lange]

Something about replicating this specific doc seems to trigger 
issues. Here's what I used to identify it (query the source db's _changes feed 
with since=<checkpoint - 1>):

    http://isaacs.iriscouch.com/registry/_changes?limit=2&since=701251
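
A rough sketch of that lookup in Python, in case it helps anyone script it. The helper names are made up for illustration; only the URL shape and the `results[].id` field come from CouchDB's _changes feed. It builds the URL from a checkpoint sequence number and pulls the doc ids out of the response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def changes_url(db_url, checkpoint, limit=2):
    """Build a _changes URL starting one sequence before the checkpoint."""
    query = urlencode({"limit": limit, "since": checkpoint - 1})
    return f"{db_url.rstrip('/')}/_changes?{query}"


def changed_ids(changes):
    """Return the document ids listed in a parsed _changes response."""
    return [row["id"] for row in changes.get("results", [])]


def fetch_changed_ids(db_url, checkpoint, limit=2):
    """Query the source database (network call, only runs when invoked)."""
    with urlopen(changes_url(db_url, checkpoint, limit)) as resp:
        return changed_ids(json.load(resp))
```

Pass in the sequence number your own replication checkpoint stopped at; the ids that come back are the candidates for the stuck document.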

Here are some things you can try:

# option 1

-  delete all existing replications
- compact your DB if there's a big difference between data size and on-disk 
size. jq is awesome for this.

    curl -s http://localhost:5984/registry | jq '(.disk_size | tonumber) - (.data_size | tonumber)'

    http://stedolan.github.io/jq/
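
If jq isn't handy, the same check can be sketched in Python. This assumes the database info document exposes `disk_size` and `data_size` as the jq expression above does; the function names are hypothetical, and the compaction call will need admin credentials if your server has admins configured.

```python
import json
from urllib.request import Request, urlopen


def reclaimable_bytes(db_info):
    """On-disk size minus live data size, in bytes."""
    return int(db_info["disk_size"]) - int(db_info["data_size"])


def fetch_db_info(db_url):
    """GET the database info document (network call)."""
    with urlopen(db_url) as resp:
        return json.load(resp)


def compact(db_url):
    """POST to <db>/_compact to start compaction (network call).

    Assumption: no authentication is required; add credentials if
    your server has admins configured.
    """
    req = Request(
        f"{db_url.rstrip('/')}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

For example, `reclaimable_bytes(fetch_db_info("http://localhost:5984/registry"))` gives the same number as the curl/jq pipeline.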

This is a good spot to copy the registry.couch file if you have space, in case 
you need to revert back to it.

-  replicate the single failing document by POSTing this to _replicator. This 
could take a *while*.

{{code}}
{
   "source": "http://isaacs.iriscouch.com/registry",
   "target": "registry",
   "doc_ids": [
       "as-stream"
   ],
   "owner": "admin"
}
{{code}}
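
For anyone who prefers to script the POST, here is a minimal sketch in Python. The function names and the `server_url` parameter are illustrative assumptions, and authentication is left out; the document fields are the ones in the replication document above.

```python
import json
from urllib.request import Request, urlopen


def single_doc_replication(source, target, doc_id, owner="admin"):
    """Build a _replicator document that replicates one document only."""
    return {
        "source": source,
        "target": target,
        "doc_ids": [doc_id],
        "owner": owner,
    }


def post_replication(server_url, doc):
    """POST the document to <server>/_replicator (network call).

    Assumption: no authentication is needed; add credentials if
    your server has admins configured.
    """
    req = Request(
        f"{server_url.rstrip('/')}/_replicator",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

Usage would be along the lines of `post_replication("http://localhost:5984", single_doc_replication("http://isaacs.iriscouch.com/registry", "registry", "as-stream"))`.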

- this is simply replicating the single stuck document. If you do this, I would 
love an ngrep or tcpdump of the traffic to see what happens on the wire during 
these stuck transfers

- once this is completed, you can then run the normal replication again.

# option 2

Install an older release of CouchDB and see if it doesn't get stuck here:

https://archive.apache.org/dist/couchdb/binary/win/1.2.2/

If you *can* please try the R15B03-1 release first, report back, and then the 
R14B04 one. It's not yet clear to me if the issue we are seeing is also related 
to garbage collection differences in Erlang/OTP between releases, or solely 
within CouchDB.

# option 3

Sometime later (hopefully today), I should have a BitTorrent-accessible 
version of npm. I need to update & compact first; this is pretty much IO 
limited :-).


> Trying to replicate NPM grinds to a halt after 40GB
> ---------------------------------------------------
>
>                 Key: COUCHDB-1946
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1946
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Marc Trudel
>         Attachments: couch.log
>
>
> I have been able to replicate the Node.js NPM database until 40G or so, then 
> I get this:
> https://gist.github.com/stelcheck/7723362
> In one case I have gotten a flat-out OOM error, but I didn't take a dump of 
> the log output at the time.
> CentOS6.4 with CouchDB 1.5 (also tried 1.3.1, but to no avail). Also tried to 
> restart replication from scratch - twice - both cases stalling at 40GB.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
