[ 
https://issues.apache.org/jira/browse/COUCHDB-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885283#comment-13885283
 ] 

Igor Klimer commented on COUCHDB-2040:
--------------------------------------

{quote}
The original database has 130.2 GB [...], the replicated one 121.9 GB.
{quote}
So, a gain of around 9 GB... not much, I was counting on more, but we rarely 
delete anything, so that should have been expected :)

As for the difference in number of documents - unfortunately, I had to delete 
the replicated database (was taking too much space and it was a test anyway), 
so I can't tell for sure now, but now that I think about it the difference was 
because I checked the number of documents during the day, while the replication 
took place during the night - they must have added a few documents to the 
master database in the meantime. Sorry for the false alarm.

While I agree that this was probably a random bit flip and not something wrong 
with CouchDB, I'd very much like to see some improvement to the compaction 
process. Replication handled the corrupted attachment properly (in my opinion) 
- by issuing an error to the log, but continuing with the process as a whole. 
On the other hand, compaction failed with an incomprehensible stack trace 
and aborted the whole process. I think it would be a good idea before 
closing this bug to improve the error handling in compaction as it is done in 
replication (at least for the mismatched md5 checksum case). Of course, that's 
just a suggestion - maybe compaction is meant to be stricter than replication 
and should fail on such cases (like a bad md5 checksum).
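
To illustrate the two behaviours being compared, here is a minimal Python sketch. It is not CouchDB's code (CouchDB is written in Erlang), and every name in it (`copy_attachments`, `verify`, `ChecksumMismatch`, the `strict` flag) is hypothetical. It only assumes that each attachment has a stored MD5 digest that can be compared with one computed from the bytes read back from disk.

```python
import hashlib
import logging

log = logging.getLogger("compaction_sketch")


class ChecksumMismatch(Exception):
    """Raised when the stored and the computed MD5 digests differ."""


def verify(name, data, expected_md5):
    # Recompute the digest from the bytes that were actually read.
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5:
        raise ChecksumMismatch(f"{name}: expected {expected_md5}, got {actual}")


def copy_attachments(attachments, strict=True):
    """Copy (name, data, expected_md5_hex) tuples, verifying each one.

    strict=True  : abort on the first mismatch (what compaction did here).
    strict=False : log the error and carry on (what replication did here).
    Returns a (copied, skipped) pair of attachment-name lists.
    """
    copied, skipped = [], []
    for name, data, expected in attachments:
        try:
            verify(name, data, expected)
        except ChecksumMismatch as err:
            if strict:
                raise
            log.error("skipping corrupted attachment: %s", err)
            skipped.append(name)
            continue
        copied.append(name)
    return copied, skipped
```

Even in the strict case, the exception message names the attachment and both digests, which is the kind of readable error I would have hoped for instead of the raw stack trace.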

PS: thank you very much, Robert, for helping with this case.

> Compaction fails when copying attachment
> ----------------------------------------
>
>                 Key: COUCHDB-2040
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2040
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Igor Klimer
>
> Original discussion from the user mailing list: 
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201401.mbox/%3cd14f971a540b974bb75adc55f00f34ca69a35...@sex1.getback.ad2008r2.corp%3e
> Digest:
> During database compaction, the process fails at about 50% with the following 
> error: http://pastebin.com/qeaZNHMj (CouchDB 1.2.0, Windows Server 2008 R2 
> Enterprise).
> After server and CouchDB upgrade the error is still the same: 
> http://pastebin.com/feJWu7bN (CouchDB 1.5.0, Ubuntu 12.04.3 LTS (GNU/Linux 
> 3.8.0-33-generic x86_64)).
> There was one prior attempt at compaction that failed because of insufficient 
> disk space: http://pastebin.com/S1URXN0p
> After this initial failure, I've made sure that there's sufficient disk space 
> for the .compact file.
> The .compact file was always removed before trying compaction again.
> At the request of Robert Samuel Newson, I've also tried with an empty 
> .compact file - the results were the same: http://pastebin.com/MJCgGM8C.
> Our I/O subsystem consists of some RAID5 arrays - the admins claim that 
> they've been running error-free since inception ;) We have yet to run a 
> parity check, since that'd require taking the array offline and I'd rather 
> not do that without exhausting other options.
> Config files from the 1.2.0/Windows server (since that's where the fault must 
> have occurred):
> default.ini: http://pastebin.com/kUz0qyNk
> local.ini: http://pastebin.com/srZUMwzB
> Other than the default delayed_commits set to true, there are no options that 
> could affect fsync()ing and such.
> I've run:
> curl localhost:5984/ecrepo/_changes?include_docs=true
> curl localhost:5984/ecrepo/_all_docs?include_docs=true
> and both calls succeeded, which would suggest that a faulty attachment 
> (incorrect checksum/length) is at fault somewhere.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
