[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2015-01-07 Thread davisp
Github user davisp commented on the pull request: https://github.com/apache/couchdb-couch/pull/24#issuecomment-69077741 For posterity, we were trying to make this work without requiring an on-disk format change but sadly that appears to not be possible due to the lack of information a

[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2015-01-07 Thread iilyak
Github user iilyak closed the pull request at: https://github.com/apache/couchdb-couch/pull/24 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature i

[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2015-01-07 Thread iilyak
Github user iilyak commented on the pull request: https://github.com/apache/couchdb-couch/pull/24#issuecomment-69067896 There are problems with calculating the active size when a document gets updated or deleted.

[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2014-12-31 Thread rnewson
Github user rnewson commented on the pull request: https://github.com/apache/couchdb-couch/pull/24#issuecomment-68453103 Use the md5 digest as a strong hint that we've seen this content before, but verify for byte identity (read both attachments, compare exhaustively), and obviously t
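
The approach rnewson describes (treat a matching MD5 digest only as a hint, then confirm byte identity before treating two attachments as the same) can be illustrated with a minimal sketch. This is not the code from the pull request, which is Erlang inside CouchDB. The function name `find_duplicates`, the in-memory `attachments` mapping, and the `digest_fn` parameter are all hypothetical names introduced for illustration.

```python
import hashlib
from collections import defaultdict


def _md5(data):
    """Default digest: MD5 of the attachment bytes."""
    return hashlib.md5(data).digest()


def find_duplicates(attachments, digest_fn=_md5):
    """Map each attachment name to the first name holding identical bytes.

    `attachments` is a hypothetical dict of name -> bytes. A real store
    would stream both attachments from disk instead of holding them in
    memory; this sketch only shows the hint-then-verify logic.
    """
    by_digest = defaultdict(list)  # digest -> names with distinct contents
    canonical = {}
    for name, data in attachments.items():
        digest = digest_fn(data)
        for seen in by_digest[digest]:
            # A digest match is only a hint; compare the bytes exhaustively.
            if attachments[seen] == data:
                canonical[name] = seen
                break
        else:
            # No byte-identical content under this digest: keep it as new.
            by_digest[digest].append(name)
            canonical[name] = name
    return canonical


if __name__ == "__main__":
    sample = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
    print(find_duplicates(sample))
```

Because the byte comparison is the deciding step, two different attachments that happen to share a digest are both kept. The digest only narrows down which candidates need to be read and compared.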

[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2014-12-31 Thread rnewson
Github user rnewson commented on the pull request: https://github.com/apache/couchdb-couch/pull/24#issuecomment-68452362 I don't think deduping on MD5 is sufficiently robust for a database.

[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2014-12-19 Thread iilyak
Github user iilyak commented on the pull request: https://github.com/apache/couchdb-couch/pull/24#issuecomment-67671232 The last push just updates the commit messages to reference the bug number, in addition to 3 new commits that implement a test case suggested by @davisp.

[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2014-12-16 Thread davisp
Github user davisp commented on the pull request: https://github.com/apache/couchdb-couch/pull/24#issuecomment-67239293 That looks pretty good but there's still an issue with the counting of active size. Unfortunately I think the fix is going to require us to start using @strmpnk's ne

[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2014-12-16 Thread davisp
Github user davisp commented on a diff in the pull request: https://github.com/apache/couchdb-couch/pull/24#discussion_r21931536 --- Diff: src/couch_db_updater.erl --- @@ -676,29 +676,43 @@ flush_trees(#db{fd = Fd} = Db, _ -> {Value, SizesAcc}

[GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...

2014-12-15 Thread iilyak
GitHub user iilyak opened a pull request: https://github.com/apache/couchdb-couch/pull/24 2516 deduplicate attachements on compaction This is replacing PR https://github.com/apache/couchdb-couch/pull/22 You can merge this pull request into a Git repository by running: $ git pul