On Fri, Mar 11, 2011 at 6:56 AM, Thomas W <thomas.weh...@gmail.com> wrote:
> I'm thinking about creating a very simple revision system for photos
> in python, something like bazaar, mercurial or git, but for photos.
> The problem is that handling large binary files compared to plain text
> files is quite different. Has anybody done something like this, or
> does anybody have thoughts about it? I'd be very grateful. If something
> like mercurial or git could be used and/or extended/customized that
> would be even better.
>
> We are talking about large numbers of photos and some of them are
> large in size as well, but the functionality does not have to be a
> full fledged revision system, just handle checking out, checking in,
> handling conflicts, rollbacks etc, preferably without storing
> complete copies of the files in question for every operation.
>
> Thanks for any input. :-)
>

Check out the rolling_checksum portion of backshift, and pyrabinf:

http://stromberg.dnsalias.org/svn/backshift/trunk/
http://stromberg.dnsalias.org/svn/pyrabinf/

You could probably use variable-length, shift-resistant blocking to
chop the inputs into binary chunks, and then make a checkin consist of
a series of "pointers" (pathnames in a filesystem trie or keys into
something like mongodb) to those chunks, to avoid duplication.

Actually, something like this could probably be wrapped around
Mercurial or SVN or whatever, depending on what your needs are.

I originally set up pyrabinf as a wrapper for a preexisting C++ Rabin
Fingerprinting algorithm; this is probably the more traditional way of
doing such blocking. However, I've been playing around with rolling my
own algorithm in pure python (and also with Cython), so that it'll work
in pypy, using something that boils down to a rolling (boxcar) sum of
the bytes. So far, it seems to be working fine. Rabin Fingerprinting
should be less subject to generating the same blocking for a file that
has two adjacent bytes swapped, but in my project, and I suspect in
yours, that doesn't really matter.
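
To make the blocking idea concrete, here is a minimal sketch. It is
not the code from backshift or pyrabinf. The window size, mask, chunk
size limits and the names split_chunks, check_in and check_out are all
assumptions made up for illustration, and a plain dict stands in for
the filesystem trie or mongodb-style key store.

```python
import hashlib

# All of these constants are illustrative assumptions, not tuned values.
WINDOW = 48            # number of trailing bytes in the boxcar sum
MASK = (1 << 11) - 1   # cut where the low 11 bits of the sum are all set
MIN_CHUNK = 512        # never cut a chunk shorter than this
MAX_CHUNK = 16384      # always cut once a chunk reaches this length


def split_chunks(data):
    """Yield chunks of data whose boundaries depend on nearby content."""
    start = 0      # index where the current chunk begins
    rolling = 0    # sum of up to WINDOW trailing bytes of this chunk
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= WINDOW:
            # Drop the byte that just slid out of the window.
            rolling -= data[i - WINDOW]
        length = i - start + 1
        hit = length >= MIN_CHUNK and (rolling & MASK) == MASK
        if hit or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            rolling = 0
    if start < len(data):
        yield data[start:]   # whatever is left becomes the final chunk


def check_in(data, store):
    """Store unseen chunks in store; return the list of chunk keys."""
    keys = []
    for chunk in split_chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)   # identical chunks are stored once
        keys.append(key)
    return keys


def check_out(keys, store):
    """Rebuild the original bytes from a list of chunk keys."""
    return b"".join(store[key] for key in keys)
```

A checkin is then just the list of keys returned by check_in, and
check_out(keys, store) gives back the original bytes. Because a cut
depends only on the last WINDOW bytes (plus the size limits), an
insertion near the start of a file will usually change only the chunks
around the edit, and the later chunks hash to keys that are already in
the store.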

Also check out http://mercurial.selenic.com/wiki/BfilesExtension - this
might be less time-consuming for you, since it leverages an existing
tool.

HTH
--
http://mail.python.org/mailman/listinfo/python-list