On 1/12/2012 7:24 AM, Peter Otten wrote:
Máté Koch wrote:
I'm developing an app which stores its data in a file-system database. The
data in my case consists of large Python objects, mostly dicts, containing
text and numbers. The easiest way to dump and load them would be pickle,
but I have a problem with it: I want to keep the data under version control,
and I would like to use it as efficiently as possible. Is it possible to
force pickle to store otherwise unordered data (e.g. dictionaries) in an
ordered way, so that if I dump a large dict, then change one tiny thing in
it and dump again, the diff between the old and the new file will be
minimal?
If pickle is not the best choice for me, can you suggest anything else?
(If there isn't a solution for this yet, I will of course write the module
myself, but first I'd like to look around and make sure it hasn't already
been created.)
Have you considered json?
http://docs.python.org/library/json.html
The encoder features a sort_keys flag which might help.
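(A minimal sketch of the json route; the sample data and file name below are
made up for illustration. sort_keys=True gives a stable key order, and indent
puts each entry on its own line, so a one-value change stays a small diff.)

    import json

    # Made-up sample data; any dict of string keys and JSON-friendly values works.
    data = {"beta": 2, "alpha": 1, "nested": {"z": [1, 2, 3], "a": "text"}}

    # Stable key order plus one entry per line keeps version-control diffs small.
    text = json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False)
    with open("data.json", "w", encoding="utf-8") as f:
        f.write(text)

    # Round-trip check.
    with open("data.json", encoding="utf-8") as f:
        assert json.load(f) == data

Bear in mind that json only round-trips string keys and the basic types
(dict, list, str, numbers, bool, None); anything else needs a custom
encoder/decoder.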
If that does not do it for you, consider that a dict is a two-column
table, with arbitrary structures in each column. Convert it to a list with
sorted(somedict.items()). This is basically what json should do. Then
write it to a text stream, one line per key/value pair. Whether you put the
text into an OS file in a directory (a hierarchical database ;-) or a
text field in another database is up to you. Either way, diffs are easy.
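(A rough sketch of that idea; the helper names and sample dict below are
mine, not from the thread. Each key/value pair is JSON-encoded on its own
line, sorted by key, so changing one value touches exactly one line in the
diff.)

    import json

    def dump_dict(somedict, path):
        # One JSON-encoded [key, value] pair per line, keys sorted, so a
        # one-value change shows up as a one-line diff under version control.
        with open(path, "w", encoding="utf-8") as f:
            for key, value in sorted(somedict.items()):
                f.write(json.dumps([key, value], sort_keys=True) + "\n")

    def load_dict(path):
        # Each line decodes back to a [key, value] pair.
        with open(path, encoding="utf-8") as f:
            return dict(json.loads(line) for line in f)

    # Made-up sample data and file name.
    d = {"beta": 2, "alpha": 1, "gamma": {"x": [1, 2, 3]}}
    dump_dict(d, "data.txt")
    assert load_dict("data.txt") == d

This assumes the keys are mutually sortable and everything involved is
JSON-serializable; for arbitrary Python objects you would need a custom
per-line serializer instead.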
--
Terry Jan Reedy