Subversion 1.8 and earlier generally use a human-readable decimal format to store numbers in FSFS repositories on disk. The log addressing implementation on trunk introduces a new encoding for storing numbers in indexes. Quoting the log addressing index format documentation [1]:

[[[
Encoding
--------

The final index file format is tuned for space and decoding efficiency.
Indexes are stored as a sequence of variable integers.  The encoding is
as follows:

* Unsigned integers are stored in little endian order with a variable
  length 7b/8b encoding.  If the most significant bit of a byte has been
  set, the next byte also belongs to the same value.

    0x00 .. 0x7f    -> 0x00 .. 0x7f               ( 7 bits stored in  8 bits)
    0x80 .. 0xff    -> 0x80 0x01 .. 0xff 0x01     (14 bits stored in 16 bits)
    0x100 .. 0x3fff -> 0x80 0x02 .. 0xff 0x7f     (14 bits stored in 16 bits)
    0x100000000     -> 0x80 0x80 0x80 0x80 0x10   (35 bits stored in 40 bits)

  Technically, we can represent integers of arbitrary lengths.
  Currently, we only generate and parse up to 64 bits.

* Signed integers are mapped onto the unsigned value space as follows:

    x >= 0 ->  2 * x
    x <  0 -> -2 * x - 1

  Again, we can represent arbitrary length numbers that way but the code
  is currently restricted to 64 bits.

Most data is unsigned by nature but will be stored differentially using
signed integers.
]]]

I'm unhappy with the chosen encoding because it is not human readable. It is also not as good for performance as storing a fixed 8 bytes for every number.

I think indexes should use one of the following formats:

1. Human-readable decimal numbers, each followed by a newline. This would be consistent with the original FSFS encoding and would make corruption easier to investigate.

2. 64-bit numbers stored as 8 bytes in a fixed endianness (little endian, for example). This would give us maximum performance, since index records would have a fixed length, and the records would still be somewhat human readable in a hex editor.

The current encoding is unacceptable because it makes repository maintenance and recovery nearly impossible.

[1] http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure-indexes

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
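
For reference, here is a minimal Python sketch of the encoding quoted above, next to the two alternatives proposed in this mail. It is an illustration only, written from the quoted description; it is not the Subversion code (which is in C), and every function name in it is hypothetical.

```python
import struct


def encode_uint(value):
    """Variable-length 7b/8b encoding, least significant group first."""
    if value < 0:
        raise ValueError("unsigned value expected")
    out = bytearray()
    while value >= 0x80:
        # Low 7 bits plus a continuation flag in the most significant bit.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_uint(data, pos=0):
    """Return (value, next_pos) for the number starting at data[pos]."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # no continuation flag: last byte of this number
            return value, pos
        shift += 7


def encode_int(value):
    """Map a signed value onto the unsigned space, then encode it."""
    return encode_uint(2 * value if value >= 0 else -2 * value - 1)


def decode_int(data, pos=0):
    unsigned, pos = decode_uint(data, pos)
    if unsigned % 2 == 0:
        return unsigned // 2, pos
    return -((unsigned + 1) // 2), pos


def encode_decimal(value):
    """Proposal 1: decimal digits followed by a newline."""
    return b"%d\n" % value


def encode_fixed64(value):
    """Proposal 2: always 8 bytes, little endian."""
    return struct.pack("<Q", value)
```

With the variable-length form, the record boundaries depend on the data, so a reader has to decode every preceding number to find the N-th one. With the fixed 8-byte form, the N-th number always starts at byte offset 8 * N.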