On 01/04/2012 03:35 PM, Stefan Hajnoczi wrote: > > I think we should specialize this for out case, where we expect runs of > > zeros (so no need to encode the repeated byte) and runs of non-repeating > > nonzeros. I propose this encoding: > > > > page = zrun > > | zrun nzrun > > | zrun nzrun page > > > > zrun = length > > > > nzrun = length byte... > > > > length = uleb128 encoded integer > > > > take for example a xor-encoded page: > > > > { 1000*0, 1, 2, 3, 4, 3092*0 } > > > > representing a page that had a single 32-bit write in the middle. The > > current encoding would generate > > > > ff 00 ff 00 ff 00 eb 00 01 01 01 02 01 03 01 04 ff 00 ff 00 ff 00 ff > > 00 ff 00 ff 00ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 20 00 > > > > while the zrle encoding generates > > > > e8 07 04 01 02 03 04 94 18 > > > > (e8 07 = uleb128 encoding for 1000) > > Had to look up the Unsigned Little-Endian Base 128 encoding, but it's > a nice idea and simple to implement (though we probably want to > constrain ULEB128 to max 32-bit or 64-bit in practice; we don't want > arbitrarily long integers). > > http://en.wikipedia.org/wiki/LEB128 >
Sorry, should have provided the URL. And yes, for this use we're limited to TARGET_PAGE_BITS, so having 13-bit encoders/decoders would suffice: int uleb128_encode_small(uint8_t *out, uint32_t n) { if (n < 0x80) { *out++ = n; return 1; } else { *out++ = (n & 0x7f) | 0x80; *out++ = n >> 7; } } int uleb128_decode_small(const uint128 *in, uint32_t *n) { if (!(*in & 0x80)) { *n = *in++; return 1; } else { *n = *in++ & 0x7f; *n |= *in++ << 7; return 2; } } -- error compiling committee.c: too many arguments to function