On Wed, Jan 4, 2012 at 12:59 PM, Avi Kivity <a...@redhat.com> wrote: > On 01/03/2012 05:34 PM, Orit Wasserman wrote: >> >> +/* XBRLE (Xor Based Run-Length Encoding) */ >> +static int rle_encode(uint8_t *src, int slen, uint8_t *dst, int dlen) >> +{ >> + int d = 0, ch_run = 0, i; >> + uint8_t prev = 0, ch = 0; >> + >> + for (i = 0; i <= slen; i++) { >> + if (i != slen) { >> + ch = src[i]; >> + } >> + >> + if (!i || (i != slen && ch == prev && ch_run < 255)) { >> + ch_run++; >> + } else { >> + if (d+2 > dlen) { >> + return -1; >> + } >> + *dst++ = ch_run; >> + *dst++ = prev; >> + d += 2; >> + ch_run = 1; >> + } >> + >> + prev = ch; >> + } >> + return d; >> +} >> + > > I think we should specialize this for out case, where we expect runs of > zeros (so no need to encode the repeated byte) and runs of non-repeating > nonzeros. I propose this encoding: > > page = zrun > | zrun nzrun > | zrun nzrun page > > zrun = length > > nzrun = length byte... > > length = uleb128 encoded integer > > take for example a xor-encoded page: > > { 1000*0, 1, 2, 3, 4, 3092*0 } > > representing a page that had a single 32-bit write in the middle. The > current encoding would generate > > ff 00 ff 00 ff 00 eb 00 01 01 01 02 01 03 01 04 ff 00 ff 00 ff 00 ff > 00 ff 00 ff 00ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 20 00 > > while the zrle encoding generates > > e8 07 04 01 02 03 04 94 18 > > (e8 07 = uleb128 encoding for 1000)
Had to look up the Unsigned Little-Endian Base 128 encoding, but it's a nice idea and simple to implement (though we probably want to constrain ULEB128 to max 32-bit or 64-bit in practice; we don't want arbitrarily long integers). http://en.wikipedia.org/wiki/LEB128 Stefan