Hi, On 2021-08-04 23:44:10 +0200, Tomas Vondra wrote: > How is that better than the two varint flavors that are already out there, > i.e. the bitcoin [1] and protocol buffers [2]?
The protobuf one is *terrible* for CPU efficiency. You need to go through each byte, do masking and shifting for each byte and then have a conditional branch. That's bad from the the amount of instructions you need to execute, and *really* bad for the branch predictor. > The first one seems quite efficient in how it encodes the length into very > few bits (which matters especially for small values). It's designed for > integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary lengths > fairly easily, I think: > Look at the first byte, and > > 0 - 243 - encoded as is > 244 - 1 byte > 245 - 2 bytes > 246 - 3 bytes > 247 - 4 bytes > 248 - 5 bytes > 249 - 6 bytes > 250 - 7 bytes > 251 - 8 bytes > 252 - next 1 byte is length > 253 - next 2 bytes are length > 254 - next 3 bytes are length > 255 - next 4 bytes are length > If we want to support longer lengths, we'd have to reserve an extra value > (which reduces the number of values that require a single byte). I think that's not a bad scheme. I think it may end up being a bit more expensive to decode because you need more branches instead of using find-first-set type instructions. Greetings, Andres Freund