Here is a test case for which your function still doesn't work:
fmt.Println(Uvarint([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00})) // should return 18446744073709551360 (=0xffffffffffffff00), 9
If you have read 8 bytes with the top bit set, then you must
*unconditionally* consume all 8 bits of the 9th byte, regardless of its
value.
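
A minimal sketch of what I mean, assuming buf starts with a single SQLite-style varint. The names sqliteUvarint and maxSQLiteVarintLen are made up for this example and do not come from any library. The 9th byte is handled before its high bit is ever looked at:

```go
package main

import "fmt"

// maxSQLiteVarintLen is the longest varint the SQLite file format allows.
// (Hypothetical name, chosen for this sketch.)
const maxSQLiteVarintLen = 9

// sqliteUvarint decodes one big-endian SQLite varint from the start of buf.
// It returns the value and the number of bytes consumed, or (0, 0) if buf
// ends before the varint is complete.
func sqliteUvarint(buf []byte) (uint64, int) {
	var x uint64
	for i, b := range buf {
		if i == maxSQLiteVarintLen-1 {
			// 9th byte: all 8 bits are data, whatever the value of b.
			return x<<8 | uint64(b), i + 1
		}
		// Bytes 1 to 8 contribute their low 7 bits, most significant first.
		x = x<<7 | uint64(b&0x7f)
		if b < 0x80 {
			// High bit clear: this was the final byte.
			return x, i + 1
		}
	}
	return 0, 0 // buf too small
}

func main() {
	fmt.Println(sqliteUvarint([]byte{0x81, 0x47})) // 199 2
	fmt.Println(sqliteUvarint([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
		0x00})) // 18446744073709551360 9
	fmt.Println(sqliteUvarint([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
		0xff})) // 18446744073709551615 9
}
```

Since the 9th byte always terminates the varint, the loop can never read a 10th byte, and 8 x 7 + 8 = 64 bits fit a uint64 exactly, so this sketch needs no overflow branch.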
On Saturday, 4 October 2025 at 10:42:59 UTC+1 R. Men wrote:
> Hi Brian,
>
> Yes, it seems I'll have to go the custom function route, given their
> non-standard encoding. Thanks for confirming, and I really appreciate those
> tests. I fixed my code to handle ints longer than 2 bytes and added a special
> case for the 9th byte (for which the SQLite encoding treats all bits as data).
> Leaving it here in case anyone else is interested. Have a good weekend!
>
> package main
>
> import "fmt"
>
> const MaxVarintLen64 = 9
>
> func Uvarint(buf []byte) (uint64, int) {
>     var x uint64
>     var s uint = 7
>     for i, b := range buf {
>         if i == MaxVarintLen64 {
>             // Catch byte reads past MaxVarintLen64.
>             // See issue https://golang.org/issues/41185
>             return 0, -(i + 1) // overflow
>         }
>         if i == MaxVarintLen64-1 && b > 1 {
>             x <<= s + 1
>             return x | uint64(b), i + 1
>         }
>
>         if b < 0x80 {
>             x <<= s
>             return x | uint64(b), i + 1
>         }
>         x <<= s
>         x |= uint64(b & 0x7f)
>     }
>     return 0, 0
> }
>
> func main() {
>     fmt.Println(Uvarint([]byte{0x81, 0x47}))
>     // should return 199, 2
>     fmt.Println(Uvarint([]byte{0xff, 0xff, 0x7f}))
>     // should return 2097151 (=0x1fffff), 3
>     fmt.Println(Uvarint([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>         0x7f})) // should return 72057594037927935 (=0xffffffffffffff), 8
>     fmt.Println(Uvarint([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>         0xff})) // should return 18446744073709551615 (=0xffffffffffffffff), 9
> }
>
> On Saturday, October 4, 2025 at 9:45:26 AM UTC+2 Brian Candler wrote:
>
>> So in short, you are saying that the byte sequence 0x81, 0x47 written by
>> SQLite is decoded by binary.Uvarint as 9089, but you wanted it to decode to
>> 199.
>>
>> What this means is: the encoding that SQLite has chosen to use is *not*
>> the varint as defined by protobuf (and implemented by the Go standard
>> library). And therefore, you do indeed need to write your own custom
>> decoding function.
>>
>> The SQLite file format is defined here:
>> https://www.sqlite.org/fileformat.html
>>
>> *A variable-length integer or "varint" is a static Huffman encoding of
>> 64-bit twos-complement integers that uses less space for small positive
>> values. A varint is between 1 and 9 bytes in length. The varint consists of
>> either zero or more bytes which have the high-order bit set followed by a
>> single byte with the high-order bit clear, or nine bytes, whichever is
>> shorter. The lower seven bits of each of the first eight bytes and all 8
>> bits of the ninth byte are used to reconstruct the 64-bit twos-complement
>> integer. Varints are big-endian: bits taken from the earlier byte of the
>> varint are more significant than bits taken from the later bytes.*
>>
>> And for protobuf, see:
>> https://protobuf.dev/programming-guides/encoding/#varints
>>
>> On Saturday, 4 October 2025 at 01:31:25 UTC+1 R. Men wrote:
>>
>>> Sure, I'll share my code and what I'm trying to do. Thank you all for
>>> the help so far. My program reads the SQL table's metadata to determine the
>>> type and length of each column in the table. These values are encoded as
>>> varints of unsigned big-endian integers. I already validated that the
>>> expected values match the table's actual data type/size.
>>>
>>> package main
>>>
>>> import (
>>>     "encoding/binary"
>>>     "fmt"
>>> )
>>>
>>> func main() {
>>>     // SQLite format 3, sample DB file record header
>>>     // Expected:     7        23        27        27        1            199
>>>     //           |-------| |-------| |-------| |-------| |------| |----------------|
>>>     inputs := []byte{0x07, 0x17, 0x1b, 0x1b, 0x01, 0x81, 0x47}
>>>     offset := 0
>>>     for remaining := len(inputs); remaining > 0; {
>>>         d, n := binary.Uvarint(inputs[offset:])
>>>         if n <= 0 {
>>>             break
>>>         }
>>>
>>>         remaining -= n
>>>         offset += n
>>>         fmt.Println(d, n)
>>>
>>>         // Actual output
>>>         // 7 1
>>>         // 23 1
>>>         // 27 1
>>>         // 27 1
>>>         // 1 1
>>>         // 9089 2
>>>     }
>>> }
>>>
>>> I now see why I get the 9089 figure after looking at the Uvarint source code
>>> (https://cs.opensource.google/go/go/+/refs/tags/go1.25.1:src/encoding/binary/varint.go):
>>>
>>> func Uvarint(buf []byte) (uint64, int) {
>>>     var x uint64
>>>     var s uint
>>>     for i, b := range buf {
>>>         if i == MaxVarintLen64 {
>>>             // Catch byte reads past MaxVarintLen64.
>>>             // See issue https://golang.org/issues/41185
>>>             return 0, -(i + 1) // overflow
>>>         }
>>>         if b < 0x80 {
>>>             if i == MaxVarintLen64-1 && b > 1 {
>>>                 return 0, -(i + 1) // overflow
>>>             }
>>>             return x | uint64(b)<<s, i + 1
>>>         }
>>>         x |= uint64(b&0x7f) << s
>>>         s += 7
>>>     }
>>>     return 0, 0
>>> }
>>>
>>> Here I see that each byte after the first has its 7 data bits left-shifted
>>> by a further 7 bits before being OR-ed into the result, so later bytes end
>>> up more significant.
>>> My solution so far has been to create a custom Uvarint function that
>>> left-shifts the accumulated value before OR-ing in each new byte,
>>> preserving the byte order.
>>>
>>> func Uvarint(buf []byte) (uint64, int) {
>>>     var x uint64
>>>     var s uint
>>>     for i, b := range buf {
>>>         if i == MaxVarintLen64 {
>>>             // Catch byte reads past MaxVarintLen64.
>>>             // See issue https://golang.org/issues/41185
>>>             return 0, -(i + 1) // overflow
>>>         }
>>>         if b < 0x80 {
>>>             if i == MaxVarintLen64-1 && b > 1 {
>>>                 return 0, -(i + 1) // overflow
>>>             }
>>>             x <<= s
>>>             return x | uint64(b), i + 1
>>>         }
>>>         x <<= s
>>>         x |= uint64(b&0x7f)
>>>         s += 7
>>>     }
>>>     return 0, 0
>>> }
>>>
>>> I would prefer to use the Go standard library's functions if at all
>>> possible rather than write my own, but so far I haven't found alternatives
>>> or even discussions on this topic. If anything's unclear let me know. Cheers.
>>>
>>
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To view this discussion visit
https://groups.google.com/d/msgid/golang-nuts/dd169f51-c112-4cbe-90c4-db0337e5c6aen%40googlegroups.com.