[go-nuts] Re: Question about Uvarint

R. Men Sat, 04 Oct 2025 10:08:47 -0700

You're quite right. I need to remove the b > 1 conditional. I will need to 
create thorough test cases to make sure it complies with the SQLite format 
and handles these cases. I can see now that even if there were a big-endian 
uvarint() implementation, I would still need to write my own, given SQLite's 
9-byte optimisation.
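
For anyone following along, here is a minimal sketch of what the decoder might look like with that conditional removed. It assumes only the varint rules quoted from the SQLite file format page further down the thread; the names sqliteUvarint and maxVarintLen are hypothetical, and this has not been checked against a real database file.

```go
package main

import "fmt"

// maxVarintLen is the longest SQLite varint: eight bytes carrying
// 7 bits each, plus a ninth byte carrying 8 bits.
const maxVarintLen = 9

// sqliteUvarint (hypothetical name) decodes one big-endian SQLite varint
// from the start of buf. It returns the value and the number of bytes
// read, or (0, 0) if buf ends before the varint is complete.
func sqliteUvarint(buf []byte) (uint64, int) {
	var x uint64
	for i, b := range buf {
		if i == maxVarintLen-1 {
			// Ninth byte: all 8 bits are data, whatever the value.
			return x<<8 | uint64(b), i + 1
		}
		// Earlier bytes are more significant, so shift before OR-ing.
		x = x<<7 | uint64(b&0x7f)
		if b < 0x80 {
			// High bit clear: this was the last byte.
			return x, i + 1
		}
	}
	return 0, 0
}

func main() {
	cases := [][]byte{
		{0x81, 0x47},
		{0xff, 0xff, 0x7f},
		{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f},
		{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
		{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00},
	}
	for _, c := range cases {
		v, n := sqliteUvarint(c)
		fmt.Printf("%#x %d\n", v, n)
	}
}
```

The ninth-byte check comes first so that a ninth byte of 0x00 or 0x01 is handled the same way as any other value, which is the case Brian's last test exercises.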


On Saturday, October 4, 2025 at 1:29:03 PM UTC+2 Brian Candler wrote:

> Here is a test case for which your function still doesn't work:
> fmt.Println(Uvarint([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00})) // should return 18446744073709551360 (=0xffffffffffffff00), 9
>
> If you have read 8 bytes with the top bit set, then you must 
> *unconditionally* consume all 8 bits of the 9th byte, regardless of its 
> value.
>
> On Saturday, 4 October 2025 at 10:42:59 UTC+1 R. Men wrote:
>
>> Hi Brian,
>>
>> Yes, it seems I'll have to go the custom function route, given their 
>> non-standard encoding. Thanks for confirming, and I really appreciate those 
>> tests. I fixed my code to handle ints longer than 2 bytes and to special-case 
>> the 9th byte (for which the SQLite encoding treats all bits as data). Leaving 
>> it here in case anyone else is interested. Have a good weekend!
>>
>> package main
>>
>> import "fmt"
>>
>> const MaxVarintLen64 = 9
>>
>> func Uvarint(buf []byte) (uint64, int) {
>> 	var x uint64
>> 	var s uint = 7
>> 	for i, b := range buf {
>> 		if i == MaxVarintLen64 {
>> 			// Catch byte reads past MaxVarintLen64.
>> 			// See issue https://golang.org/issues/41185
>> 			return 0, -(i + 1) // overflow
>> 		}
>> 		if i == MaxVarintLen64-1 && b > 1 {
>> 			x <<= s + 1
>> 			return x | uint64(b), i + 1
>> 		}
>>
>> 		if b < 0x80 {
>> 			x <<= s
>> 			return x | uint64(b), i + 1
>> 		}
>> 		x <<= s
>> 		x |= uint64(b & 0x7f)
>> 	}
>> 	return 0, 0
>> }
>>
>> func main() {
>> 	fmt.Println(Uvarint([]byte{0x81, 0x47})) // should return 199, 2
>> 	fmt.Println(Uvarint([]byte{0xff, 0xff, 0x7f})) // should return 2097151 (=0x1fffff), 3
>> 	fmt.Println(Uvarint([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f})) // should return 72057594037927935 (=0xffffffffffffff), 8
>> 	fmt.Println(Uvarint([]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff})) // should return 18446744073709551615 (=0xffffffffffffffff), 9
>> }
>>
>> On Saturday, October 4, 2025 at 9:45:26 AM UTC+2 Brian Candler wrote:
>>
>>> So in short, you are saying that the byte sequence 0x81, 0x47 written by 
>>> SQLite is decoded by binary.Uvarint as 9089, but you wanted it to decode 
>>> to 199.
>>>
>>> What this means is: the encoding that SQLite has chosen to use is *not* 
>>> the varint as defined by protobuf (and implemented by the Go standard 
>>> library). And therefore, you do indeed need to write your own custom 
>>> decoding function.
>>>
>>> The SQLite file format is defined here: 
>>> https://www.sqlite.org/fileformat.html
>>>
>>> *A variable-length integer or "varint" is a static Huffman encoding of 
>>> 64-bit twos-complement integers that uses less space for small positive 
>>> values. A varint is between 1 and 9 bytes in length. The varint consists of 
>>> either zero or more bytes which have the high-order bit set followed by a 
>>> single byte with the high-order bit clear, or nine bytes, whichever is 
>>> shorter. The lower seven bits of each of the first eight bytes and all 8 
>>> bits of the ninth byte are used to reconstruct the 64-bit twos-complement 
>>> integer. Varints are big-endian: bits taken from the earlier byte of the 
>>> varint are more significant than bits taken from the later bytes.*
>>>
>>> And for protobuf, see: 
>>> https://protobuf.dev/programming-guides/encoding/#varints
>>>
>>> On Saturday, 4 October 2025 at 01:31:25 UTC+1 R. Men wrote:
>>>
>>>> Sure, I'll share my code and what I'm trying to do. Thank you all for 
>>>> the help so far. My program reads the SQL table's metadata to determine 
>>>> the type and length of each column in the table. These values are encoded 
>>>> as varints holding unsigned big-endian integers. I have already validated 
>>>> that the expected values match the table's actual data types and sizes.
>>>>
>>>> package main
>>>>
>>>> import (
>>>> "encoding/binary"
>>>> "fmt"
>>>> )
>>>>
>>>> func main() {
>>>> 	// SQLite format 3, sample DB file record header
>>>> 	// Expected:     7     23    27    27    1     199
>>>> 	//               |--|  |--|  |--|  |--|  |--|  |--------|
>>>> 	inputs := []byte{0x07, 0x17, 0x1b, 0x1b, 0x01, 0x81, 0x47}
>>>> 	offset := 0
>>>> 	for remaining := len(inputs); remaining > 0; {
>>>> 		d, n := binary.Uvarint(inputs[offset:])
>>>> 		if n <= 0 {
>>>> 			break
>>>> 		}
>>>>
>>>> 		remaining -= n
>>>> 		offset += n
>>>> 		fmt.Println(d, n)
>>>>
>>>> 		// Actual output
>>>> 		// 7 1
>>>> 		// 23 1
>>>> 		// 27 1
>>>> 		// 27 1
>>>> 		// 1 1
>>>> 		// 9089 2
>>>> 	}
>>>> }
>>>>
>>>> I now see why I get the 9089 figure after looking at Uvarint source 
>>>> code (
>>>> https://cs.opensource.google/go/go/+/refs/tags/go1.25.1:src/encoding/binary/varint.go
>>>> ):
>>>>
>>>> func Uvarint(buf []byte) (uint64, int) {
>>>> 	var x uint64
>>>> 	var s uint
>>>> 	for i, b := range buf {
>>>> 		if i == MaxVarintLen64 {
>>>> 			// Catch byte reads past MaxVarintLen64.
>>>> 			// See issue https://golang.org/issues/41185
>>>> 			return 0, -(i + 1) // overflow
>>>> 		}
>>>> 		if b < 0x80 {
>>>> 			if i == MaxVarintLen64-1 && b > 1 {
>>>> 				return 0, -(i + 1) // overflow
>>>> 			}
>>>> 			return x | uint64(b)<<s, i + 1
>>>> 		}
>>>> 		x |= uint64(b&0x7f) << s
>>>> 		s += 7
>>>> 	}
>>>> 	return 0, 0
>>>> }
>>>>
>>>> Here I see that the bits of each byte after the first are left-shifted 
>>>> (by 7 for the second byte) before being OR-ed into the result, so later 
>>>> bytes end up more significant.
>>>> My solution so far has been to create a custom uvarint function that 
>>>> left-shifts the accumulated value before OR-ing in the next byte, 
>>>> preserving the byte order.
>>>>
>>>> func Uvarint(buf []byte) (uint64, int) {
>>>> 	var x uint64
>>>> 	var s uint
>>>> 	for i, b := range buf {
>>>> 		if i == MaxVarintLen64 {
>>>> 			// Catch byte reads past MaxVarintLen64.
>>>> 			// See issue https://golang.org/issues/41185
>>>> 			return 0, -(i + 1) // overflow
>>>> 		}
>>>> 		if b < 0x80 {
>>>> 			if i == MaxVarintLen64-1 && b > 1 {
>>>> 				return 0, -(i + 1) // overflow
>>>> 			}
>>>> 			x <<= s
>>>> 			return x | uint64(b), i + 1
>>>> 		}
>>>> 		x <<= s
>>>> 		x |= uint64(b&0x7f)
>>>> 		s += 7
>>>> 	}
>>>> 	return 0, 0
>>>> }
>>>>
>>>> I would prefer to use the go library's functions if at all possible 
>>>> rather than make my own but so far I haven't found alternatives or even 
>>>> discussions on this topic. If anything's unclear let me know. Cheers.
>>>>
>>>
