Thanks for the pointer, Roger.

After finally getting the normalising to RawStdEncoding base64 to work, I was trying to get my head around the fact that base64 content often seems to have several newlines around it.
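For concreteness, here is a minimal sketch of a filtering reader for this situation: it drops '\r', '\n' and '=' before the bytes reach base64.RawStdEncoding. The names skipReader and decodeLenient are hypothetical, and the loop only approximates the unexported newlineFilteringReader in encoding/base64; it is not the code actually in use.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// skipReader (hypothetical name) drops '\r', '\n' and '=' from the
// wrapped stream so that base64.RawStdEncoding can decode both padded
// and unpadded input, with or without surrounding newlines.
type skipReader struct {
	r io.Reader
}

func (s *skipReader) Read(p []byte) (int, error) {
	n, err := s.r.Read(p)
	for n > 0 {
		kept := 0
		for _, b := range p[:n] {
			if b != '\r' && b != '\n' && b != '=' {
				p[kept] = b
				kept++
			}
		}
		if err != nil || kept > 0 {
			return kept, err
		}
		// The whole chunk was filtered out: read again instead of
		// returning (0, nil).
		n, err = s.r.Read(p)
	}
	return n, err
}

// decodeLenient decodes s through skipReader with RawStdEncoding.
func decodeLenient(s string) ([]byte, error) {
	r := &skipReader{r: strings.NewReader(s)}
	return io.ReadAll(base64.NewDecoder(base64.RawStdEncoding, r))
}

func main() {
	inputs := []string{
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==\r\n\r\n", // padded, trailing newlines
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg",           // unpadded
	}
	for _, in := range inputs {
		out, err := decodeLenient(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

One caveat with this approach: because '=' is dropped wherever it appears, input such as "AAA=garbage" is not rejected for misplaced padding.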
Then I found encoding/base64, which has the func (r *newlineFilteringReader) Read(p []byte) (int, error) which elegantly resolves this.

https://cs.opensource.google/go/go/+/refs/tags/go1.23.4:src/encoding/base64/base64.go;l=622

I stole the function and simply added '=' in addition to '\n' and '\r' to the list of runes to skip. I'll see how I go with that but might need to look at your longer list of "garbage" runes.

I'm going to enjoy looking through the code. Thank you!

Rory

On 14/01/25, roger peppe (rogpe...@gmail.com) wrote:
> Tangentially related to this thread, a while back, I wrote a Go
> implementation of the base64 command that is agnostic about which encoding
> it reads (and can write all the possible encodings). It can be installed
> with:
>
>     go install github.com/rogpeppe/misc/cmd/base64@latest
>
> It's arguably a little too lenient in what it accepts, but it works for me
> :)
>
> The source is here
> https://github.com/rogpeppe/misc/blob/f64633da4fd4/cmd/base64/base64.go
>
> On Tue, 14 Jan 2025 at 14:53, Rory Campbell-Lange <r...@campbell-lange.net>
> wrote:
>
> > Thanks for finding that foolish error, Brian.
> >
> > To wrap the thread up, the implementation below seems to work ok for
> > reading both base64.RawStdEncoding and base64.StdEncoding encoded data
> > using the base64.RawStdEncoding decoder.
> >
> > Example usage:
> >
> > b64 := NewB64Translator(bytes.NewReader(encodedBytes))
> > b, err := io.ReadAll(base64.NewDecoder(base64.RawStdEncoding, b64))
> >
> > The implementation:
> >
> > type B64Translator struct {
> >     br *bufio.Reader
> > }
> >
> > func NewB64Translator(r io.Reader) *B64Translator {
> >     return &B64Translator{
> >         br: bufio.NewReader(r),
> >     }
> > }
> >
> > // Read reads off the buffered reader expecting base64.StdEncoding bytes
> > // with (potentially) 1-3 '=' padding characters at the end.
> > // RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
> > // if the padding is removed.
> > func (b *B64Translator) Read(p []byte) (n int, err error) {
> >     h := make([]byte, len(p))
> >     n, err = b.br.Read(h)
> >     if err != nil {
> >         return n, err
> >     }
> >     // check if there is any padding in the last three bytes
> >     tail := make([]byte, 3)
> >     if n > 3 {
> >         _ = copy(tail, h[n-3:n])
> >     } else {
> >         _ = copy(tail, h[:n])
> >     }
> >     c := bytes.Count(tail, []byte("="))
> >     copy(p, h[:n-c])
> >     return n - c, nil
> > }
> >
> > For larger data the "tail" approach seems to have a tiny speed improvement
> > over a naive bytes.Count(b, []byte("=")) over the whole buffer.
> >
> > Thanks to everyone for their help.
> >
> > Rory
> >
> > On 14/01/25, 'Brian Candler' via golang-nuts (golang-nuts@googlegroups.com)
> > wrote:
> > > I was more or less right. The input string, which you encoded to
> > > "Qm9uam91ciwgam95ZXV4IGxpb24K", contains an encoded newline at the end.
> > > It's not spurious.
> > >
> > > Confirmed by the "echo" pipeline I gave above, or in Go itself:
> > > https://go.dev/play/p/6kSxiCfCTo4
> > >
> > > You can also confirm it by multiplying the length of the input by 3/4
> > >
> > > % echo -n "Qm9uam91ciwgam95ZXV4IGxpb24K" | wc -c
> > > 28
> > >
> > > 28*3/4 = 21
> > > B o n j o u r
> > > , _ j o y e u
> > > x _ l i o n \n
> > >
> > >
> > > On Tuesday, 14 January 2025 at 10:10:22 UTC Brian Candler wrote:
> > >
> > > > Sorry ignore that, I hadn't checked your playground link.
> > > >
> > > > On Tuesday, 14 January 2025 at 10:07:53 UTC Brian Candler wrote:
> > > >
> > > >> > AS I wrote earlier, I'm trying to avoid reading the entire email part
> > > >> into memory to discover if I should use base64.StdEncoding or
> > > >> base64.RawStdEncoding.
> > > >>
> > > >> As I asked before, why would you ever need to use RawStdEncoding? It just
> > > >> means the MIME part was invalid, most likely corrupted/truncated.
> > > >>
> > > >> > One odd thing is that I'm getting extraneous newlines (shown by stars
> > > >> in the output), eg:
> > > >>
> > > >> You are feeding two different inputs which do not differ by truncation
> > > >> alone.
> > > >>
> > > >> % echo -n "Qm9uam91ciwgam95ZXV4IGxpb24K" | base64 -D | hexdump -c
> > > >> 0000000   B   o   n   j   o   u   r   ,       j   o   y   e   u   x
> > > >> 0000010   l   i   o   n  \n
> > > >> 0000015
> > > >>
> > > >> % echo -n "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==" | base64 -D | hexdump -c
> > > >> 0000000   "   B   o   n   j   o   u   r   ,       j   o   y   e   u   x
> > > >> 0000010       l   i   o   n   "
> > > >> 0000016
> > > >>
> > > >> The second one has encoded double-quotes before and after the content.
> > > >>
> > > >> On Monday, 13 January 2025 at 22:43:51 UTC Rory Campbell-Lange wrote:
> > > >>
> > > >>> AS I wrote earlier, I'm trying to avoid reading the entire email part
> > > >>> into memory to discover if I should use base64.StdEncoding or
> > > >>> base64.RawStdEncoding.
> > > >>>
> > > >>> The following seems to work reasonably well:
> > > >>>
> > > >>> type B64Translator struct {
> > > >>>     br *bufio.Reader
> > > >>> }
> > > >>>
> > > >>> func NewB64Translator(r io.Reader) *B64Translator {
> > > >>>     return &B64Translator{
> > > >>>         br: bufio.NewReader(r),
> > > >>>     }
> > > >>> }
> > > >>>
> > > >>> // Read reads off the buffered reader expecting base64.StdEncoding bytes
> > > >>> // with (potentially) 1-3 '=' padding characters at the end.
> > > >>> // RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
> > > >>> // if the padding is removed.
> > > >>> func (b *B64Translator) Read(p []byte) (n int, err error) {
> > > >>>     h := make([]byte, len(p))
> > > >>>     n, err = b.br.Read(h)
> > > >>>     if err != nil {
> > > >>>         return n, err
> > > >>>     }
> > > >>>     // to be optimised
> > > >>>     c := bytes.Count(h, []byte("="))
> > > >>>     copy(p, h[:n-c])
> > > >>>     // fmt.Println(string(h), n, string(p), n-c)
> > > >>>     return n - c, nil
> > > >>> }
> > > >>>
> > > >>> https://go.dev/play/p/H6ii7Vy-8as
> > > >>>
> > > >>> One odd thing is that I'm getting extraneous newlines (shown by stars in
> > > >>> the output), eg:
> > > >>>
> > > >>> --
> > > >>> raw: Bonjour joyeux lion
> > > >>> Qm9uam91ciwgam95ZXV4IGxpb24K
> > > >>> ok: false
> > > >>> decoded: Bonjour, joyeux lion* <-------------------- e.g. here
> > > >>> --
> > > >>> std: "Bonjour, joyeux lion"
> > > >>> IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
> > > >>> ok: true
> > > >>> decoded: "Bonjour, joyeux lion"
> > > >>> --
> > > >>>
> > > >>> Any thoughts on that would be gratefully received.
> > > >>>
> > > >>> Rory
> > > >>>
> > > >>>
> > > >>> On 13/01/25, Rory Campbell-Lange (ro...@campbell-lange.net) wrote:
> > > >>> > Thanks very much for the playground link and thoughts.
> > > >>> >
> > > >>> > The use case is reading base64 email parts, which could be of a very
> > > >>> > large size. It is unclear when processing these parts if they are base64
> > > >>> > padded or not.
> > > >>> >
> > > >>> > I'm trying to avoid reading the entire email part into memory.
> > > >>> > Consequently I think your earlier idea of adding padding (or removing it)
> > > >>> > in a wrapper could work. Perhaps wrapping the reader with another using a
> > > >>> > bufio.Reader to track bytes read and detect EOF. At EOF the wrapper could
> > > >>> > add padding if needed.
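The "add padding at EOF" idea can be sketched roughly as below. This is an illustration only, not the code behind the playground links in this thread; padReader and decodePadded are hypothetical names. The wrapper counts every byte other than '\r' and '\n', and once the wrapped reader reports io.EOF it appends enough '=' to bring the count up to a multiple of four, so the result can be fed to base64.StdEncoding.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// padReader (hypothetical name) passes the wrapped stream through
// unchanged and appends '=' padding at EOF if the number of
// non-newline bytes seen is not a multiple of four.
type padReader struct {
	r     io.Reader
	count int    // non-newline bytes seen so far
	eof   bool   // wrapped reader has returned io.EOF
	pad   string // padding still to be emitted
}

func (p *padReader) Read(b []byte) (int, error) {
	if len(b) == 0 {
		return 0, nil
	}
	if !p.eof {
		n, err := p.r.Read(b)
		for _, c := range b[:n] {
			if c != '\r' && c != '\n' {
				p.count++
			}
		}
		if err != io.EOF {
			return n, err
		}
		// The wrapped stream is finished: work out the padding to append.
		p.eof = true
		if rem := p.count % 4; rem != 0 {
			p.pad = strings.Repeat("=", 4-rem)
		}
		if n > 0 {
			return n, nil
		}
	}
	if p.pad == "" {
		return 0, io.EOF
	}
	n := copy(b, p.pad)
	p.pad = p.pad[n:]
	return n, nil
}

// decodePadded decodes s through padReader with StdEncoding.
func decodePadded(s string) ([]byte, error) {
	r := &padReader{r: strings.NewReader(s)}
	return io.ReadAll(base64.NewDecoder(base64.StdEncoding, r))
}

func main() {
	for _, in := range []string{
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", // already padded
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg",   // unpadded
		"AAA=garbage",                      // malformed
	} {
		out, err := decodePadded(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

Unlike stripping '=', this leaves validation to the standard decoder, so malformed input is still rejected.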
> > > >>> >
> > > >>> > Rory
> > > >>> >
> > > >>> > On 13/01/25, Axel Wagner (axel.wa...@googlemail.com) wrote:
> > > >>> > > Just realized: If you twist the idea around, you get something easy to
> > > >>> > > implement and more correct.
> > > >>> > > Instead of stripping padding if it exist, you can ensure that the body *is*
> > > >>> > > padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
> > > >>> > > You can then feed that to base64.StdEncoding. If the wrapped Reader returns
> > > >>> > > padded Base64, this does nothing. If it returns unpadded Base64, it adds
> > > >>> > > padding. If it returns incorrect Base64, it will create a padded stream,
> > > >>> > > that will then get rejected by the Base64 decoder.
> > > >>> > >
> > > >>> > > On Mon, 13 Jan 2025 at 10:31, Axel Wagner <axel.wa...@googlemail.com>
> > > >>> > > wrote:
> > > >>> > >
> > > >>> > > > Hi,
> > > >>> > > >
> > > >>> > > > one way to solve your problem is to wrap the body into an io.Reader that
> > > >>> > > > strips off everything after the first `=` it finds. That can then be fed to
> > > >>> > > > base64.RawStdEncoding. This approach requires no extra buffering or copying
> > > >>> > > > and is easy to implement: https://go.dev/play/p/CwcVz7oietI
> > > >>> > > >
> > > >>> > > > The downside is, that this will not verify that the body is *either*
> > > >>> > > > correctly padded Base64 *or* unpadded Base64. So, it will not report an
> > > >>> > > > error if fed something like "AAA=garbage".
> > > >>> > > > That can be remedied by buffering up to four bytes and, when encountering
> > > >>> > > > an EOF, check that there are at most three trailing `=` and that the total
> > > >>> > > > length of the stream is divisible by four.
> > > >>> > > > It's more finicky to implement,
> > > >>> > > > but it should also be possible without any extra copies and only requires a
> > > >>> > > > very small extra buffer.
> > > >>> > > >
> > > >>> > > > On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange <ro...@campbell-lange.net>
> > > >>> > > > wrote:
> > > >>> > > >
> > > >>> > > >> Thanks very much for the links, pointers and possible solution.
> > > >>> > > >>
> > > >>> > > >> Trying to read base64 standard (padded) encoded data with
> > > >>> > > >> base64.RawStdEncoding can produce an error such as
> > > >>> > > >>
> > > >>> > > >>     illegal base64 data at input byte <n>
> > > >>> > > >>
> > > >>> > > >> Reading base64 raw (unpadded) encoded data produces the EOF error.
> > > >>> > > >>
> > > >>> > > >> I'll go with trying to read the standard encoded data up to maybe 1MB and
> > > >>> > > >> then switch to base64.RawStdEncoding if I hit the "illegal base64 data"
> > > >>> > > >> problem, maybe with reference to bufio.Reader which has most of the methods
> > > >>> > > >> suggested below.
> > > >>> > > >>
> > > >>> > > >> Yes, the use of a "Rewind" method would be crucial. I guess this would
> > > >>> > > >> need to:
> > > >>> > > >> 1. error if more than one buffer of data has been read
> > > >>> > > >> 2. else re-read from byte 0
> > > >>> > > >>
> > > >>> > > >> Thanks again very much for these suggestions.
> > > >>> > > >>
> > > >>> > > >> Rory
> > > >>> > > >>
> > > >>> > > >> On 12/01/25, robert engels (ren...@ix.netcom.com) wrote:
> > > >>> > > >> > Also, see this
> > > >>> > > >> > https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
> > > >>> > > >> > as I expected the error should be reported earlier than the end of stream
> > > >>> > > >> > if the chosen format is wrong.
> > > >>> > > >> >
> > > >>> > > >> > > On Jan 12, 2025, at 2:57 PM, robert engels <ren...@ix.netcom.com> wrote:
> > > >>> > > >> > >
> > > >>> > > >> > > Also, this is what Gemini provided which looks basically correct -
> > > >>> > > >> > > but I think encapsulating it with a Rewind() method would be easier to
> > > >>> > > >> > > understand.
> > > >>> > > >> > >
> > > >>> > > >> > > While Go doesn't have a built-in PushbackReader like some other
> > > >>> > > >> > > languages (e.g., Java), you can implement similar functionality using a
> > > >>> > > >> > > custom struct and a buffer.
> > > >>> > > >> > >
> > > >>> > > >> > > Here's an example implementation:
> > > >>> > > >> > >
> > > >>> > > >> > > package main
> > > >>> > > >> > >
> > > >>> > > >> > > import (
> > > >>> > > >> > >     "bytes"
> > > >>> > > >> > >     "io"
> > > >>> > > >> > > )
> > > >>> > > >> > >
> > > >>> > > >> > > type PushbackReader struct {
> > > >>> > > >> > >     reader io.Reader
> > > >>> > > >> > >     buffer *bytes.Buffer
> > > >>> > > >> > > }
> > > >>> > > >> > >
> > > >>> > > >> > > func NewPushbackReader(r io.Reader) *PushbackReader {
> > > >>> > > >> > >     return &PushbackReader{
> > > >>> > > >> > >         reader: r,
> > > >>> > > >> > >         buffer: new(bytes.Buffer),
> > > >>> > > >> > >     }
> > > >>> > > >> > > }
> > > >>> > > >> > >
> > > >>> > > >> > > func (p *PushbackReader) Read(b []byte) (n int, err error) {
> > > >>> > > >> > >     if p.buffer.Len() > 0 {
> > > >>> > > >> > >         return p.buffer.Read(b)
> > > >>> > > >> > >     }
> > > >>> > > >> > >     return p.reader.Read(b)
> > > >>> > > >> > > }
> > > >>> > > >> > >
> > > >>> > > >> > > func (p *PushbackReader) UnreadByte() error {
> > > >>> > > >> > >     if p.buffer.Len() == 0 {
> > > >>> > > >> > >         return io.EOF
> > > >>> > > >> > >     }
> > > >>> > > >> > >     lastByte := p.buffer.Bytes()[p.buffer.Len()-1]
> > > >>> > > >> > >     p.buffer.Truncate(p.buffer.Len() - 1)
> > > >>> > > >> > >     p.buffer.WriteByte(lastByte)
> > > >>> > > >> > >     return nil
> > > >>> > > >> > > }
> > > >>> > > >> > >
> > > >>> > > >> > > func (p *PushbackReader) Unread(buf []byte) error {
> > > >>> > > >> > >     if p.buffer.Len() == 0 {
> > > >>> > > >> > >         return io.EOF
> > > >>> > > >> > >     }
> > > >>> > > >> > >     p.buffer.Write(buf)
> > > >>> > > >> > >     return nil
> > > >>> > > >> > > }
> > > >>> > > >> > >
> > > >>> > > >> > > func main() {
> > > >>> > > >> > >     // Example usage
> > > >>> > > >> > >     r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
> > > >>> > > >> > >     buf := make([]byte, 5)
> > > >>> > > >> > >     r.Read(buf)
> > > >>> > > >> > >     r.UnreadByte()
> > > >>> > > >> > >     r.Read(buf)
> > > >>> > > >> > > }
> > > >>> > > >> > >
> > > >>> > > >> > > Explanation:
> > > >>> > > >> > > PushbackReader struct: This struct holds the underlying io.Reader and
> > > >>> > > >> > > a buffer to store the pushed-back bytes.
> > > >>> > > >> > > NewPushbackReader: This function creates a new PushbackReader from an
> > > >>> > > >> > > existing io.Reader.
> > > >>> > > >> > > Read method: This method reads bytes from either the buffer (if it
> > > >>> > > >> > > contains data) or the underlying reader.
> > > >>> > > >> > > UnreadByte method: This method pushes back a single byte into the
> > > >>> > > >> > > buffer.
> > > >>> > > >> > > Unread method: This method pushes back a slice of bytes into the
> > > >>> > > >> > > buffer.
> > > >>> > > >> > > Important Considerations:
> > > >>> > > >> > > The buffer size is not managed automatically. You may need to adjust
> > > >>> > > >> > > the buffer size based on your use case.
> > > >>> > > >> > > This implementation does not handle pushing back beyond the initially
> > > >>> > > >> > > read data. If you need to support arbitrary pushback, you'll need a more
> > > >>> > > >> > > complex solution.
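Note that in the quoted code both UnreadByte and Unread return io.EOF whenever the pushback buffer is empty, so as written nothing can be pushed back in the first place. A minimal working variant might look like the sketch below; pushbackReader and roundTrip are hypothetical names, and the caller is assumed to hand back exactly the bytes it wants to see again.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// pushbackReader (hypothetical name) serves pushed-back bytes first
// and then falls through to the wrapped reader.
type pushbackReader struct {
	r       io.Reader
	pending []byte // bytes to return before reading from r again
}

func (p *pushbackReader) Read(b []byte) (int, error) {
	if len(p.pending) > 0 {
		n := copy(b, p.pending)
		p.pending = p.pending[n:]
		return n, nil
	}
	return p.r.Read(b)
}

// Unread pushes buf back so that the next Read returns it first.
// buf is copied, so the caller may reuse its slice afterwards.
func (p *pushbackReader) Unread(buf []byte) {
	p.pending = append(append([]byte(nil), buf...), p.pending...)
}

// roundTrip reads n bytes from s, pushes them back, then reads the
// whole stream; the result should equal s.
func roundTrip(s string, n int) string {
	r := &pushbackReader{r: strings.NewReader(s)}
	head := make([]byte, n)
	if _, err := io.ReadFull(r, head); err != nil {
		return ""
	}
	r.Unread(head)
	all, _ := io.ReadAll(r)
	return string(all)
}

func main() {
	fmt.Println(roundTrip("Hello, World!", 5))
}
```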
> > > >>> > > >> > >
> > > >>> > > >> > > Generative AI is experimental.
> > > >>> > > >> > >
> > > >>> > > >> > >> On Jan 12, 2025, at 2:53 PM, Robert Engels <ren...@ix.netcom.com> wrote:
> > > >>> > > >> > >>
> > > >>> > > >> > >> You can see the two pass reader here
> > > >>> > > >> > >> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
> > > >>> > > >> > >>
> > > >>> > > >> > >> But yea, the basic premise is that you buffer the data so you can
> > > >>> > > >> > >> rewind if needed
> > > >>> > > >> > >>
> > > >>> > > >> > >> Are you certain it is reading to the end to return EOF? It may be
> > > >>> > > >> > >> returning eof once the parsing fails.
> > > >>> > > >> > >>
> > > >>> > > >> > >> Otherwise I would expect this is being decoded wrong - eg the mime
> > > >>> > > >> > >> type or encoding type should tell you the correct format before you start
> > > >>> > > >> > >> decoding.
> > > >>> > > >> > >>
> > > >>> > > >> > >>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange <ro...@campbell-lange.net> wrote:
> > > >>> > > >> > >>>
> > > >>> > > >> > >>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
> > > >>> > > >> > >>>
> > > >>> > > >> > >>> My google fu must be deserting me. I can find PushbackReader
> > > >>> > > >> > >>> implementations in Java, but the only similar thing for Go I could find was
> > > >>> > > >> > >>> https://gitlab.com/osaki-lab/iowrapper. If you have a specific
> > > >>> > > >> > >>> recommendation for a ReadSeeker wrapper to an io.Reader that would be great
> > > >>> > > >> > >>> to know.
> > > >>> > > >> > >>>
> > > >>> > > >> > >>> Since the base64 decoding error I'm looking for is an EOF, I guess
> > > >>> > > >> > >>> the wrapper approach will not work when the EOF byte position is greater
> > > >>> > > >> > >>> than the io.ReadSeeker buffer size.
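To illustrate the buffer-size concern, here is a rough sketch of a "record and rewind" wrapper (rewindable, decodeEither and decodeEitherString are hypothetical names). Because the StdEncoding/RawStdEncoding mismatch only shows up at the end of the stream, the recording buffer ends up holding the whole encoded input by the time a retry is needed, which is exactly what one wants to avoid for large email parts.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"errors"
	"fmt"
	"io"
	"strings"
)

// rewindable (hypothetical name) records everything read from r so
// that the stream can be replayed from the start.
type rewindable struct {
	r   io.Reader
	buf bytes.Buffer
}

func (w *rewindable) Read(p []byte) (int, error) {
	n, err := w.r.Read(p)
	w.buf.Write(p[:n])
	return n, err
}

// Rewind returns a reader that replays the recorded bytes and then
// continues with whatever remains of the wrapped stream.
func (w *rewindable) Rewind() io.Reader {
	return io.MultiReader(bytes.NewReader(w.buf.Bytes()), w.r)
}

// decodeEither tries StdEncoding first and retries with RawStdEncoding
// if the stream ends with io.ErrUnexpectedEOF.
func decodeEither(r io.Reader) ([]byte, error) {
	rw := &rewindable{r: r}
	out, err := io.ReadAll(base64.NewDecoder(base64.StdEncoding, rw))
	if !errors.Is(err, io.ErrUnexpectedEOF) {
		return out, err
	}
	// At this point rw.buf holds the entire encoded stream.
	return io.ReadAll(base64.NewDecoder(base64.RawStdEncoding, rw.Rewind()))
}

// decodeEitherString is a convenience wrapper for string input.
func decodeEitherString(s string) ([]byte, error) {
	return decodeEither(strings.NewReader(s))
}

func main() {
	for _, in := range []string{
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==",
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg",
	} {
		out, err := decodeEitherString(in)
		fmt.Printf("%q %v\n", out, err)
	}
}
```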
> > > >>> > > >> > >>>
> > > >>> > > >> > >>> Rory
> > > >>> > > >> > >>>
> > > >>> > > >> > >>> On 12/01/25, robert engels (ren...@ix.netcom.com) wrote:
> > > >>> > > >> > >>>> create a ReadSeeker that wraps the Reader providing the buffering
> > > >>> > > >> > >>>> (mark & reset) - normally the buffer only needs to be large enough to
> > > >>> > > >> > >>>> detect the format contained in the Reader.
> > > >>> > > >> > >>>>
> > > >>> > > >> > >>>> You can search Google for PushbackReader in Go and you’ll get a
> > > >>> > > >> > >>>> basic implementation.
> > > >>> > > >> > >>>>
> > > >>> > > >> > >>>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange <ro...@campbell-lange.net> wrote:
> > > >>> > > >> > >>> ...
> > > >>> > > >> > >>>>> I'm attempting to rationalise the process [of avoiding reading
> > > >>> > > >> > >>>>> email parts into byte slices] by simply wrapping the provided io.Reader
> > > >>> > > >> > >>>>> with the necessary decoders to reduce memory usage and unnecessary
> > > >>> > > >> > >>>>> processing.
> > > >>> > > >> > >>>>>
> > > >>> > > >> > >>>>> The wrapping strategy seems to work ok. However there is a
> > > >>> > > >> > >>>>> particular issue in detecting base64.StdEncoding versus
> > > >>> > > >> > >>>>> base64.RawStdEncoding, which requires draining the io.Reader using
> > > >>> > > >> > >>>>> base64.StdEncoding and (based on the current implementation) switching to
> > > >>> > > >> > >>>>> base64.RawStdEncoding if an io.ErrUnexpectedEOF is found.
> > > >>> > > >> > >>>>>
> > > >>> > > >> > >>
> > > >>> > > >> > >>
> > > >>> > > >> > >> --
> > > >>> > > >> > >> You received this message because you are subscribed to the Google
> > > >>> > > >> > >> Groups "golang-nuts" group.
> > > >>> > > >> > >> To unsubscribe from this group and stop receiving emails from it,
> > > >>> > > >> > >> send an email to golang-nuts...@googlegroups.com.
> > > >>> > > >> > >> To view this discussion visit
> > > >>> > > >> > >> https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/Z4bxo-zcJqaVKMO1%40campbell-lange.net.