On 10 February 2015 at 19:45, Ralph Eastwood <tcmreastw...@gmail.com> wrote:
> On 10 February 2015 at 19:23, Ralph Eastwood <tcmreastw...@gmail.com> wrote:
>>
>> Hi,
>>
>> Attached patch gives support for uuencode -m, base64 encoding and decoding
>> in uudecode.
>> Flag, -o, added so that uudecode can output to stdout to override the
>> output in encoded files.
>>
>> uudecode -m is accepted but ignored - the patch has an autodetection of
>> the file (begin-base64) in the header.
>>
>> Cheers,
>> Ralph
>
>
> Small fixes to this patch; uudecode didn't flush in a corner case - and
> there's an extra #include <assert.h> lying in uuencode.
>
>
Attached is another patch for uudecode which I think works properly now
*cross fingers*.

***

How base64 uuencode and uudecode work
================================

uuencodeb64
-----------------

The (new) suckless implementation of uuencode uses two buffers: an input
buffer and an output buffer. The algorithm assumes that the size of the
input buffer, Si, is a multiple of 3, and the size of the output buffer
depends on it. The output buffer size, So, is 4 * (Si / 3) + 1, because
base64 takes a group of 3 bytes (24 bits) and encodes it as 6-bit values,
which means you need 4 characters (4 * 6 = 24 bits) to carry the same
information.

Since the output comes in groups of 4 and an unsigned int (uint32_t) is
4 bytes, the implementation stores each group of base64 characters as one
entry of a uint32_t array. There is one additional entry for the newline
character. The implementation assumes that each output buffer is written
as one line, so to give the same output as other implementations these
sizes need to be kept as they are.

The workhorse of the algorithm is the loop (which gets hugely optimised by
gcc -O3, it seems, so you don't actually have to do any loop unrolling for
a fast version!):

> for (pb = buf, po = out; pb < buf + n; pb += 3)
>         *po++ = b64e(pb);

It uses b64e, which turns 3 bytes into 4 base64 encoded characters using a
lookup table.

The other parts of the encoder deal with the last group, where fewer than
3 bytes remain. Firstly, the last incomplete group of 3 in the input buffer
may not be filled with '\0' characters, and the workhorse loop would then
give incorrect output. The code therefore clears the end of the buffer to
make sure b64e gives the correct output. Secondly, although this gives
b64e the correct input, it effectively encodes '\0' at the end of the
stream, whereas the specification [0] dictates that '=' be used to pad
instead.
This is implemented using masks: a mask is generated dynamically for the
1 and 2 bytes left cases and AND'd with the result of b64e, the inverse of
the mask is AND'd with the string "====" taken as an int (0x3d3d3d3d), and
the two results are OR'd together, effectively replacing the encoded '\0'
(in base64 form, 'A') with the padding character '='.

uudecodeb64
-----------------

This algorithm uses a 60 byte input buffer and a 45 byte output buffer,
in the same ratio as required by the encoder (without the newline
character this time). However, the implementation doesn't depend on the
actual sizes as long as the ratio is kept the same.

Unlike the encoding algorithm, this has to ignore whitespace in the input,
and hence will be slower no matter what (plus the fact that the input size
is larger!).

The algorithm is state machine based and uses a decoding table, b64dt,
which is generated by the attached program. The decoding table tells the
algorithm which characters are illegal, which are whitespace, and the
6-bit value of each base64 character. Decoding byte by byte, the state
machine tracks which position of the 3-byte output it is currently at.
Once the output buffer is full it is flushed. If a padding character '='
is encountered, the decoder knows it has reached the end of the input
stream and calculates, from the current decoding state, how many '=' are
expected. Line numbers are tracked for reporting errors in the stream
(though ultimately unnecessary).

[0] http://pubs.opengroup.org/onlinepubs/000095399/utilities/uuencode.html
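To make the mask-based padding described under uuencodeb64 concrete, here
is a minimal sketch. It is not the code from the patch: b64e_sketch,
b64e_last and b64_encode are hypothetical names, and the masks are built
from byte arrays with memcpy so that the sketch does not depend on
endianness (the patch may well build them differently).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char b64tab[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical stand-in for b64e(): 3 input bytes -> 4 base64
 * characters packed into one uint32_t (in memory order). */
static uint32_t
b64e_sketch(const unsigned char *b)
{
	unsigned char c[4];
	uint32_t v;

	c[0] = b64tab[b[0] >> 2];
	c[1] = b64tab[((b[0] & 0x3) << 4) | (b[1] >> 4)];
	c[2] = b64tab[((b[1] & 0xf) << 2) | (b[2] >> 6)];
	c[3] = b64tab[b[2] & 0x3f];
	memcpy(&v, c, sizeof(v));
	return v;
}

/* Encode the final group of n (1..3) bytes, padding with '='. */
static uint32_t
b64e_last(const unsigned char *in, size_t n)
{
	unsigned char grp[3] = {0, 0, 0};            /* cleared tail */
	unsigned char keep[4] = {0xff, 0xff, 0, 0};  /* chars to keep */
	uint32_t mask, pad = 0x3d3d3d3d;             /* "====" */

	memcpy(grp, in, n);
	if (n >= 2)
		keep[2] = 0xff;
	if (n == 3)
		keep[3] = 0xff;
	memcpy(&mask, keep, sizeof(mask));
	/* keep the real characters, take '=' everywhere else */
	return (b64e_sketch(grp) & mask) | (pad & ~mask);
}

/* Encode len bytes; out must hold 4 * ((len + 2) / 3) + 1 bytes. */
static void
b64_encode(const unsigned char *in, size_t len, char *out)
{
	uint32_t v;
	size_t i;

	for (i = 0; i + 3 <= len; i += 3, out += 4) {
		v = b64e_sketch(in + i);
		memcpy(out, &v, sizeof(v));
	}
	if (i < len) {
		v = b64e_last(in + i, len - i);
		memcpy(out, &v, sizeof(v));
		out += 4;
	}
	*out = '\0';
}
```

Calling b64_encode on the 2 bytes "Ma" with a 5 byte output buffer takes
the padded path once: the third character is kept and only the fourth is
replaced by '='.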
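/* See LICENSE file for copyright and license details. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <assert.h>

/* much faster */
static const char *b64tab =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

int
main(int argc, char *argv[])
{
	/* signed char: the table holds -1 and -2, and plain char may be
	 * unsigned on some platforms */
	signed char b64et[256];

	/* everything is illegal (-1) unless set otherwise below */
	for (int i = 0; i < 256; i++)
		b64et[i] = -1;
	/* base64 alphabet maps to its 6-bit value */
	for (int i = 0; i < 64; i++)
		b64et[(int)b64tab[i]] = i;
	/* padding decodes as 0; the decoder handles '=' separately */
	b64et[(int)'='] = 0;
	/* whitespace is marked -2 so the decoder can skip it */
	b64et[(int)'\v'] = -2;
	b64et[(int)'\f'] = -2;
	b64et[(int)' '] = -2;
	b64et[(int)'\n'] = -2;
	b64et[(int)'\r'] = -2;
	b64et[(int)'\t'] = -2;

	/* print the table as a C initializer, 24 entries per line */
	printf("{");
	for (int i = 0; i < 256; i++) {
		printf("%d,", b64et[i]);
		if ((i+1) % 24 == 0)
			printf("\n");
	}
	printf("}\n");
	return 0;
}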
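For illustration, here is a sketch of how a decoder might consult such a
table. It fills the table at runtime instead of pasting the generated
initializer, and the names dt, b64dt_init and b64_lookup are hypothetical,
not taken from the patch. The lookup casts to unsigned char before
indexing, which is my assumption about how to keep bytes >= 0x80 from
producing a negative index on platforms where char is signed.

```c
#include <stddef.h>

enum { B64_ILLEGAL = -1, B64_SPACE = -2 };

static signed char dt[256];

/* Fill dt with the same conventions as the generator program:
 * -1 illegal, -2 whitespace, 0..63 for base64 characters, 0 for '='. */
static void
b64dt_init(void)
{
	static const char tab[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	static const char ws[] = "\v\f \n\r\t";
	size_t i;

	for (i = 0; i < 256; i++)
		dt[i] = B64_ILLEGAL;
	for (i = 0; i < 64; i++)
		dt[(unsigned char)tab[i]] = (signed char)i;
	dt[(unsigned char)'='] = 0;
	for (i = 0; ws[i] != '\0'; i++)
		dt[(unsigned char)ws[i]] = B64_SPACE;
}

/* Classify one input byte; the cast keeps the index in 0..255. */
static int
b64_lookup(char c)
{
	return dt[(unsigned char)c];
}
```

A caller would run b64dt_init() once and then branch on the result of
b64_lookup(): negative values are either skipped (B64_SPACE) or rejected
(B64_ILLEGAL), anything else is a 6-bit value.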
From 9cff8717b24162567ed16e6ac424bdae43bb6db8 Mon Sep 17 00:00:00 2001
From: Tai Chi Minh Ralph Eastwood <tcmreastw...@gmail.com>
Date: Wed, 11 Feb 2015 13:27:41 +0000
Subject: [PATCH 4/4] uudecode: fix flushing (again) through rewrite

---
 uudecode.c | 65 +++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 39 insertions(+), 26 deletions(-)

diff --git a/uudecode.c b/uudecode.c
index 947a846..305b521 100644
--- a/uudecode.c
+++ b/uudecode.c
@@ -162,34 +162,49 @@ uudecodeb64(FILE *fp, FILE *outfp)
 	char bufb[60], *pb;
 	char out[45], *po;
 	size_t n;
-	int b = 0, e, t = 0;
+	int b = 0, e, t = -1, l = 1;
 	unsigned char b24[3] = {0, 0, 0};
 
 	while ((n = fread(bufb, 1, sizeof(bufb), fp))) {
 		for (pb = bufb, po = out; pb < bufb + n; pb++) {
-			if (*pb == '=') {
-				if (b == 0 || t) {
-					/* footer size is ==== is 4 */
-					if (++t < 4)
+			if (*pb == '\n') {
+				l++;
+				continue;
+			} else if (*pb == '=') {
+				switch (b) {
+				case 0:
+					/* expected '=' remaining
+					 * including footer */
+					if (--t) {
+						fwrite(out, 1,
+						       (po - out),
+						       outfp);
+						return;
+					}
+					continue;
+				case 1:
+					eprintf("%d: unexpected \"=\""
+					        "appeared.", l);
+				case 3:
+					*po++ = b24[0];
+					*po++ = b24[1];
+					b = 0;
+					t = 6; /* expect 6 '=' */
+					continue;
+				case 2:
+					*po++ = b24[0];
+					b = 0;
+					t = 5; /* expect 5 '=' */
 					continue;
-					else
-						goto flush;
-				} else if (b == 1) {
-					eprintf("unexpected \"=\" appeared.");
-				} else if (b == 2) {
-					*po++ = b24[0];
-					goto flush;
-				} else if (b == 3) {
-					*po++ = b24[0];
-					*po++ = b24[1];
-					goto flush;
 				}
-			}
-			if ((e = b64dt[(int)*pb]) == -1) {
-				eprintf("invalid byte \"%c\"", *pb);
-			} else if (e == -2) /* whitespace */
+			} else if ((e = b64dt[(int)*pb]) == -1)
+				eprintf("%d: invalid byte \"%c\"", l, *pb);
+			else if (e == -2) /* whitespace */
 				continue;
-			switch (b) {
+			else if (t > 0) /* state is parsing pad/footer */
+				eprintf("%d: invalid byte \"%c\" after padding",
+				        l, *pb);
+			switch (b) { /* decode next base64 chr based on state */
 			case 0: b24[0] |= e << 2; break;
 			case 1: b24[0] |= (e >> 4) & 0x3;
 			        b24[1] |= (e & 0xf) << 4; break;
@@ -197,7 +212,7 @@ uudecodeb64(FILE *fp, FILE *outfp)
 			        b24[2] |= (e & 0x3) << 6; break;
 			case 3: b24[2] |= e; break;
 			}
-			if (++b == 4) {
+			if (++b == 4) { /* complete decoding an octet */
 				*po++ = b24[0];
 				*po++ = b24[1];
 				*po++ = b24[2];
@@ -205,11 +220,9 @@ uudecodeb64(FILE *fp, FILE *outfp)
 				b = 0;
 			}
 		}
-		goto flush;
+		fwrite(out, 1, (po - out), outfp);
 	}
-	eprintf("invalid uudecode footer \"====\" not found\n");
-flush:
-	fwrite(out, 1, (po - out), outfp);
+	eprintf("%d: invalid uudecode footer \"====\" not found\n", l);
 }
 
 static void
-- 
2.3.0
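For readers who want to follow the state machine without the stream and
footer handling, here is a simplified in-memory sketch. It is an
assumption-laden illustration, not the patched uudecodeb64: the names
b64val and b64_decode are hypothetical, it uses strchr instead of the
b64dt table, it stops at the first '=' rather than counting the expected
padding and "====" footer, and it does not track line numbers.

```c
#include <stddef.h>
#include <string.h>

/* 6-bit value of a base64 character, or -1 if it is not one. */
static int
b64val(int c)
{
	static const char tab[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	const char *p;

	if (c == '\0')
		return -1;
	p = strchr(tab, c);
	return p ? (int)(p - tab) : -1;
}

/* Decode a NUL-terminated base64 string into out.
 * Returns the number of bytes written, or -1 on malformed input.
 * b is the position (0..3) inside the current group of 4 characters. */
static long
b64_decode(const char *in, unsigned char *out)
{
	unsigned char b24[3] = {0, 0, 0};
	int b = 0, e;
	long n = 0;

	for (; *in != '\0'; in++) {
		if (strchr(" \t\n\r\v\f", *in))
			continue;               /* whitespace is ignored */
		if (*in == '=') {
			if (b < 2)
				return -1;      /* padding cannot start here */
			out[n++] = b24[0];
			if (b == 3)
				out[n++] = b24[1];
			return n;               /* padding ends the data */
		}
		if ((e = b64val((unsigned char)*in)) < 0)
			return -1;              /* illegal character */
		switch (b) {
		case 0: b24[0] = e << 2; break;
		case 1: b24[0] |= e >> 4;
		        b24[1] = (e & 0xf) << 4; break;
		case 2: b24[1] |= e >> 2;
		        b24[2] = (e & 0x3) << 6; break;
		case 3: b24[2] |= e; break;
		}
		if (++b == 4) {                 /* a full group: emit 3 bytes */
			out[n++] = b24[0];
			out[n++] = b24[1];
			out[n++] = b24[2];
			b = 0;
		}
	}
	return b == 0 ? n : -1;                 /* reject a truncated group */
}
```

Unlike the patch, each state assigns the freshly started byte with = instead
of |=, so the sketch does not need to clear b24 between groups.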