On Wed, Feb 06, 2013 at 09:14:12PM +0100, Markus Armbruster wrote: > mdroth <mdr...@linux.vnet.ibm.com> writes: > > > On Wed, Feb 06, 2013 at 10:06:03AM +0100, Markus Armbruster wrote: > >> Markus Armbruster <arm...@redhat.com> writes: > >> > >> > Eric Blake <ebl...@redhat.com> writes: > >> > > >> >> On 02/05/2013 09:22 AM, Markus Armbruster wrote: > >> >>> Command memchar-write takes data and size parameter. Begs the > >> >>> question what happens when data doesn't match size. > >> >>> > >> >>> With format base64, qmp_memchar_write() copies the full data argument, > >> >>> regardless of size argument. > >> >>> > >> >>> With format utf8, qmp_memchar_write() copies size bytes from data, > >> >>> happily reading beyond data. Copies crap from the heap or even > >> >>> crashes. > >> >>> > >> >>> Drop the size parameter, and always copy the full data argument. > >> >>> > >> >>> Signed-off-by: Markus Armbruster <arm...@redhat.com> > >> >>> --- > >> >>> hmp.c | 4 +--- > >> >>> qapi-schema.json | 4 +--- > >> >>> qemu-char.c | 8 +++----- > >> >>> qmp-commands.hx | 4 +--- > >> >>> 4 files changed, 6 insertions(+), 14 deletions(-) > >> >> > >> >>> if (has_format && (format == DATA_FORMAT_BASE64)) { > >> >>> write_data = g_base64_decode(data, &write_count); > >> >>> } else { > >> >>> write_data = (uint8_t *)data; > >> >>> + write_count = strlen(data); > >> >>> } > >> >> > >> >> Obviously, base64 is the only way to write an embedded NUL. But what > >> > > >> > Consider the JSON string "this \\u0000 is fun". But our JSON parser > >> > chokes on it, so we can ignore it until that's fixed. See my "[PATCH] > >> > check-qjson: More thorough testing of UTF-8 in strings", specifically > >> > the comment right at the beginning of utf8_string(). > >> > > >> >> happens if the user requests base64 encoding, but the utf8 string that > >> >> got passed in through JSON is not a valid base64-encoded string? Does > >> >> g_base64_decode report an error in that case, and should you be handling > >> >> the error here? > >> > > >> > Good question. I wish it had occured to GLib developers: > >> > http://developer.gnome.org/glib/stable/glib-Base64-Encoding.html#g-base64-decode > >> > > >> > Seriously, I need to find out what the heck g_base64_decode() does when > >> > it's fed crap. If it fails in a reliable and detectable manner, I need > >> > to handle the failure. If not, I need to replace the function. > >> > > >> > Moreover, I should document which characters outside the base64 alphabet > >> > are silently ignored, if any. All whitespace? Just newlines? > >> > >> As far as I can tell, it never fails, but silently ignores characters > >> outside the alphabet [A-Za-z0-9+/], as well as unpadded suffixes. Oh, > >> and it does something weird when padding appears in the middle. > >> Craptastic. > >> > >> We can either document this behavior as a feature, or we replace the > >> function by one that accepts only valid base64. If we do the latter, we > >> better specify the language we want to accept now, but I guess we could > >> choose to delay the actual checking post 1.4. > >> > >> There's another use of g_base64_decode() in qmp_guest_file_write(). > >> Which means guest agent command guest-file-write is similarly > >> ill-defined. Mike, anything to be done there? > > > > For qemu-ga I think the documentation makes it clear enough that we're > > expecting b64 inputs rather than just interpreting random input as b64, > > so I don't see a problem with making the checks stricter in the future. > > > > One other concern though: > > > >> > >> > >> ---<test program>--- > >> #include <glib.h> > >> #include <stdio.h> > >> #include <stdlib.h> > >> > >> int > >> main(int argc, char *argv[]) > >> { > >> char **ap, *b64; > >> unsigned char *buf; > >> size_t sz, i; > >> > >> for (ap = argv + 1; *ap; ap++) { > >> printf("in : %s\n", *ap); > >> buf = g_base64_decode(*ap, &sz); > >> printf("out:"); > >> for (i = 0; i < sz; i++) { > >> printf(" %02x", buf[i]); > >> } > >> putchar('\n'); > >> b64 = g_base64_encode(buf, sz); > >> printf("b64: %s\n", b64); > >> free(buf); > >> } > >> } > >> ---<test run>--- > >> in : 1 > >> out: > >> b64: > >> in : 1= > >> out: > >> b64: > >> in : 1== > >> out: > >> b64: > >> in : 1=== > >> out: d4 > >> b64: 1A== > >> in : 12 > >> out: > >> b64: > >> in : 123 > >> out: > >> b64: > >> in : 1234 > >> out: d7 6d f8 > >> b64: 1234 > >> in : =1234 > >> out: 03 5d b7 > >> b64: A123 > >> in : 1=234 > >> out: d4 0d b7 > >> b64: 1A23 > >> in : <>?,./watch this!@#$%^&*()_+ > >> out: ff 06 ad 72 1b 61 > >> b64: /watchth > >> in : /watchthis+ > >> out: ff 06 ad 72 1b 61 > >> b64: /watchth > > > > Am I misinterpreting the output or is base64_encode() actually spitting > > *out* invalid base64 strings for certain inputs? If so that seems pretty > > bad for something like guest-file-read, where inputs to base64_encode() > > are for all intents completely random. Addressing it in hard freeze may > > not be reasonable, since qemu-ga users must already be prepared to receive > > garbage from malicious/buggy agents, but I'd certainly pick up a fix for a > > subsequent stable release. > > Which base64_encode() outputs in my test run do you suspect of being > bad? >
My mistake. The last 2 caught my eye, but I didn't realize "/" was a valid base64 character, and that characters were being truncated from the original input due to padding restrictions, I thought it was just fudging up the plaintext.