On Sun, May 22, 2016 at 6:56 AM, Lauri Kenttä <lauri.ken...@gmail.com> wrote: > Hello, Internals! > > I was fixing #72152 when it became apparent that the base64_decode function > is very buggy. > > > - Null byte ends processing. > > - "V" produces empty result, while "V=" fails. Not very logical. > > - Too short padding is allowed, e.g. "VV=" works like "VV==". > > - Extra padding is allowed (like "V====="). > > - Invalid padding is allowed ("=VVV=", "VV=V=", "VVV==") except on the > second place of a 24-bit run ("V=VV=" fails). > > - In strict mode, space between padding fails: > "V V==" and "VV ==" and "VV== " are allowed, > "VV= =" fails. > > - In strict mode, after a padding, one character is skipped, so "VVV=V" > decodes to "UU" (should be "UUU"), and "VVVV=*" decodes to "UUU" instead of > failing. > > > For each of the above, what would be the preferred behaviour in default mode > and strict mode? > > > Affected existing tests: > > - ext/openssl/tests/bug61124.phpt uses "kzo w2RMExUTYQXW2Xzxmg==" as an > invalid base64 string, based on the invalid padding. > > - ext/standard/tests/file/stream_rfc2397_006.phpt tests > "#Zm9vYmFyIGZvb2Jhcg==" and excepts this to be valid, while "#" is clearly > not valid base64. This also raises a question whether fragments should be > skipped in data uri handling. > > > Suggestions? > > I've created a bug-for-bug compatible rewrite of base64_decode [1], with all > the bugs neatly and specifically implemented and missing features commented > out, so it's now very simple to fix them one by one. > > I've also attached a test script that tests "all" possible combinations of > data, padding, NUL and other invalid characters, and my first patch indeed > provides identical results to the old implementation. > > > Currently interesting lines in the test results: > 'base64' 'default' 'strict' > 'V' '' '' > 'V=' (false) (false) > 'VV=' 'U' 'U' > 'VV==' 'U' 'U' > 'V=====' (false) (false) > '=VVV=' 'UU' (false) > 'VV=V=' 'UU' (false) > 'VVV==' 'UU' 'UU' > 'V=VV=' (false) (false) > 'V V==' 'U' 'U' > 'VV ==' 'U' 'U' > 'VV== ' 'U' 'U' > 'VV= =' 'U' (false) > 'VVV=V' 'UUU' 'UU' > 'VVVV=*' 'UUU' 'UUU' > 'VVVVVV=V' 'UUUUU' 'UUUU' > 'VVVVVV=*' 'UUUU' 'UUUU' > 'VVVV===*' 'UUU' 'UUU' > 'VVV====V' 'UUU' 'UU' > 'VVV====*' 'UU' 'UU' > 'VV=====V' 'UU' 'U' > 'VV=====*' 'U' 'U' > '=======*' '' '' > > -- > Lauri Kenttä > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: http://www.php.net/unsub.php
I'm surprised to see that these functions weren't heavily tested. I've included some of the RFC 4648 test vectors in my cache-timing-safe userland implementation at https://github.com/paragonie/constant_time_encoding -- maybe this would be useful for base64_decode() as well? Scott Arciszewski Chief Development Officer Paragon Initiative Enterprises <https://paragonie.com> -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php