On Sun, May 22, 2016 at 6:56 AM, Lauri Kenttä <lauri.ken...@gmail.com> wrote:
> Hello, Internals!
>
> I was fixing #72152 when it became apparent that the base64_decode function
> is very buggy.
>
>
> - Null byte ends processing.
>
> - "V" produces empty result, while "V=" fails. Not very logical.
>
> - Too short padding is allowed, e.g. "VV=" works like "VV==".
>
> - Extra padding is allowed (like "V=====").
>
> - Invalid padding is allowed ("=VVV=", "VV=V=", "VVV==") except on the
> second place of a 24-bit run ("V=VV=" fails).
>
> - In strict mode, space between padding fails:
> "V V==" and "VV ==" and "VV== " are allowed,
> "VV= =" fails.
>
> - In strict mode, after a padding, one character is skipped, so "VVV=V"
> decodes to "UU" (should be "UUU"), and "VVVV=*" decodes to "UUU" instead of
> failing.
>
>
> For each of the above, what would be the preferred behaviour in default mode
> and strict mode?
>
>
> Affected existing tests:
>
> - ext/openssl/tests/bug61124.phpt uses "kzo w2RMExUTYQXW2Xzxmg==" as an
> invalid base64 string, based on the invalid padding.
>
> - ext/standard/tests/file/stream_rfc2397_006.phpt tests
> "#Zm9vYmFyIGZvb2Jhcg==" and excepts this to be valid, while "#" is clearly
> not valid base64. This also raises a question whether fragments should be
> skipped in data uri handling.
>
>
> Suggestions?
>
> I've created a bug-for-bug compatible rewrite of base64_decode [1], with all
> the bugs neatly and specifically implemented and missing features commented
> out, so it's now very simple to fix them one by one.
>
> I've also attached a test script that tests "all" possible combinations of
> data, padding, NUL and other invalid characters, and my first patch indeed
> provides identical results to the old implementation.
>
>
> Currently interesting lines in the test results:
> 'base64'         'default'       'strict'
> 'V'              ''              ''
> 'V='             (false)         (false)
> 'VV='            'U'             'U'
> 'VV=='           'U'             'U'
> 'V====='         (false)         (false)
> '=VVV='          'UU'            (false)
> 'VV=V='          'UU'            (false)
> 'VVV=='          'UU'            'UU'
> 'V=VV='          (false)         (false)
> 'V V=='          'U'             'U'
> 'VV =='          'U'             'U'
> 'VV== '          'U'             'U'
> 'VV= ='          'U'             (false)
> 'VVV=V'          'UUU'           'UU'
> 'VVVV=*'         'UUU'           'UUU'
> 'VVVVVV=V'       'UUUUU'         'UUUU'
> 'VVVVVV=*'       'UUUU'          'UUUU'
> 'VVVV===*'       'UUU'           'UUU'
> 'VVV====V'       'UUU'           'UU'
> 'VVV====*'       'UU'            'UU'
> 'VV=====V'       'UU'            'U'
> 'VV=====*'       'U'             'U'
> '=======*'       ''              ''
>
> --
> Lauri Kenttä
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php

I'm surprised to see that these functions weren't heavily tested.

I've included some of the RFC 4648 test vectors in my
cache-timing-safe userland implementation at
https://github.com/paragonie/constant_time_encoding -- maybe this
would be useful for base64_decode() as well?

Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises <https://paragonie.com>

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to