New submission from Idan Moral <idan22mo...@gmail.com>:

This is a follow-up PR to GH-24402.

Currently, *base64.b64decode* validates *s* with a generic regex (when
*validate* is true), which sometimes produces unexpected results and
misleading exception messages.

Example:

(1)    base64.b64decode('ab==',  validate=True)  # b'i'
(2)    base64.b64decode('ab3==', validate=True)  # b'i\xbd'
(3)    base64.b64decode('ab=3=', validate=True)  # raises binascii.Error: Non-base64 digit found
(4)    base64.b64decode('ab==3', validate=True)  # raises binascii.Error: Non-base64 digit found
(5)    base64.b64decode('ab===', validate=True)  # raises binascii.Error: Non-base64 digit found
(6)    base64.b64decode('=ab==', validate=True)  # raises binascii.Error: Non-base64 digit found

The only input here that is valid under strict base64 is (1).
(2), (4) and (5) should raise 'Excess data after padding',
(3) should raise 'Discontinuous padding not allowed',
and (6) should raise 'Leading padding not allowed'.

To get this behavior, we can use the strict mode of *binascii.a2b_base64*
(the *strict_mode* parameter, which is new at the time of creating this PR).

I have one (minor) concern: efficiency.
I don't have much experience with how fast regexes are (in Python or in
general) compared to the C implementation of *binascii.a2b_base64*,
so I don't know what the performance impact of moving from regex
pre-validation to validation during parsing would be.
Let me know if you find it inefficient.

-----

Referenced issue (GH-24402): https://bugs.python.org/issue43086

----------
components: Library (Lib)
messages: 397917
nosy: idan22moral
priority: normal
severity: normal
status: open
title: Adopt binascii.a2b_base64's strict mode in base64.b64decode
type: behavior
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44690>
_______________________________________