New submission from Dylan Houlihan <dy...@breakingbits.net>:

Currently, the `base64` function `b16decode` does not decode a hexadecimal 
string with lowercase characters by default. Doing so requires passing 
`casefold=True` as the second argument. I propose changing `b16decode` so that 
it accepts hexadecimal strings containing lowercase characters without 
requiring the `casefold` argument.
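
To illustrate the current behavior, here is a minimal sketch. The sample string 
`8def` and the variable names are only illustrative choices, not part of the 
patch:

```python
import base64
import binascii

lowercase_hex = "8def"  # illustrative input with lowercase digits

# Without casefold, lowercase hexadecimal digits are rejected.
try:
    base64.b16decode(lowercase_hex)
    rejected = False
except binascii.Error:
    rejected = True

# With casefold=True, the same input decodes normally.
decoded = base64.b16decode(lowercase_hex, casefold=True)
```

Under the proposal, the first call would succeed and return the same bytes as 
the second.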

The revision itself is straightforward. We simply have to amend the regular 
expression in Lib/base64.py to match the lowercase characters a-f in addition 
to A-F. Likewise, the corresponding tests in Lib/test/test_base64.py need to be 
changed to account for the lack of a second argument. There are therefore two 
files in total that need to be changed.
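
As a rough sketch of the idea (not the actual patch), the difference comes down 
to the character class used to reject invalid input. The function names 
`strict_b16decode` and `relaxed_b16decode` are hypothetical and exist only for 
this comparison; the strict version is a simplified approximation of the 
existing check, assuming the input is already a bytes object:

```python
import binascii
import re


def strict_b16decode(data: bytes) -> bytes:
    # Approximates the existing check: only 0-9 and A-F are allowed.
    if re.search(b"[^0-9A-F]", data):
        raise binascii.Error("Non-base16 digit found")
    return binascii.unhexlify(data)


def relaxed_b16decode(data: bytes) -> bytes:
    # Sketch of the proposed check: a-f is allowed as well.
    if re.search(b"[^0-9A-Fa-f]", data):
        raise binascii.Error("Non-base16 digit found")
    return binascii.unhexlify(data)
```

With the relaxed character class, `relaxed_b16decode(b"8def")` and 
`relaxed_b16decode(b"8DEF")` return the same bytes, while input containing 
anything outside the hexadecimal alphabet is still rejected.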

In my view, there are several compelling reasons for this change:

1. There is a nontrivial performance improvement. I've already made the changes 
on my own test branch[1] and I see a mean decoding performance improvement of 
approximately 9.4% (tested by averaging over 1,000,000 hexadecimal string 
decodings). The testing details are included in a file attached to this issue.

2. Hexadecimal strings are case insensitive, i.e. 8DEF is equivalent to 8def. 
This is the particularly motivating reason why I've written the patch - there 
have been many times when I've been momentarily confounded by a hexadecimal 
string that won't decode only to realize I'm yet again passing in a lowercase 
string.

3. The underlying function in `binascii`, `unhexlify`, accepts both uppercase 
and lowercase characters by default without requiring a second parameter. From 
the perspective of code hygiene and language consistency, I think it's both 
more practical and more elegant for the language to behave in the same, 
predictable manner (particularly because `base64.py` simply calls `binascii.c` 
under the hood). Additionally, the `binascii` function `hexlify` actually 
outputs lowercase strings, meaning that any application using both `binascii` 
and `base64` has to pass `casefold=True` every time output from 
`binascii.hexlify` is fed back as input to `base64.b16decode`.
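
The inconsistency described in point 3 can be seen in a short round trip. This 
is a minimal sketch; the `payload` value and variable names are arbitrary 
choices for the example:

```python
import base64
import binascii

payload = b"\x8d\xef"  # arbitrary sample bytes

# hexlify produces lowercase hexadecimal digits.
hex_lower = binascii.hexlify(payload)

# unhexlify accepts either case with no extra argument.
from_lower = binascii.unhexlify(hex_lower)
from_upper = binascii.unhexlify(hex_lower.upper())

# b16decode needs casefold=True to accept hexlify's output.
round_tripped = base64.b16decode(hex_lower, casefold=True)
```

All three decoded values equal `payload`; only the `b16decode` call needs the 
extra argument to get there.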

There are two arguments against this patch, as far as I can see:

1. In the relevant IETF reference documentation (RFC3548[2], referenced 
directly in the `b16decode` docstring; and RFC4648[3] which supersedes it), 
under Security Considerations the author Simon Josefsson claims that there 
exists a potential side channel security issue intrinsic to accepting case 
insensitive hexadecimal strings in a decoding function. While I'm not 
dismissing this out of hand, I personally do not find the claimed vulnerability 
compelling, and Josefsson does not clarify a real world attack scenario or 
threat model. I think it's important we challenge this assumption in light of 
the potential nontrivial improvements to both language consistency and 
performance. I would be very interested in hearing a real threat model here 
that would practically exist outside of a very contrived scenario. Moreover if 
this is such a security issue, why is the behavior already evident in 
`binascii.unhexlify`?

2. The other argument may be that there's simply no need to make such a change. 
One could argue that a developer won't frequently have to deal with this issue 
because the inverse function, `b16encode`, produces hexadecimal strings with 
uppercase characters. However, in my experience hexadecimal strings with 
lowercase characters are extremely common in situations where developers 
haven't produced the strings themselves in the language.

As I mentioned, I have already written the changes on my own patch branch. I'll 
open a pull request once this issue has been created and reference the issue in 
the pull request on GitHub.

References:

1. https://github.com/djhoulihan/cpython/tree/base64_case_sensitivity

2. https://tools.ietf.org/html/rfc3548

3. https://tools.ietf.org/html/rfc4648

----------
components: Library (Lib)
files: testing_data.txt
messages: 332319
nosy: djhoulihan
priority: normal
severity: normal
status: open
title: Allow lowercase hexadecimal characters in base64.b16decode()
type: performance
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8
Added file: https://bugs.python.org/file48013/testing_data.txt

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35557>
_______________________________________