[ 
https://issues.apache.org/jira/browse/CODEC-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004618#comment-18004618
 ] 

Aleksandr Beliakov commented on CODEC-333:
------------------------------------------

Hi [~ggregory] ,

 

Considering the ticket CODEC-75 has been approved "as a feature", it is 
difficult to categorize it as a bug on my own, even though the behavior is not 
completely correct to my best understanding.

I think that the implementation should contain two separate *decoding* tables, 
similarly as it is done for *encoding* for [standard 
base64|https://github.com/apache/commons-codec/blob/master/src/main/java/org/apache/commons/codec/binary/Base64.java#L121]
 and [url-safe 
base64|https://github.com/apache/commons-codec/blob/master/src/main/java/org/apache/commons/codec/binary/Base64.java#L133].
 Currently there is only 
[one|https://github.com/apache/commons-codec/blob/master/src/main/java/org/apache/commons/codec/binary/Base64.java#L154]
 that merges both codings. If properly done, there should be two separate 
methods available for each decoding and verifying base64 validity content 
functions.

I see two ways to implement the feature:
 * Modify behavior within the current _Base64.isBase64_ and 
_Base64.decodeBase64_ methods, to verify the content against the "standard 
base64" specification only, instead of a merged one; and create two new methods 
such as _Base64.isBase64url_ and _Base64.decodeBase64url_ that will verify the 
content against the Base64Url coding specification. This modification will 
introduce some breaking changes in the API, as users relying on a url-safe 
processing with the old methods will need to adapt the code; or
 * Keep existing methods as they are, containing a "merged" behavior, and 
introduce new methods for each coding separately, such as 
_Base64.isStandardBase64_ and _Base64.decodeStandardBase64_ and 
_Base64.isBase64url_ and {_}Base64.decodeBase64url{_}. This will avoid changes 
in the existing API, but will provide new methods for those who want to ensure 
a strict behavior. The only thing I'm not sure about is what behavior to apply 
on [Builder#setUrlSafe(final boolean 
urlSafe)|https://github.com/apache/commons-codec/blob/master/src/main/java/org/apache/commons/codec/binary/Base64.java#L97]
 method call, as we will basically have three different options.

If you may share your thoughts on the best implementation for your library, I 
can provide you with a PR later when I have some spare time.

 

KR,

Aleksandr

> Separate Base64 decoding to "real" base64 and base64Url
> -------------------------------------------------------
>
>                 Key: CODEC-333
>                 URL: https://issues.apache.org/jira/browse/CODEC-333
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.18.0
>            Reporter: Aleksandr Beliakov
>            Priority: Minor
>
> The behavior applied in CODEC-75 does not distinguish between Base64 and 
> Base64Url implementations within 
> [isBase64|https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isBase64(java.lang.String)]
>  and 
> [decodeBase64|https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#decodeBase64(java.lang.String)]
>  methods. Nor, it does change the behavior, even if instantiating a class 
> object using a Builder like below, explicitly stating that url-safe 
> processing should not be used:
> {code:java}
> Base64 base64 = Base64.builder().setUrlSafe(false).get(); {code}
> with the method 
> [isInAlphabet|https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isInAlphabet(byte)]
>  still using the base64Url-safe decoding table.
>  
> This causes some issues in the implementation, for instance inability to 
> distinguish between Base64 and PEM encoded certificates, with Apache Codec 
> accepting the latest as Base64. See a (chunked and dummy) example below (with 
> _Base64.isBase64()_ method returning true):
> {code:java}
> -----BEGIN CERTIFICATE-----
> AQIDBAU=
> -----END CERTIFICATE-----{code}
>  
> According to the [RFC 4648|https://datatracker.ietf.org/doc/html/rfc4648], 
> handling of Base64 and Base64Url together should not be used:
> {panel:title=5. Base 64 Encoding with URL and Filename Safe Alphabet}
> This encoding may be referred to as "base64url".  *This encoding should not 
> be regarded as the same as the "base64" encoding* and should not be referred 
> to as only "base64".  Unless clarified otherwise, "base64" refers to the base 
> 64 in the previous section.
> {panel}
> While same handling may be acceptable in some particular cases, we think a 
> generic library like Apache Commons Codes shall make a clear separation 
> between the two encodings and unless implementors explicitly state the 
> support of url-safe characters, it should not be applied by default.
>  
> This should also avoid potential errors within the implementations, when a 
> system shall treat base64 and base64url encodings as two separate ones (for 
> instance like in 
> [standards|https://www.etsi.org/deliver/etsi_ts/119100_119199/11918201/01.02.01_60/ts_11918201v010201p.pdf]
>  explicitly requiring one or another implementation, based on the usage).
>  
> _FYI, Google Guava have two separate implementations 
> [BaseEncoding.base64()|https://guava.dev/releases/17.0/api/docs/com/google/common/io/BaseEncoding.html#base64()]
>  and 
> [BaseEncoding.base64Url()|https://guava.dev/releases/17.0/api/docs/com/google/common/io/BaseEncoding.html#base64Url()]
>  allowing implementors to explicitly choose one according to the project 
> needs._



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to