Hi there,

>> is there anything you want to discuss before I go ahead with a new RC?
>> I am performing some ITs in non-public projects and I planned to cut a
>> new RC in 1-2 days...
>
> I think there are still some issues in Base64Decoder (this time not
> ones I accidentally introduced - sorry about that!).
>

don't worry, it can happen - unit tests saved us :)

>
> For example, it does not seem to handle whitespace properly -
> certainly not in the last 4 bytes.
> Nor does it detect invalid characters in input - AFAICT these will be
> treated as binary (because the decoding table is initialised to 0).
>

I am honestly not worried about white spaces, since those decoders are
not part of APIs and are used by the MimeUtility which tokenizes
strings by whitespaces

> With the new  version of the code, skipped characters can cause array
> bounds errors after the main loop, because finish is calculated
> without taking skipped characters into account, thus the loop can
> consume additional bytes. In the original version, this would not
> cause array bounds errors, but the same byte could be used more than
> once.
>

that is worrying

> I think it really needs to be rewritten, which I am happy to do.
> I intend to fetch the bytes, skipping as necessary otherwise storing
> them in an array. When there are 3 bytes available, output them and
> resume.
> At end of input, output any remaining bytes.
>
> It remains to be decided how to handle '=' before the end of the byte
> data - does that signal end of input, or is it an error if '=' appears
> other than at the end?
> Indeed, should it handle whitespace at all, and if so are there
> restrictions where that may appear?
>
> QPD has problems too: it assumes that characters following '=' are
> hex, but does not validate this.
> It also fails to handle lowercase hex, which is not very tolerant.
> Also the exception messages aren't specific enough - cannot
> distinguish truncated encoded sequence from invalid CRLF pair.
> Again, I can take this on.
>

I won't appose to a rewrite, but at that point, since both decoders
are buggy: what about adding a dependency to commons-coded? It already
handles both Base64 and QuotedPrintable codecs.
We could shading the interested classes, if we want to avoid less
dependencies as possible, but a dependency to another commons
component sounds fair to me.

> Given that MimeUtil seems to strip spaces on input, do the decoders
> need to handle embedded spaces?

not in my opinion, as I wrote before.

Thanks and best,
-Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to