Hi there, >> is there anything you want to discuss before I go ahead with a new RC? >> I am performing some ITs in non-public projects and I planned to cut a >> new RC in 1-2 days... > > I think there are still some issues in Base64Decoder (this time not > ones I accidentally introduced - sorry about that!). >
don't worry, it can happen - unit tests saved us :) > > For example, it does not seem to handle whitespace properly - > certainly not in the last 4 bytes. > Nor does it detect invalid characters in input - AFAICT these will be > treated as binary (because the decoding table is initialised to 0). > I am honestly not worried about white spaces, since those decoders are not part of APIs and are used by the MimeUtility which tokenizes strings by whitespaces > With the new version of the code, skipped characters can cause array > bounds errors after the main loop, because finish is calculated > without taking skipped characters into account, thus the loop can > consume additional bytes. In the original version, this would not > cause array bounds errors, but the same byte could be used more than > once. > that is worrying > I think it really needs to be rewritten, which I am happy to do. > I intend to fetch the bytes, skipping as necessary otherwise storing > them in an array. When there are 3 bytes available, output them and > resume. > At end of input, output any remaining bytes. > > It remains to be decided how to handle '=' before the end of the byte > data - does that signal end of input, or is it an error if '=' appears > other than at the end? > Indeed, should it handle whitespace at all, and if so are there > restrictions where that may appear? > > QPD has problems too: it assumes that characters following '=' are > hex, but does not validate this. > It also fails to handle lowercase hex, which is not very tolerant. > Also the exception messages aren't specific enough - cannot > distinguish truncated encoded sequence from invalid CRLF pair. > Again, I can take this on. > I won't appose to a rewrite, but at that point, since both decoders are buggy: what about adding a dependency to commons-coded? It already handles both Base64 and QuotedPrintable codecs. We could shading the interested classes, if we want to avoid less dependencies as possible, but a dependency to another commons component sounds fair to me. > Given that MimeUtil seems to strip spaces on input, do the decoders > need to handle embedded spaces? not in my opinion, as I wrote before. Thanks and best, -Simo http://people.apache.org/~simonetripodi/ http://simonetripodi.livejournal.com/ http://twitter.com/simonetripodi http://www.99soft.org/ --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org