On 19 March 2013 18:27, Simone Tripodi <simonetrip...@apache.org> wrote: > Hi there, > >>> is there anything you want to discuss before I go ahead with a new RC? >>> I am performing some ITs in non-public projects and I planned to cut a >>> new RC in 1-2 days... >> >> I think there are still some issues in Base64Decoder (this time not >> ones I accidentally introduced - sorry about that!). >> > > don't worry, it can happen - unit tests saved us :) > >> >> For example, it does not seem to handle whitespace properly - >> certainly not in the last 4 bytes. >> Nor does it detect invalid characters in input - AFAICT these will be >> treated as binary (because the decoding table is initialised to 0). >> > > I am honestly not worried about white spaces, since those decoders are > not part of APIs and are used by the MimeUtility which tokenizes > strings by whitespaces > >> With the new version of the code, skipped characters can cause array >> bounds errors after the main loop, because finish is calculated >> without taking skipped characters into account, thus the loop can >> consume additional bytes. In the original version, this would not >> cause array bounds errors, but the same byte could be used more than >> once. >> > > that is worrying > >> I think it really needs to be rewritten, which I am happy to do. >> I intend to fetch the bytes, skipping as necessary otherwise storing >> them in an array. When there are 3 bytes available, output them and >> resume. >> At end of input, output any remaining bytes. >> >> It remains to be decided how to handle '=' before the end of the byte >> data - does that signal end of input, or is it an error if '=' appears >> other than at the end? >> Indeed, should it handle whitespace at all, and if so are there >> restrictions where that may appear? >> >> QPD has problems too: it assumes that characters following '=' are >> hex, but does not validate this. >> It also fails to handle lowercase hex, which is not very tolerant. >> Also the exception messages aren't specific enough - cannot >> distinguish truncated encoded sequence from invalid CRLF pair. >> Again, I can take this on. >> > > I won't appose to a rewrite, but at that point, since both decoders > are buggy: what about adding a dependency to commons-coded? It already > handles both Base64 and QuotedPrintable codecs. > We could shading the interested classes, if we want to avoid less > dependencies as possible, but a dependency to another commons > component sounds fair to me.
Well, codec is about 260 kb, whereas fileupload is 70 kb. If fileupload is ever used on a device with limited memory (smartphone) that would be a huge disadvantage, unless the application happened to use codec already. >> Given that MimeUtil seems to strip spaces on input, do the decoders >> need to handle embedded spaces? > > not in my opinion, as I wrote before. OK, removing that support would fix the end of buffer bugs and simplify the code. > Thanks and best, > -Simo > > http://people.apache.org/~simonetripodi/ > http://simonetripodi.livejournal.com/ > http://twitter.com/simonetripodi > http://www.99soft.org/ > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org