Hey, I'm thinking of ways to improve the readability of CSVLexer. I think that it might be easier to improve performance if the code is easier to understand. Here is what I think could be improved:
1. Eliminate the Token input parameter of nextToken()
To me it looks like the purpose of the Token input parameter of nextToken() is to avoid object creation. How about a private field 'currentToken' that can be reused instead? No method parameters are better than one method parameter :)

2. Add additional convenience methods
Right now we have some methods for char handling, like isEndOfFile(c), but others are missing, such as isDelimiter(c) or isEncapsulator(c). There is not much to say about this; I just think that isDelimiter(c) is slightly easier to understand than c == format.getDelimiter().

3. Eliminate the input parameter c of readEscape (and rename it?)
Right now we have to pass an int to readEscape, but the method does not use that parameter, so why do we keep it? Also, the method does not really "read" an escape. It assumes that it is called after the escape character (e.g. "\") and then returns the character that corresponds to the letter that follows (for example, a newline for "n").

4. Get rid of those nasty while(true) loops!
There are several while(true) loops. It is really hard to see what is going on, because you cannot tell exactly when a loop ends. The worst example of this is encapsulatedTokenLexer: it has an outer while(true) loop with a nested inner loop that may return a token, terminating both loops. I've tried to eliminate those while(true) loops, but without success.

If no one objects, I'd like to create patches for 1. and 2. I'll leave 3. and 4. open for discussion...

Regards,
Benedikt
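To make proposals 1 to 4 a bit more concrete, here is a rough sketch of how they could fit together. It is a hypothetical, heavily simplified lexer, not the actual CSVLexer: the class name SketchLexer, the String-based input, the simplified Token, and the methods peek(), readEscapedChar() (a possible rename of readEscape) and encapsulatedToken() are all assumptions for illustration. It reuses one private token field instead of a parameter, uses small predicates such as isDelimiter(c), reads escapes without an unused parameter, and handles encapsulated tokens with a loop whose exit condition is visible in its header. It only treats '\n' as a line ending and ignores whitespace handling and comments.

```java
// Hypothetical, simplified lexer sketching proposals 1-4; NOT the real Commons CSV CSVLexer.
final class SketchLexer {
    enum Type { TOKEN, EORECORD, EOF }

    /** Mutable token that the lexer reuses between calls (proposal 1). */
    static final class Token {
        Type type;
        final StringBuilder content = new StringBuilder();
        void reset() { type = null; content.setLength(0); }
    }

    private static final int END_OF_STREAM = -1;
    private final String input;
    private final char delimiter, encapsulator, escape;
    private int pos = 0;
    // Proposal 1: a reusable private field instead of a Token parameter on nextToken().
    private final Token currentToken = new Token();

    SketchLexer(String input, char delimiter, char encapsulator, char escape) {
        this.input = input;
        this.delimiter = delimiter;
        this.encapsulator = encapsulator;
        this.escape = escape;
    }

    private int read() { return pos < input.length() ? input.charAt(pos++) : END_OF_STREAM; }
    private int peek() { return pos < input.length() ? input.charAt(pos) : END_OF_STREAM; }

    // Proposal 2: named predicates instead of inline comparisons like c == delimiter.
    private boolean isDelimiter(int c) { return c == delimiter; }
    private boolean isEncapsulator(int c) { return c == encapsulator; }
    private boolean isEscape(int c) { return c == escape; }
    private boolean isEndOfLine(int c) { return c == '\n'; }
    private boolean isEndOfFile(int c) { return c == END_OF_STREAM; }

    // Proposal 3: no unused parameter; reads the char after the escape and maps it.
    private char readEscapedChar() {
        int c = read();
        switch (c) {
            case 'r': return '\r';
            case 'n': return '\n';
            case 't': return '\t';
            case 'b': return '\b';
            case 'f': return '\f';
            case END_OF_STREAM: throw new IllegalStateException("EOF after escape character");
            default: return (char) c;
        }
    }

    /** Returns the next token; the same Token instance is reused on every call. */
    Token nextToken() {
        currentToken.reset();
        int c = read();
        if (isEndOfFile(c)) {
            currentToken.type = Type.EOF;
            return currentToken;
        }
        if (isEncapsulator(c)) {
            return encapsulatedToken();
        }
        // Proposal 4: the loop condition states exactly when a simple token ends.
        while (!isDelimiter(c) && !isEndOfLine(c) && !isEndOfFile(c)) {
            currentToken.content.append(isEscape(c) ? readEscapedChar() : (char) c);
            c = read();
        }
        currentToken.type = isDelimiter(c) ? Type.TOKEN : Type.EORECORD;
        return currentToken;
    }

    // Proposal 4: no while(true); called after the opening encapsulator was consumed.
    private Token encapsulatedToken() {
        boolean closed = false;
        while (!closed) {
            int c = read();
            if (isEndOfFile(c)) {
                throw new IllegalStateException("EOF inside encapsulated token");
            } else if (isEscape(c)) {
                currentToken.content.append(readEscapedChar());
            } else if (isEncapsulator(c) && isEncapsulator(peek())) {
                currentToken.content.append((char) read()); // doubled encapsulator = literal
            } else if (isEncapsulator(c)) {
                closed = true;
            } else {
                currentToken.content.append((char) c);
            }
        }
        // After the closing encapsulator, only a delimiter, line end or EOF may follow.
        int next = read();
        if (isDelimiter(next)) {
            currentToken.type = Type.TOKEN;
        } else if (isEndOfLine(next) || isEndOfFile(next)) {
            currentToken.type = Type.EORECORD;
        } else {
            throw new IllegalStateException("Unexpected character after encapsulated token");
        }
        return currentToken;
    }
}
```

One consequence of reusing the token object: a caller that wants to keep a value has to copy it (e.g. token.content.toString()) before calling nextToken() again, since the next call resets it. That is the trade-off for avoiding a new Token per call.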