Hey,

I'm thinking of ways to improve the readability of CSVLexer. I think
it might be easier to improve performance if the code is easier to
understand. Here is what I think can be improved:

1. eliminate the Token input parameter of nextToken()
To me it looks like the Token parameter of nextToken() exists to avoid
creating a new object for every token. How about a private field
'currentToken' that can be reused instead? No method parameters are
better than one method parameter :)
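
A minimal sketch of the 'currentToken' idea, assuming a hypothetical, simplified Token class (a type plus a content buffer) and a String-backed input instead of the real reader; ReusingLexer and Token.reset() are illustrative names, not the actual CSVLexer API:

```java
// Hypothetical, simplified token: a type plus a reusable content buffer.
final class Token {
    enum Type { TOKEN, EORECORD, EOF }

    Type type = Type.TOKEN;
    final StringBuilder content = new StringBuilder();

    // Clears the token so the same instance can be filled again.
    Token reset() {
        type = Type.TOKEN;
        content.setLength(0);
        return this;
    }
}

// Hypothetical lexer that owns the token it hands out instead of taking one as a parameter.
final class ReusingLexer {
    private final String input;
    private final char delimiter;
    private int pos = 0;
    // Reused on every call: callers must copy the content before calling nextToken() again.
    private final Token currentToken = new Token();

    ReusingLexer(String input, char delimiter) {
        this.input = input;
        this.delimiter = delimiter;
    }

    // No parameter: the lexer resets and refills its own field.
    Token nextToken() {
        Token tkn = currentToken.reset();
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c == delimiter) {
                return tkn;                     // field ended, record continues
            }
            if (c == '\n') {
                tkn.type = Token.Type.EORECORD; // field ended, record ended
                return tkn;
            }
            tkn.content.append(c);
        }
        tkn.type = Token.Type.EOF;              // input exhausted
        return tkn;
    }
}
```

The trade-off of this design is that the returned Token is only valid until the next call, so a caller that keeps tokens around has to copy their content first.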

2. add more convenience methods
Right now we have some methods for character handling, like
isEndOfFile(c), but others are missing, such as isDelimiter(c) or
isEncapsulator(c). There is not much to say about this; I just think
that isDelimiter(c) is slightly easier to read than
c == format.getDelimiter().
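
A small sketch of what such helpers could look like, assuming a hypothetical stand-in Format class with getDelimiter() and getEncapsulator() getters, and assuming the reader signals end of input with -1:

```java
// Hypothetical stand-in for the format object; only the two getters the helpers need.
final class Format {
    private final char delimiter;
    private final char encapsulator;

    Format(char delimiter, char encapsulator) {
        this.delimiter = delimiter;
        this.encapsulator = encapsulator;
    }

    char getDelimiter() { return delimiter; }

    char getEncapsulator() { return encapsulator; }
}

// Hypothetical holder for the char-handling helpers that would live in the lexer.
final class CharHelpers {
    private static final int END_OF_STREAM = -1; // assumption: -1 marks end of input
    private final Format format;

    CharHelpers(Format format) { this.format = format; }

    // Existing style of helper.
    boolean isEndOfFile(int c) { return c == END_OF_STREAM; }

    // Proposed helpers: name the intent instead of repeating the comparison.
    boolean isDelimiter(int c) { return c == format.getDelimiter(); }

    boolean isEncapsulator(int c) { return c == format.getEncapsulator(); }
}
```

Inside the lexer, a check like `if (isDelimiter(c))` then reads as plain English.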

3. eliminate the input parameter c of readEscape (and rename it?)
Right now we have to pass an int to readEscape, but the method does
not use that parameter. So why do we keep it? Also, the method does
not really "read" an escape. It assumes that it is called right after
the escape character (normally a backslash) and then returns the
character that the following letter stands for.
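
A sketch of the parameterless variant on a hypothetical, String-backed EscapeReader; the letter-to-character mapping (r, n, t, b, f) is an assumption about what the current readEscape does and should be checked against the real method:

```java
// Hypothetical lexer fragment showing readEscape() without the unused parameter.
final class EscapeReader {
    private static final int END_OF_STREAM = -1; // assumption: -1 marks end of input
    private final String input;
    private int pos;

    EscapeReader(String input) { this.input = input; }

    // Returns the next character and advances, or END_OF_STREAM at the end of input.
    private int read() {
        return pos < input.length() ? input.charAt(pos++) : END_OF_STREAM;
    }

    // Precondition: the escape character (assumed to be a backslash) was just consumed.
    // Reads the next character and maps it to the character it stands for.
    int readEscape() {
        int c = read();
        switch (c) {
            case 'r': return '\r';
            case 'n': return '\n';
            case 't': return '\t';
            case 'b': return '\b';
            case 'f': return '\f';
            default:  return c; // anything else, including end of input, is returned unchanged
        }
    }
}
```

Since the method only translates the letter after the escape character, a name that says so could fit better than readEscape; that choice is left open here.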

4. Get rid of those nasty while(true) loops!
There are several while(true) loops. They make it really hard to see
what is going on, because you cannot tell exactly when a loop ends.
The worst example is encapsulatedTokenLexer: it has an outer
while(true) loop with a nested inner loop that may return a token,
terminating both loops.
I've tried to eliminate those while(true) loops, but without success.
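
One possible direction, sketched on a hypothetical, simplified quoted-field lexer rather than the real encapsulatedTokenLexer: give each loop an explicit exit condition (a 'closed' flag for the quoted part, a blank test for the trailing part) and keep a single return at the end. QuotedFieldLexer, the doubled-encapsulator escaping, and the error messages are assumptions for illustration:

```java
// Hypothetical, simplified lexer for a single encapsulated (quoted) field.
// Not the real encapsulatedTokenLexer: no escapes, line counting or token types.
final class QuotedFieldLexer {
    private static final int END_OF_STREAM = -1;
    private final String input;
    private final char delimiter;
    private final char encapsulator;
    private int pos;

    QuotedFieldLexer(String input, char delimiter, char encapsulator) {
        this.input = input;
        this.delimiter = delimiter;
        this.encapsulator = encapsulator;
    }

    // Returns the next character and advances, or END_OF_STREAM at the end of input.
    private int read() {
        return pos < input.length() ? input.charAt(pos++) : END_OF_STREAM;
    }

    // Returns the next character without advancing.
    private int peek() {
        return pos < input.length() ? input.charAt(pos) : END_OF_STREAM;
    }

    // Precondition: the opening encapsulator has just been consumed.
    // Each loop names its exit condition in its header; there is one return at the end.
    String readEncapsulated() {
        StringBuilder content = new StringBuilder();
        boolean closed = false;
        while (!closed) {
            int c = read();
            if (c == END_OF_STREAM) {
                throw new IllegalStateException("end of input inside encapsulated field");
            } else if (c == encapsulator && peek() == encapsulator) {
                content.append(encapsulator); // doubled encapsulator stands for a literal one
                pos++;                        // skip the second encapsulator
            } else if (c == encapsulator) {
                closed = true;                // closing encapsulator ends the quoted part
            } else {
                content.append((char) c);
            }
        }
        // Between the closing encapsulator and the end of the field only blanks are allowed.
        int c = read();
        while (c == ' ' || c == '\t') {
            c = read();
        }
        if (c != delimiter && c != '\n' && c != END_OF_STREAM) {
            throw new IllegalStateException("unexpected character after encapsulated field");
        }
        return content.toString();
    }
}
```

Whether this shape still reads better once the real cases (escapes, line counting, token types) are added is exactly the open question, so it is only a starting point.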

If no one objects, I'd like to create patches for 1 and 2. I'll leave
3 and 4 open for discussion...

Regards,
Benedikt
