John Machin wrote:

[...] You have TWO problems: (1) Reporting the error location as
(offset from the start of the file) instead of (line number, column
position) would get you an express induction into the User Interface
Hall of Shame.

Of course. For the actual message I would use at least the line number. Still, the offset can be used to compute line/column when an error occurs, so I wouldn't need to store line/column with each token, only the offset, plus a method to "convert" offset values into (line, column) tuples.
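
A minimal sketch of such a conversion, assuming the whole input is held in one string and that lines end in "\n". The names line_starts and offset_to_linecol are made up for illustration and are not part of the lexer under discussion:

```python
import bisect

def line_starts(text):
    """Return the offset at which each line of text begins."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def offset_to_linecol(starts, offset):
    """Convert a 0-based offset into a 1-based (line, column) tuple."""
    # bisect_right counts how many line starts are <= offset,
    # which is exactly the 1-based line number.
    line = bisect.bisect_right(starts, offset)
    col = offset - starts[line - 1] + 1
    return line, col

text = "foo = 1\nbar = 2\n"
starts = line_starts(text)
print(offset_to_linecol(starts, 12))   # the "=" on the second line
```

The table of line starts is built once per input, so each lookup is a binary search and nothing beyond the offset has to be stored per token.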

(2) In the case of a file with lines terminated by \r\n, the offset is
ambiguous.

What if I explicitly state that the offset counts each newline as one character? But you're right: the offset would be for internal use only. What gets reported is line/column.
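
One way to make "a newline counts as one character" hold is to normalize line endings before lexing. This is only a sketch under that assumption; normalize_newlines is a hypothetical helper:

```python
def normalize_newlines(text):
    # Turn CR LF pairs and lone CR characters into a single LF,
    # so every line ending occupies exactly one offset.
    return text.replace("\r\n", "\n").replace("\r", "\n")

raw = "a = 1\r\nb = 2\r\n"
clean = normalize_newlines(raw)

# The same token sits at different offsets before and after normalizing.
print(raw.index("b"), clean.index("b"))
```

After normalization, offsets refer to the cleaned string rather than to byte positions in the file, which is fine as long as they are only used internally.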

dict.iter<anything>() will return its results in essentially random
order.

A list of somethings does seem indicated.

On the other hand: if all my tokens are "mutually exclusive" then, in theory, the order in which they are tried should not matter, as at most one token can match at any given offset. Still, trying the most frequent tokens first should improve performance.
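
A rough sketch of the list-based approach, with the patterns tried in the order they are listed. The token names and regular expressions here are invented for the example and are not the ones from the original lexer:

```python
import re

# Hypothetical token definitions, assumed most frequent first.
TOKEN_SPECS = [
    ("WHITESPACE", re.compile(r"\s+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("NUMBER", re.compile(r"\d+")),
    ("OP", re.compile(r"[=+\-*/]")),
]

def tokenize(text):
    """Yield (name, offset, value) tuples, trying patterns in list order."""
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            if m:
                yield name, pos, m.group()
                pos = m.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError("no token matches at offset %d" % pos)

tokens = list(tokenize("x = 42"))
for tok in tokens:
    print(tok)
```

Because a list keeps its order, the sequence in which patterns are tried is explicit and can be tuned without touching the matching loop.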

A dict is a hashtable, intended to provide a mapping from keys to
values. It's not intended to have order. In any case your code doesn't
use the dict as a mapping.

I map token names to regular expressions. Isn't that a mapping?

return "\n".join(
    [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %

The first 3 are %s, the last one is '%s'

I only added the single quotes so I could better "see" whitespace in the output. Anyway, this method only exists so I can check whether the lexer does what it's supposed to do; in the final version I will probably get rid of it.
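
For making whitespace visible in debug output, the %r conversion is an alternative to hand-written quotes, since it formats the value with repr() and so escapes tabs and newlines. A small sketch, where show is a hypothetical helper:

```python
def show(value):
    # %r inserts repr(value): quotes are added and control
    # characters such as tab and newline appear as escapes.
    return "[%r]" % value

value = "a\tb \n"
print("[%s]" % value)   # whitespace is printed literally
print(show(value))      # whitespace is printed as escape sequences
```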

Thanks & greetings,
Thomas

--
Just because many of them are wrong doesn't make them right!
(Coluche)
--
http://mail.python.org/mailman/listinfo/python-list