Re: lexer.ll: Warn about non-UTF-8 characters (issue 5505090)

dak Sun, 01 Jan 2012 01:40:47 -0800

Reviewers: lemzwerg, Keith, carl.d.sorensen_gmail.com,


Message:
On 2012/01/01 02:01:11, Keith wrote:

Works nicely.

Showing the input location will probably be very helpful.  We probably
want to remove the similar message from lily/misc.cc, because both
messages together are very noisy.


Depends on what the message does.  This patch checks only the input to
the lexer/parser.  There are, however, ways of generating strings for
the backend programmatically.  I have decided to check strings, comments
and file names here as well.  This means that if you use literal strings
as binary containers, or have to encode file names in non-UTF-8 because
of other deficiencies in LilyPond, you'll get complaints.
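A check of this kind could also run on a token's text after the lexer has matched it, rather than being spelled out in the flex rules.  The following is only a sketch under that assumption: `first_invalid_utf8` is a hypothetical helper, not code from this patch.  Its byte ranges follow the well-formed UTF-8 pattern from the W3C page cited in this thread (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF), except that it accepts every ASCII byte, control characters included, where the W3C pattern admits only tab, LF, CR and printable ASCII.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from the patch): returns the byte offset of the
// start of the first ill-formed UTF-8 sequence in S, or
// std::string_view::npos if all of S is well-formed.
std::size_t
first_invalid_utf8 (std::string_view s)
{
  const std::size_t n = s.size ();
  std::size_t i = 0;
  while (i < n)
    {
      const unsigned char b0 = static_cast<unsigned char> (s[i]);
      std::size_t len = 0;      // total length of the sequence
      unsigned char lo = 0x80;  // allowed range for the second byte
      unsigned char hi = 0xBF;

      if (b0 <= 0x7F)
        {
          ++i;                  // plain ASCII
          continue;
        }
      else if (b0 >= 0xC2 && b0 <= 0xDF)
        len = 2;
      else if (b0 == 0xE0)
        {
          len = 3;
          lo = 0xA0;            // reject overlong 3-byte forms
        }
      else if (b0 == 0xED)
        {
          len = 3;
          hi = 0x9F;            // reject UTF-16 surrogates
        }
      else if (b0 >= 0xE1 && b0 <= 0xEF)
        len = 3;                // 0xED was handled above
      else if (b0 == 0xF0)
        {
          len = 4;
          lo = 0x90;            // reject overlong 4-byte forms
        }
      else if (b0 >= 0xF1 && b0 <= 0xF3)
        len = 4;
      else if (b0 == 0xF4)
        {
          len = 4;
          hi = 0x8F;            // reject code points above U+10FFFF
        }
      else
        return i;               // 0x80-0xC1 and 0xF5-0xFF never start a sequence

      if (n - i < len)
        return i;               // sequence truncated by end of input

      const unsigned char b1 = static_cast<unsigned char> (s[i + 1]);
      if (b1 < lo || b1 > hi)
        return i;
      for (std::size_t k = 2; k < len; ++k)
        {
          const unsigned char bk = static_cast<unsigned char> (s[i + k]);
          if (bk < 0x80 || bk > 0xBF)
            return i;           // not a continuation byte
        }
      i += len;
    }
  return std::string_view::npos;
}
```

Running a check like this over the text of a string, comment or file name token would keep the flex rules themselves free of extra UTF-8 states and backing up, at the cost of reporting the problem per token rather than from within the grammar.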

I wish I could think of a way to check the input with a canned regular
expression like
<http://flex.sourceforge.net/manual/Identifiers.html#Identifiers> or,
better, one with comments:
<http://www.w3.org/International/questions/qa-forms-utf-8>

Doing so seems to require backing up (which probably won't cause any
harm), or maybe I'm just not seeing an easy way.


Our lexer was written to use uncompressed tables and to avoid backing
up.  I spent more than a day's worth of work on getting UTF-8 right in
the grammar.  That's pretty pointless.  It also means that we need to
provide an error path for every item containing non-UTF-8 characters in
order to get a UTF-8 related error message instead of something more
mysterious.

So I don't think it is really worth the trouble.


Description:
lexer.ll: Warn about non-UTF-8 characters

Making the warnings point to the exact bad byte rather than the
enclosing construct would be nice.
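Once the byte offset of an offending byte is known, turning it into a location that points at that byte is mostly bookkeeping.  The sketch below is hypothetical: `non_utf8_warning` is not part of the patch, the GNU-style `file:line:column: warning:` layout is an assumption rather than LilyPond's actual message text, and columns are counted in bytes, not characters.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from the patch): build a warning that points at
// the byte at OFFSET in INPUT.  Lines and columns are 1-based; columns count
// bytes.  Precondition: offset < input.size ().
std::string
non_utf8_warning (std::string_view file, std::string_view input,
                  std::size_t offset)
{
  std::size_t line = 1;
  std::size_t col = 1;
  for (std::size_t i = 0; i < offset; ++i)
    {
      if (input[i] == '\n')
        {
          ++line;               // next byte starts a new line
          col = 1;
        }
      else
        ++col;
    }

  static const char hex[] = "0123456789ABCDEF";
  const unsigned char b = static_cast<unsigned char> (input[offset]);

  std::string msg (file);
  msg += ':' + std::to_string (line) + ':' + std::to_string (col)
         + ": warning: non-UTF-8 byte 0x" + hex[b >> 4] + hex[b & 0xF];
  return msg;
}
```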

Please review this at http://codereview.appspot.com/5505090/

Affected files:
  M lily/include/lily-lexer.hh
  M lily/lexer.ll



_______________________________________________
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel
