Reviewers: lemzwerg,
Message:
On 2011/08/15 18:14:21, lemzwerg wrote:
> Could you please tell me what this patch is good for? A BOM not at the
> beginning of a file is no longer a BOM...
> I don't oppose to emitting a warning if U+FEFF is encountered, and we
> subsequently ignore it (since its use as zero width no-break space is
> deprecated), but only within strings...
> What am I missing?

RFC 3629 says that U+FEFF is a zero-width no-break space, which is
also used as a BOM. It also says:

  "This character can be used as a genuine "ZERO WIDTH NO-BREAK SPACE"
  within text,"
  ...
  "It is important to understand that the character U+FEFF appearing at
  any position other than the beginning of a stream MUST be interpreted
  with the semantics for the zero-width non-breaking space, and MUST
  NOT be interpreted as a signature."

Also, our LilyPond files are text, so I understand this to mean that we
should treat a U+FEFF inside the file contents as normal whitespace.
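
To make the distinction concrete, here is a minimal standalone sketch of
the behaviour I have in mind. It is not LilyPond code: strip_bom and its
stray_offsets parameter are hypothetical names, and it reports byte
offsets instead of the line/column information the lexer has. A BOM at
offset 0 is dropped silently as a signature; any later U+FEFF is dropped
as well, but recorded so the caller can print a warning instead of
aborting.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of LilyPond: removes every UTF-8 encoded
// U+FEFF (bytes EF BB BF) from `input`.  The byte offset of each
// occurrence that is not at the very start of the stream is appended to
// `stray_offsets`, so the caller can warn about it.
std::string
strip_bom (std::string const &input, std::vector<std::size_t> &stray_offsets)
{
  static std::string const bom = "\xEF\xBB\xBF";
  std::string out;
  std::size_t pos = 0;
  while (pos < input.size ())
    {
      if (input.compare (pos, bom.size (), bom) == 0)
        {
          // Offset 0 is a signature; anywhere else it is a zero-width
          // no-break space that we skip, but remember for a warning.
          if (pos != 0)
            stray_offsets.push_back (pos);
          pos += bom.size ();
        }
      else
        out += input[pos++];
    }
  return out;
}
```

A caller would pass the file contents and an empty vector, then emit one
warning per recorded offset and continue with the returned string.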
Description:
Issue 905: Gracefully ignore UTF-8 BOM in the middle of a file
Please review this at http://codereview.appspot.com/4908043/
Affected files:
A input/regression/bom-mark.ly
M lily/include/lily-lexer.hh
M lily/lexer.ll
M lily/lily-lexer.cc
Index: input/regression/bom-mark.ly
diff --git a/input/regression/bom-mark.ly b/input/regression/bom-mark.ly
new file mode 100644
index 0000000000000000000000000000000000000000..19895a5af8151d00f7656ea5e51df0d214cd5b5d
--- /dev/null
+++ b/input/regression/bom-mark.ly
@@ -0,0 +1,11 @@
+ \version "2.15.9"
+
+#(ly:set-option 'warning-as-error #f)
+
+\header {
+ texidoc = "This input file contains a UTF-8 BOM not at the very beginning,
+ but on the first line after the first byte. LilyPond should gracefully
+ ignore this BOM as specified in RFC 3629, but print a warning."
+}
+
+{ c }
Index: lily/include/lily-lexer.hh
diff --git a/lily/include/lily-lexer.hh b/lily/include/lily-lexer.hh
index 72391a087748cdd676739a8ed2b3646547f077c7..9729ca701664d8cbaa28277408e62c6cc1e434aa 100644
--- a/lily/include/lily-lexer.hh
+++ b/lily/include/lily-lexer.hh
@@ -110,6 +110,7 @@ public:
void push_note_state (SCM tab);
void pop_state ();
void LexerError (char const *);
+ void LexerWarning (char const *);
void set_identifier (SCM path, SCM val);
int get_state () const;
bool is_note_state () const;
Index: lily/lexer.ll
diff --git a/lily/lexer.ll b/lily/lexer.ll
index 7cda144e263c9720868330a988904f7fd45dee89..9cb706ebdcaf2f04f4ef32526779aa636d597da1 100644
--- a/lily/lexer.ll
+++ b/lily/lexer.ll
@@ -189,8 +189,8 @@ BOM_UTF8 \357\273\277
<INITIAL,chords,lyrics,figures,notes>{BOM_UTF8}/.* {
if (this->lexloc_->line_number () != 1 || this->lexloc_->column_number () != 0)
{
- LexerError (_ ("stray UTF-8 BOM encountered").c_str ());
- exit (1);
+ LexerWarning (_ ("stray UTF-8 BOM encountered").c_str ());
+ // exit (1);
}
debug_output (_ ("Skipping UTF-8 BOM"));
}
Index: lily/lily-lexer.cc
diff --git a/lily/lily-lexer.cc b/lily/lily-lexer.cc
index 5d87c83872d25052496f800de539760a71264c69..ba6429c3ea2798344702178363f200071c0f73cc 100644
--- a/lily/lily-lexer.cc
+++ b/lily/lily-lexer.cc
@@ -310,7 +310,7 @@ void
Lily_lexer::LexerError (char const *s)
{
if (include_stack_.empty ())
- message (_f ("error at EOF: %s", s) + "\n");
+ non_fatal_error (s, _f ("%s:EOF", s));
else
{
error_level_ |= 1;
@@ -319,6 +319,18 @@ Lily_lexer::LexerError (char const *s)
}
}
+void
+Lily_lexer::LexerWarning (char const *s)
+{
+ if (include_stack_.empty ())
+ warning (s, _f ("%s:EOF", s));
+ else
+ {
+ Input spot (*lexloc_);
+ spot.warning (s);
+ }
+}
+
char
Lily_lexer::escaped_char (char c) const
{