On Sat, Sep 22, 2012 at 6:13 AM, David Kastrup <d...@gnu.org> wrote:
> Janek Warchoł <janek.lilyp...@gmail.com> writes:
>> After that, the parser's job is to group these 'words' into meaningful
>> 'sentences'. For example,
>> c4 g \f d8-.
>> becomes
>> c4
>> g \f
>> d8-.
>> (i.e., all things that go with the pitch - a duration, articulations
>> etc - are merged together).
>
> The "merging" is hierarchical: - and . are merged to -., d and 8 are
> merged to d8, then d8 and -. are merged to d8-. and so on. In fact, the
> whole input is finally merged into "start_symbol", and then the parser
> is done.
Indeed, that makes more sense than what I wrote :)

>> The problem is that sometimes it's impossible to tell what something
>> is without looking at next thing. For example, when reading this
>> \markup " \ bla"
>> letter-by-letter, Lily sees
>> \ <= a beginning of a command
>> m <= first letter of the command name
>> a <= second letter of the command name
>> r etc.
>> k
>> u
>> p
>>   <= whitespace - this means command name ended
>> " <= beginning of a string
>>   <= space in the string
>> \ <= another character in the string
>>
>> b
>> l
>> a
>> " <= end of the string.
>>
>> That was easy. Now, take this:
>
> Bad example: as far as the _parser_ is concerned, a string is just a
> single entity.

Well, I thought that the lexer uses lookahead too - my intent was to show
what happens when the lexer processes " \ bla". I guess I mixed up parser
and lexer in this example.

> That's one reason quoted strings can contain spaces: the
> lexer never passes them as a _kind_ of token by themselves, but it _can_
> pass them inside of the _value_ of a token of kind STRING.

Hmm. Despite the fact that things don't happen in Lily the way I've
shown, did I give a good example of the idea of lookahead?

> Lookahead is needed in
> cases like detecting the end of a music event. A music event can be all
> of the following:
>
> c c'' c''8 c''8-. c''8-.-^ c
>
> How do we recognize when the music event ends? By taking a look at the
> _next_ token and seeing whether we can make it part of the current music
> event. So the decision what the current music event is depends on what
> appears next in the input.

Ah, that's indeed a better example. (James, is it clear?)

> Usually, something like { .... } does not require lookahead to form
> units since there is a closing delimiter. Unfortunately, { ... } is not
> a complete unit until we have checked that no \addlyrics is trailing
> it, which _still_ can become part of the expression.
Indeed, this kind of defeats the purpose of using something called a
*closing delimiter*.

>> Lookahead means that before deciding what current letter in input
>> means, we look at the next one.
>
> Not "letter", but "token".

ok

>> So, every time Lily sees a backslash inside a string (inside " "), she
>> looks at the next letter in input to know whether the backslash is
>> just another char or has a special meaning.
>
> The lexer does not really work with "lookahead" as a rule: it can make
> more complex decisions (we take some pains to avoid this "backing up"
> for performance reasons, but it is not an inherent restriction).

Ah, I guess this answers my previous question.

>> I'm not sure what lexer modes are, but I suppose that it's about
>> different rules in different contexts. For example, when you're
>> inside a string you have to do a lookahead when you encounter a
>> backslash, but you don't have to do this when you're not inside
>> string.
>
> Strings are internal to the lexer: the parser never gets to see or
> influence string start and end. There are other modes like lyricmode,
> markupmode, musicmode, chordmode and so on in which the tokens are being
> formed according to different rules.

Ah, so I mixed up parser and lexer again. But the "different rules in
different contexts" part holds true :)

>>> vI = \relative c'' { \clef "treble" \repeat unfold 40 g4 }
>>> \addQuote vIQuote { \vI }
>>
>> LilyPond says "I don't know what a \vI is. \vI looks like a string,
>> and I don't want a string here"
>
> No, it does not look like a string. The lexer sees \vI, recognizes it
> as a command and looks up its meaning. It has no meaning, so it
> complains, and to pass anything at all to the parser, it passes the
> thing as a STRING to the parser, in the hope that this backslash might
> just have been part of something intended as a word. It wasn't, and so
> the parser is the next one to complain that it has no idea what to do
> with a STRING in this context.

Ah, ok.
Anyway, the point is that Lily doesn't know what \vI is yet.

>>> Huh? Why is \vI undefined at the time \addQuote is called? Now since
>>> \addQuote is called in the lexer in this LilyPond version,
>>
>> David's experimental change resulted in \addQuote being called and
>> "calculated" during lexer phase. This didn't happen before.
>
> That is not the actual problem. The problem is that it is being called
> while the assignment has not yet been completed.

Oh, yes. I didn't mean to say that \addQuote being called and
"calculated" during the lexer phase is the actual problem. I only stated
what the difference in behaviour is.

> Previously, music
> functions were calculated in the parser, so the parser would have looked
> at the next token MUSIC_FUNCTION (for \addQuote) and would have decided
> that it does not match ADDLYRICS, then it would have completed the
> assignment with MUSIC_FUNCTION as the lookahead, and only _then_ would
> have continued with the following music expression.

aha.

>>> [snip long quote]

I don't know why you didn't delete this quote. Please remove long quotes
that you don't directly reply to.

>> I'm not sure about mode-switching commands.
>> But generally, having to do excessive lookahead is bad. You prefer to
>> know what's happening without looking ahead.
>
> Well, our syntax can't get along without lookahead. But we have
> different modes, like lyrics mode, music mode, markup mode etc in which
> tokens are recognized differently. It is the parser's job to switch
> between those modes, and if it makes this decision based on lookahead,
> the lookahead is still recognized in the previous mode and can't be
> reinterpreted in its "proper" mode.

So, "avoid lookahead if possible, especially when modes change", right?
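
To check my own understanding of "the lookahead is still recognized in
the previous mode", here is a toy sketch in Python. It is not how
LilyPond's real lexer and parser work: the two mode names, the token
rules, the word foo-bar and the helpers (RULES, Lexer,
tokens_with_lookahead, tokens_without_lookahead) are all made up for
illustration. It only shows that a token peeked before a mode switch
keeps the old mode's shape.

```python
import re

# Toy token rules per mode (hypothetical, not LilyPond's actual lexer rules).
RULES = {
    # "music" mode: a hyphen always splits words
    "music": re.compile(r"\s*([a-z]+|[-{}=])"),
    # "initial" mode: a hyphenated word is one single token
    "initial": re.compile(r"\s*([a-z]+(?:-[a-z]+)*|[-{}=])"),
}


class Lexer:
    def __init__(self, text, mode):
        self.text = text
        self.pos = 0
        self.mode = mode

    def next_token(self):
        """Scan one token using the rules of the current mode."""
        match = RULES[self.mode].match(self.text, self.pos)
        if match is None:
            return None  # end of input (or a character no rule knows)
        self.pos = match.end()
        return match.group(1)


def tokens_with_lookahead(text):
    """Read '}' in music mode, peek one token, and only then switch modes."""
    lexer = Lexer(text, "music")
    out = [lexer.next_token()]      # the closing brace
    lookahead = lexer.next_token()  # peeked while still in music mode
    lexer.mode = "initial"          # the switch comes too late for the peeked token
    out.append(lookahead)
    while (tok := lexer.next_token()) is not None:
        out.append(tok)
    return out


def tokens_without_lookahead(text):
    """Read '}' in music mode, switch modes first, then keep scanning."""
    lexer = Lexer(text, "music")
    out = [lexer.next_token()]
    lexer.mode = "initial"
    while (tok := lexer.next_token()) is not None:
        out.append(tok)
    return out


if __name__ == "__main__":
    print(tokens_with_lookahead("} foo-bar = x"))
    print(tokens_without_lookahead("} foo-bar = x"))
```

With the peek, the word comes out in pieces (foo, -, bar); without it,
the second mode gets to scan foo-bar as a whole. If I got this right,
that is the reason to keep lookahead away from mode switches.
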
> This is actually the reason I recently made recognition of commands and
> strings the same in the various modes: previously line-width was a
> single lexical unit in INITIAL mode (which is used inside of context
> definitions and output definitions), but was three units, line - width
> in most other modes. Now if you had music interspersed in INITIAL mode,
> this might have looked like
> { ... } line-width = ...
> and since } needed a lookahead token to be complete, the lookahead
> token, still scanned in music mode, would have been just line, and there
> would have been no way to get to the single STRING line-width later.

That was really messy then. Good that you've fixed it.

Thanks for the explanations,
Janek

_______________________________________________
lilypond-devel mailing list
lilypond-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-devel