Howdy,
One of the "problems" in recursive-descent parsing is constructs that
look a lot alike at the front, only to differ at the end -- a sort of
end-weight pathology. The example I'm thinking of is the similarity
between variable and function declarations in 'C'.
extern int foo = 0;
extern int bar(int foo, int mumble) ;
static int zip(int foo, char * beezle) { return 0; }
I'm dealing with a similar problem in Close (my pet language), and I've
tweaked my grammar to favor NOT having to rescan. This means I've got a
single declaration rule, but it also means a bunch of <?DECL_MODE_....>
predicates, and recognizing ';' at the end of the declaration (except
when in parameter mode), etc.
It occurs to me that a win, of sorts, would be for the "declaration"
rule to be able to return multiple alternative values - a sort of
"result polymorphism".
Thus:
rule extern_decl {
| <variable_decl> ';'
| <function_decl> ';'
| <function_defn>
}
could "magically" avoid calling three different rules that scanned
mostly the same tokens by some sleight of hand:
rule variable_decl is parsed(&super_duper_decl);
rule function_decl is parsed(&super_duper_decl);
rule function_defn is parsed(&super_duper_decl);
But how to indicate which result you are returning?
Alternatively, it may be the case that some kind of intermediate-level
results memoization would do the trick. If the <storage_class> rule
memoized its result for offset X, it could generally do the right thing,
except that it would have to know when not to memoize (as when some
idiot puts in a <?DECL_MODE...> state flag). So maybe
rule storage_class is memoized {...}
is the right thing?
Anyway, I'm not proposing anything so much as wondering out loud. Surely
there's a bunch of smarter bears than me who have given this some
thought. Any wisdom?
=Austin