Re: Flex or bison?

Hans Aberg Sat, 18 Sep 2010 01:34:24 -0700

On 18 Sep 2010, at 09:15, Hans Lodder wrote:

I am currently performing a Search Engine Optimization (SEO) of the HTML web pages of my web site (on Win XP Home SP3). In order to do that it is important to know which 3 words are used most frequently on the page. So I wrote a cross referencer (in C) to find those. The 2nd step is to find the 3 most frequently used word groups consisting of 2 words. The results of both should be combined.
Now I have several possibilities. It is easy to do this in C as well. Alternatives are using flex, or the combination of flex and bison.
To have Flex identify a word is easy:

[-0-9A-Za-z]+

So is the identification of 2 words:

[-0-9A-Za-z]+(" "|\t)[-0-9A-Za-z]+
The easiest way to implement this is to write 2 programs and manually combine the results.
Now my question is: can both be combined in 1 Flex program, or 1 Flex and Bison program? Flex will try to satisfy the longest match, so it will not find the single word. Does this imply that I should introduce some functionality like a 'Moving Average Filter'? Are there better solutions?

In a common Flex/Bison setup, the lexer finds the identifiers, which are handed over to the parser, though one may do it otherwise should the need arise. Translated to your case, the Flex-generated lexer would find the words and hand them over to the Bison-generated parser.
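To make the lexer's role concrete, here is a rough hand-written sketch in plain C of what the generated scanner does for the single-word pattern [-0-9A-Za-z]+. The names is_word_char and next_word are made up for this illustration and are not part of Flex or Bison, and the sketch assumes the default "C" locale, so that isalnum() accepts only 0-9, A-Z and a-z.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* True for the characters in the class [-0-9A-Za-z] (in the "C" locale). */
static int is_word_char(int c)
{
    return c == '-' || isalnum((unsigned char)c);
}

/* Hypothetical helper: copy the next word at or after *cursor into buf,
   advance *cursor past it, and return 1. Returns 0 when no word is left. */
static int next_word(const char **cursor, char *buf, size_t bufsize)
{
    const char *p;
    size_t n = 0;

    assert(cursor != NULL && *cursor != NULL && buf != NULL && bufsize > 1);
    p = *cursor;

    while (*p != '\0' && !is_word_char(*p))   /* skip separators */
        p++;
    while (is_word_char(*p)) {                /* consume one whole word */
        if (n + 1 < bufsize)
            buf[n++] = *p;                    /* overlong words are truncated */
        p++;
    }
    buf[n] = '\0';
    *cursor = p;
    return n > 0;
}
```

Each successful call plays the part of one token being handed on; whatever consumes the words (a Bison grammar or ordinary C code) only ever sees one word at a time.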

But you might want to be able to identify word sequences with overlap: in a sequence of words w_1, w_2, w_3, ..., finding both w_1 w_2 and w_2 w_3. The parser/lexer combination consumes the input without backtracking, so you need to do this in the code of the actions.
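One way to handle the overlap in the action code is to remember the previous word, so that every new word both counts as a single word and closes a pair with its predecessor. The following is only a sketch under stated assumptions: on_word, bump, lookup and the fixed-size tables are hypothetical names and sizes chosen for this example, and words are assumed to be short enough that a pair fits in MAX_KEY - 1 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 1024   /* assumed table capacity, enough for a sketch */
#define MAX_KEY     128    /* assumed upper bound on the length of "word word" */

struct counter {
    char key[MAX_ENTRIES][MAX_KEY];
    int  count[MAX_ENTRIES];
    int  used;
};

static struct counter words;      /* single-word frequencies */
static struct counter pairs;      /* two-word group frequencies */
static char previous[MAX_KEY];    /* empty until the first word is seen */

/* Add one occurrence of key; a linear search keeps the sketch short. */
static void bump(struct counter *c, const char *key)
{
    int i;

    for (i = 0; i < c->used; i++) {
        if (strcmp(c->key[i], key) == 0) {
            c->count[i]++;
            return;
        }
    }
    if (c->used < MAX_ENTRIES) {  /* new keys are silently dropped when full */
        snprintf(c->key[c->used], MAX_KEY, "%s", key);
        c->count[c->used++] = 1;
    }
}

/* Return how often key has been counted so far (0 if never). */
static int lookup(const struct counter *c, const char *key)
{
    int i;

    for (i = 0; i < c->used; i++)
        if (strcmp(c->key[i], key) == 0)
            return c->count[i];
    return 0;
}

/* Call once per word, in input order. */
static void on_word(const char *word)
{
    char pair[MAX_KEY];

    assert(word != NULL && word[0] != '\0');
    bump(&words, word);
    if (previous[0] != '\0') {    /* w_(n-1) w_n overlaps with w_n w_(n+1) */
        snprintf(pair, sizeof pair, "%s %s", previous, word);
        bump(&pairs, pair);
    }
    snprintf(previous, sizeof previous, "%s", word);
}
```

In a Flex specification the action for the single-word rule could then be something like { on_word(yytext); }, so only the one-word pattern is needed and the longest-match question for the two-word pattern does not come up. Picking the 3 most frequent entries from each table is a separate pass over the counts.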

Since the parsing is very simple, you might be better off doing it all in a high-level language, for example Haskell (using Hugs or GHC/GHCi) or perhaps Perl, cutting development time.



_______________________________________________
http://lists.gnu.org/mailman/listinfo/help-bison
