Collecting statistics after parsing

2021-05-04 Thread Maury Markowitz
Before I reinvent the wheel: I'm wondering if anyone has some boilerplate code 
for printing out the frequency of tokens found in the source?

Right now I'm counting specific tokens, like constants for zero and one, 
strings, etc. This is done with explicit code on the bison side that populates 
some extern variables. For instance, I have a simple pattern in my flex file to 
find numeric constants, and then in the bison code I do:

factor:
    NUMBER
    {
      /* count every numeric constant, then split by whole vs fractional value */
      numeric_constants_total++;
      if (floorf(num) == num) {
        numeric_constants_int++;
      } else {
        numeric_constants_float++;
      }
    }
  ;

Wouldn't this all be much easier to do by post-processing the tree? That would 
isolate the code in a single area, avoid polluting the bison source, and remove 
the need for the externs (main already knows about yy). The downside would be 
that I'd likely have to break up the tokens: one for integers, another for 
floats, etc.?
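A minimal sketch of the post-processing idea, assuming a hypothetical AST in which each node records its kind and, for numbers, the parsed value. The names `node`, `stats`, `collect_stats`, and `is_whole` are illustrative only, not part of any Bison API. One recursive walk tallies node kinds and splits numbers into whole and fractional values (the whole-number test uses casts instead of `floor` so it needs no libm):

```c
#include <stddef.h>

/* Hypothetical AST node kinds; a real parser's tree will differ. */
enum node_kind { NODE_NUMBER, NODE_STRING, NODE_BINOP, NODE_KIND_COUNT };

struct node {
    enum node_kind kind;
    double num;            /* parsed value, meaningful when kind == NODE_NUMBER */
    struct node *left;     /* children; NULL when absent */
    struct node *right;
};

struct stats {
    unsigned long kinds[NODE_KIND_COUNT]; /* nodes seen, per kind */
    unsigned long ints;    /* NUMBER nodes with no fractional part */
    unsigned long floats;  /* NUMBER nodes with a fractional part */
};

/* Nonzero if d has no fractional part. Doubles with magnitude >= 2^52
   are always whole, and the range check keeps the cast well-defined. */
static int is_whole(double d)
{
    if (d != d)                                   /* NaN */
        return 0;
    if (d >= 4503599627370496.0 || d <= -4503599627370496.0)  /* 2^52 */
        return 1;
    return (double)(long long)d == d;
}

/* Recursively walk the tree, tallying node kinds and classifying numbers. */
void collect_stats(const struct node *n, struct stats *s)
{
    if (n == NULL)
        return;
    s->kinds[n->kind]++;
    if (n->kind == NODE_NUMBER) {
        if (is_whole(n->num))
            s->ints++;
        else
            s->floats++;
    }
    collect_stats(n->left, s);
    collect_stats(n->right, s);
}
```

After `yyparse()` has built the tree, `main` could zero a `struct stats`, call `collect_stats` on the root, and print the fields. As long as number nodes keep their values, the integer/float split can happen during the walk rather than through separate tokens.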


Re: Collecting statistics after parsing

2021-05-04 Thread Hans Åberg


> On 4 May 2021, at 20:09, Maury Markowitz  wrote:
> 
> Before I reinvent the wheel: I'm wondering if anyone has some boilerplate 
> code for printing out the frequency of tokens found in the source?

The Bison parser calls yylex when getting new tokens, so you might do it there.
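A minimal sketch of reusable counting-and-printing boilerplate for that approach, assuming tokens are plain `int` codes as in Bison's default C interface. `count_token`, `token_count`, `print_token_stats`, and `MAX_TOKEN_CODE` are hypothetical names:

```c
#include <stdio.h>

/* Hypothetical upper bound on token codes. By default Bison numbers named
   tokens from 258 upward (single characters use their own character code),
   so this must exceed the largest code the grammar defines. */
#define MAX_TOKEN_CODE 1024

static unsigned long token_counts[MAX_TOKEN_CODE];

/* Record one occurrence of token code tok; out-of-range codes are ignored. */
void count_token(int tok)
{
    if (tok >= 0 && tok < MAX_TOKEN_CODE)
        token_counts[tok]++;
}

/* How many times tok has been seen so far. */
unsigned long token_count(int tok)
{
    return (tok >= 0 && tok < MAX_TOKEN_CODE) ? token_counts[tok] : 0;
}

/* Print every token code that occurred at least once, with its count. */
void print_token_stats(FILE *out)
{
    for (int tok = 0; tok < MAX_TOKEN_CODE; tok++)
        if (token_counts[tok] > 0)
            fprintf(out, "token %d: %lu\n", tok, token_counts[tok]);
}
```

Each flex action would call `count_token(NUMBER);` just before `return NUMBER;`, and `main` could call `print_token_stats(stdout)` once `yyparse()` returns. The output lists numeric codes; mapping them back to token names is left out of this sketch.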




Re: Collecting statistics after parsing

2021-05-04 Thread Adrian Vogelsgesang via Users list for the GNU Bison parser generator
Hi Maury,

Another potential approach: hook the lexer function.
Instead of embedding the counting directly into the bison grammar, you could do 
the counting in the `yylex` function itself, as a wrapper around your actual 
lexer.

That way, you can separate the “structural concerns” of your language (encoded 
by the bison grammar) from the “tokenization concerns” (such as this token 
counting).
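A minimal sketch of such a wrapper, assuming the default non-reentrant interface in which the parser calls `int yylex(void)` (a pure parser via `%define api.pure`, or `%lex-param`, changes the signature). Flex's `YY_DECL` macro can rename the generated scanner, here to the hypothetical `real_yylex`; a small stub with made-up token codes stands in for it so the example runs on its own:

```c
/* Stand-in for the flex-generated scanner. With flex, the real scanner could
   be renamed by putting
       #define YY_DECL int real_yylex(void)
   in the definitions section of the .l file. NUMBER and STRING are
   hypothetical token codes. */
enum { NUMBER = 258, STRING = 259 };
static const int demo_tokens[] = { NUMBER, '+', NUMBER, STRING, 0 };
static int demo_pos = 0;

int real_yylex(void)
{
    int tok = demo_tokens[demo_pos];
    if (tok != 0)
        demo_pos++;
    return tok;               /* 0 tells a Bison parser the input has ended */
}

/* Hypothetical bound; must exceed the largest token code in the grammar. */
#define MAX_TOKEN_CODE 1024
static unsigned long token_counts[MAX_TOKEN_CODE];

/* The function the Bison parser actually calls: forward to the real
   scanner, tally the token, and hand it back unchanged. */
int yylex(void)
{
    int tok = real_yylex();
    if (tok > 0 && tok < MAX_TOKEN_CODE)
        token_counts[tok]++;
    return tok;
}
```

The grammar stays untouched. The wrapper only sees token kinds, though, so telling integer constants from fractional ones at this level still needs either separate tokens or a look at the semantic value (`yylval`) before returning.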

Cheers,
Adrian
