On Mon, 05 Jan 2009, Mindaugas Kavaliauskas wrote:

Hi Mindaugas,

> After last changes comparison table is (no -ko flags used):
> No Test code               Harbour  PHP   Python Ruby   Java  Java+JIT
> ----------------------------------+------+------+------+------+-------+
>    10'000'000 times: 
> 11 nJ+=nI*2+17               4.61   3.41   6.54  25.53   1.36    0.08
> 12 nJ+=17+1+...+1+nI*2       4.61   6.10   6.62  45.86   1.36    0.08
> 13 nJ+=nI*2+17+1+...+1      10.86   5.95  12.96  45.78   1.92    0.17
>    200'000 times:
> 21 cI:=cI+"a"                0.08   6.70   0.11  25.80 1641.766 1650.437
> 22 cI+="a"                   0.08   0.07   0.12  25.83 1649.344 1654.735
> 31 AADD(aI,{nI})             0.16   0.29   0.32   0.20   0.76    0.40
> 32 AADD(aI,{aI[nI-1][1]+1})  0.25   0.34   0.38   0.39   0.92    0.40
> -----------------------------------------------------------------------
> So, we need to work on typed integer variable optimisation to be perfect :)

The -ko compiler switch will decrease the cost of 13 test above to 12
level. Why you didn't use it?

> Since, I do not know the compiler internals very well, I've started from 
> bison rules. Rewriting it to lemon parser was a good way to know internals 
> better. It took me more than a day of rewriting rules. But all this job 
> requires some changes in C code also. So, I'm not sure if these lemon rules 
> are useful at all, but the goal was to understand internals better. The 
> main differences (bison vs. lemon), I found:

I'm really under express of your job.

> 1) non-terminal symbol should be lowercase
> 2) terminal symbol can not be character. All ',', '(', ')' should be 
> changed to some define, like C_COMMA, C_LBRACKET, C_RBRACKET, etc. And 
> complex code has to be fixed to return these constants.

Mayby we can try to use common symbol values for PP and lexer to reduce
number of translations anyhow even adding some new one to existing code
is not a problem.

> 3) symbol : symbola symbolb symbolc
> has to be converted to
>    symbol ::= symbola symbolb symbolc
> 4) $$, $1, $2, ...
> has to be converted to explicitly defined X, A, B, ...
>    symbol(X) ::= symbola(A) symbolb(B)

what greatly increase readability and eliminates possible typos when
grammar rules have to be changed and indexes are not correctly updated
what is quite often problem with bison.

> 5) lemon passes the whole structure, not a union member as bison do. So:
> "%type <string> IdentName" and "$1", should be changed to:
> "%type identname {HB_EXPR_PTR}" and "A->string".

It's rather result of used syntax not low level constructions used on
grammar stack and here lemon rules are also more readable.

> 6) lemon does not allow multiple C code for single rule. You can specify a 
> single C code an the end of rule. So, you can not write rules:
> DefTwoParam : ParamOne AsType
>               { hb_compVariableAdd($1, HB_COMP_PARAM->cVarType); }
>               ',' ParamTwo AsType
>               { hb_compVariableAdd($5, HB_COMP_PARAM->cVarType); }
> This should be fixed by adding value to AsType non-terminal symbol instead 
> of "global" compiler variable cVarType.
> AsType : AS_NUMERIC { $$ = 'N'; }
> ...
> DefTwoParam : ParamOne AsType ',' ParamTwo AsType
>               { hb_compVariableAdd($1, $2);
>                 hb_compVariableAdd($4, $5); }

In theory is limitation but in practice it forces much more cleaner
rules because it's not possible to easy use such global hack like in
bison. It makes later code updating much easier. Specially when many
people have to work common project. Any global variables used in
bison grammar rules sooner or later creates some problems. For me
it's very good design decision. It also helps to create MT and reentrant
safe code.

> The problem is that AsType keeps its state in two global variables: 
> cVarType and szFromClass, so we can not store type in a simple integer 
> value and need to define a structure, collect these structures in some 
> list, etc. There are a few more global variables, like: iVarScope, etc.

Yes it is and even if we will want to still with bison then it will
be good to eliminate them. Though it does not mean that we will be
able to fully eliminate all global variables. In some places they
can be very usable and in some other they will be strictly necessary
f.e. stack index updated by statements which allocates stack items
for temporary variables and much easier and faster WITH OBJECT
implementation (technically WITH OBJECT is only unnamed temporary
variable).
There is also problem with syntax error reporting. In Clipper
END, ENDIF, ENDDO, NEXT, ENDCASE always closes the last statement
even if it's not valid clause and only error is reported, f.e.:

   FOR i:=1 to 2
      IF x
         ? "x"
      NEXT  // here error is reported and NEXT works like ENDIF
            // you can confim it if you put ENDIF in THIS line
   NEXT

Compiling this code generate in Clipper only one error. To reach
such effect we can add all closing statements to each defined statement
but the ones which are not valid will generate RT errors.
We can also use one global list common for all type of statemants and
do not define full grammar rules for them but only separated for simple
IF ..., FOR ..., WHILE ..., ... which will push new statement and
END, ENDIF, ENDDO, NEXT, ... which will pop last statement generating
error if necessary. ELSE, ELSEIF, CASE, ... will operate on last
statement in the list. Such list greatly simplify GOTO instructions
when some statements allocate VM stack. In our case LOOP and EXIT.
It's very easy to check if they can be used in some nested statements
and generate RT error or pop valid number of stack items allocated
by statements which are interrupted/exited. This version is usually
much easier to update when you want to extend language and introduce
some new constructions. Please also note that it allows to eliminate
a lot of { intermediate C code }.

> Sometimes a simple solution exists, definition of new non-terminal symbols, 
> i.e.
> DefTwoParam : ParamOne AsType DefTwoParam1 ',' ParamTwo AsType
>               { /* end C code */ }
> DefTwoParam1 : /*empty */ { /* middle C code */ }
> but the problem is that ParamOne is not accessible in "middle C code".
> 7) Bison allows access symbols not from current rule. Actually ParamOne 
> could be reached by $-1 from "middle C code", but these hacky solutions are 
> not allowed in lemon. There are 4 places containing hacky code in current 
> bison rules.

And should be eliminated. All negative indexes in grammar rules are
potentially source of problems.

> Now I would like more a way to implement parser rules rewriter instead of 
> rewriting rules by hand. Only some places (like $0) will need to be fixed 
> by hand.

I think that also some others, i.e. see above. Even if it's not strictly
necessary then it will make code cleaner, simpler and easier to update or
extend.

> ====================================================
> The problem with all these rule changes are:
> 1) lemon finds 39 collisions, but I'm unable to find a reason. (I guess 
> there 3 or 4 collisions, but symbols are used a few times, so it generate a 
> lot collisions).
> 2) I'm unable to do compiler changes in a small steps. So, I've ruined 
> compiler code without many chances to make it work.

This will be hard. Probably 1-st we should update current code for
bison. We can start with expression rules defined for macrocompiler
eliminate all negative indexes in expressions and then extend them
adding support for statements.
Please remember that compiler and macrocompiler have a lot of common code
so modification in compiler should be synced with macrocompiler.

> To create expression tree and generate pcode after exit from parser, needs 
> a lot of changes, but I can not find a good way to do it in a small steps.
> Maybe leaving existing code to do the job and creation of parallel 
> expression tree could be a solution. We need to duplicate expression tree 
> code. Later we can move expression optimisation and pcode generation to new 
> expression tree, and later drop current pcode generation code.

The things should look easier if we clean grammar rules and use single
statement list. This can be the 1-st step.
We can add support for lemon later. It took some work but finally using
bison2.3 with destructors and our own expression garbage collector with
protection against some bison bugs we finally have working MT safe code
without memory leaks so it's not such critical as it was in the past
though I still think it's worth to do.

best regards,
Przemek
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour

Reply via email to