On Mon, 05 Jan 2009, Mindaugas Kavaliauskas wrote: Hi Mindaugas,
> After last changes comparison table is (no -ko flags used): > No Test code Harbour PHP Python Ruby Java Java+JIT > ----------------------------------+------+------+------+------+-------+ > 10'000'000 times: > 11 nJ+=nI*2+17 4.61 3.41 6.54 25.53 1.36 0.08 > 12 nJ+=17+1+...+1+nI*2 4.61 6.10 6.62 45.86 1.36 0.08 > 13 nJ+=nI*2+17+1+...+1 10.86 5.95 12.96 45.78 1.92 0.17 > 200'000 times: > 21 cI:=cI+"a" 0.08 6.70 0.11 25.80 1641.766 1650.437 > 22 cI+="a" 0.08 0.07 0.12 25.83 1649.344 1654.735 > 31 AADD(aI,{nI}) 0.16 0.29 0.32 0.20 0.76 0.40 > 32 AADD(aI,{aI[nI-1][1]+1}) 0.25 0.34 0.38 0.39 0.92 0.40 > ----------------------------------------------------------------------- > So, we need to work on typed integer variable optimisation to be perfect :) The -ko compiler switch will decrease the cost of 13 test above to 12 level. Why you didn't use it? > Since, I do not know the compiler internals very well, I've started from > bison rules. Rewriting it to lemon parser was a good way to know internals > better. It took me more than a day of rewriting rules. But all this job > requires some changes in C code also. So, I'm not sure if these lemon rules > are useful at all, but the goal was to understand internals better. The > main differences (bison vs. lemon), I found: I'm really under express of your job. > 1) non-terminal symbol should be lowercase > 2) terminal symbol can not be character. All ',', '(', ')' should be > changed to some define, like C_COMMA, C_LBRACKET, C_RBRACKET, etc. And > complex code has to be fixed to return these constants. Mayby we can try to use common symbol values for PP and lexer to reduce number of translations anyhow even adding some new one to existing code is not a problem. > 3) symbol : symbola symbolb symbolc > has to be converted to > symbol ::= symbola symbolb symbolc > 4) $$, $1, $2, ... > has to be converted to explicitly defined X, A, B, ... > symbol(X) ::= symbola(A) symbolb(B) what greatly increase readability and eliminates possible typos when grammar rules have to be changed and indexes are not correctly updated what is quite often problem with bison. > 5) lemon passes the whole structure, not a union member as bison do. So: > "%type <string> IdentName" and "$1", should be changed to: > "%type identname {HB_EXPR_PTR}" and "A->string". It's rather result of used syntax not low level constructions used on grammar stack and here lemon rules are also more readable. > 6) lemon does not allow multiple C code for single rule. You can specify a > single C code an the end of rule. So, you can not write rules: > DefTwoParam : ParamOne AsType > { hb_compVariableAdd($1, HB_COMP_PARAM->cVarType); } > ',' ParamTwo AsType > { hb_compVariableAdd($5, HB_COMP_PARAM->cVarType); } > This should be fixed by adding value to AsType non-terminal symbol instead > of "global" compiler variable cVarType. > AsType : AS_NUMERIC { $$ = 'N'; } > ... > DefTwoParam : ParamOne AsType ',' ParamTwo AsType > { hb_compVariableAdd($1, $2); > hb_compVariableAdd($4, $5); } In theory is limitation but in practice it forces much more cleaner rules because it's not possible to easy use such global hack like in bison. It makes later code updating much easier. Specially when many people have to work common project. Any global variables used in bison grammar rules sooner or later creates some problems. For me it's very good design decision. It also helps to create MT and reentrant safe code. > The problem is that AsType keeps its state in two global variables: > cVarType and szFromClass, so we can not store type in a simple integer > value and need to define a structure, collect these structures in some > list, etc. There are a few more global variables, like: iVarScope, etc. Yes it is and even if we will want to still with bison then it will be good to eliminate them. Though it does not mean that we will be able to fully eliminate all global variables. In some places they can be very usable and in some other they will be strictly necessary f.e. stack index updated by statements which allocates stack items for temporary variables and much easier and faster WITH OBJECT implementation (technically WITH OBJECT is only unnamed temporary variable). There is also problem with syntax error reporting. In Clipper END, ENDIF, ENDDO, NEXT, ENDCASE always closes the last statement even if it's not valid clause and only error is reported, f.e.: FOR i:=1 to 2 IF x ? "x" NEXT // here error is reported and NEXT works like ENDIF // you can confim it if you put ENDIF in THIS line NEXT Compiling this code generate in Clipper only one error. To reach such effect we can add all closing statements to each defined statement but the ones which are not valid will generate RT errors. We can also use one global list common for all type of statemants and do not define full grammar rules for them but only separated for simple IF ..., FOR ..., WHILE ..., ... which will push new statement and END, ENDIF, ENDDO, NEXT, ... which will pop last statement generating error if necessary. ELSE, ELSEIF, CASE, ... will operate on last statement in the list. Such list greatly simplify GOTO instructions when some statements allocate VM stack. In our case LOOP and EXIT. It's very easy to check if they can be used in some nested statements and generate RT error or pop valid number of stack items allocated by statements which are interrupted/exited. This version is usually much easier to update when you want to extend language and introduce some new constructions. Please also note that it allows to eliminate a lot of { intermediate C code }. > Sometimes a simple solution exists, definition of new non-terminal symbols, > i.e. > DefTwoParam : ParamOne AsType DefTwoParam1 ',' ParamTwo AsType > { /* end C code */ } > DefTwoParam1 : /*empty */ { /* middle C code */ } > but the problem is that ParamOne is not accessible in "middle C code". > 7) Bison allows access symbols not from current rule. Actually ParamOne > could be reached by $-1 from "middle C code", but these hacky solutions are > not allowed in lemon. There are 4 places containing hacky code in current > bison rules. And should be eliminated. All negative indexes in grammar rules are potentially source of problems. > Now I would like more a way to implement parser rules rewriter instead of > rewriting rules by hand. Only some places (like $0) will need to be fixed > by hand. I think that also some others, i.e. see above. Even if it's not strictly necessary then it will make code cleaner, simpler and easier to update or extend. > ==================================================== > The problem with all these rule changes are: > 1) lemon finds 39 collisions, but I'm unable to find a reason. (I guess > there 3 or 4 collisions, but symbols are used a few times, so it generate a > lot collisions). > 2) I'm unable to do compiler changes in a small steps. So, I've ruined > compiler code without many chances to make it work. This will be hard. Probably 1-st we should update current code for bison. We can start with expression rules defined for macrocompiler eliminate all negative indexes in expressions and then extend them adding support for statements. Please remember that compiler and macrocompiler have a lot of common code so modification in compiler should be synced with macrocompiler. > To create expression tree and generate pcode after exit from parser, needs > a lot of changes, but I can not find a good way to do it in a small steps. > Maybe leaving existing code to do the job and creation of parallel > expression tree could be a solution. We need to duplicate expression tree > code. Later we can move expression optimisation and pcode generation to new > expression tree, and later drop current pcode generation code. The things should look easier if we clean grammar rules and use single statement list. This can be the 1-st step. We can add support for lemon later. It took some work but finally using bison2.3 with destructors and our own expression garbage collector with protection against some bison bugs we finally have working MT safe code without memory leaks so it's not such critical as it was in the past though I still think it's worth to do. best regards, Przemek _______________________________________________ Harbour mailing list Harbour@harbour-project.org http://lists.harbour-project.org/mailman/listinfo/harbour