Re: Dynamic tokens

2020-02-03 Thread Hans Åberg


> On 2 Feb 2020, at 20:29, Ervin Hegedüs  wrote:
> 
> is there any way to make a parser with "dynamic" tokens?
> 
> I mean in compiling time I don't know the available tokens.

It is not possible to have dynamically created token values, …

> Now I describe
> the necessary token with regex, but I bumped into a problem.

… but it might be possible to use augmented methods.

> The language syntax is some like this:
> 
> EXPRESSION: LANG_OP LANG_OP_ARGUMENT | LANG_OP_ARGUMENT
> 
> where (as you can see) the LANG_OP is optional. If there isn't LANG_OP,
> that means that is the most usable operator (namely "@rx" in my case). The
> syntax of the operator (with regex): "@[a-z][a-zA-Z0-9]+".
> 
> Example from the language:
> @eq 1
> @lt 2
> @streq foo
> 
> The problem is that the LANG_OP_ARGUMENT could be anything - for example,
> that could be also starts with "@". So, the next expression is valid:
> 
> @streq @streq

So here you might have a context switch that is set when the operator token 
comes, and that says that the next token, even if it is a valid operator name, 
should be treated as an argument. When the argument is finished, set the 
switch back.
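
A minimal sketch of such a context switch, written as a hand-rolled tokenizer in plain C rather than as flex rules. The names (struct lexer, expect_argument, next_token, the TOK_* values) are hypothetical, and it assumes an argument is a single whitespace-delimited word, which the thread does not state. It does not settle the case of a lone "@rx" meant as an argument; that still needs the grammar or a symbol table.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; with Bison these would come from the generated header. */
enum token { TOK_EOF, TOK_LANG_OP, TOK_LANG_OP_ARGUMENT };

struct lexer {
    const char *pos;     /* current read position in the input */
    int expect_argument; /* the context switch: set once an operator was returned */
};

/* Returns 1 if the word [s, s+len) matches @[a-z][a-zA-Z0-9]+ (C locale assumed). */
static int looks_like_operator(const char *s, size_t len)
{
    if (len < 3 || s[0] != '@' || !islower((unsigned char)s[1]))
        return 0;
    for (size_t i = 2; i < len; i++)
        if (!isalnum((unsigned char)s[i]))
            return 0;
    return 1;
}

static int is_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Scans the next whitespace-delimited word and classifies it.
   On a non-EOF return, *start and *len describe the word. */
static enum token next_token(struct lexer *lx, const char **start, size_t *len)
{
    assert(lx != NULL && start != NULL && len != NULL);

    while (is_space(*lx->pos))
        lx->pos++;
    if (*lx->pos == '\0')
        return TOK_EOF;

    *start = lx->pos;
    while (*lx->pos != '\0' && !is_space(*lx->pos))
        lx->pos++;
    *len = (size_t)(lx->pos - *start);

    if (!lx->expect_argument && looks_like_operator(*start, *len)) {
        lx->expect_argument = 1; /* next word is an argument, whatever it looks like */
        return TOK_LANG_OP;
    }
    lx->expect_argument = 0;     /* argument finished: set the switch back */
    return TOK_LANG_OP_ARGUMENT;
}
```

With the input "@streq @streq", the first word comes back as TOK_LANG_OP and the second as TOK_LANG_OP_ARGUMENT. In flex the same flag is usually expressed as a start condition, as in the BEGIN(ST_LANG_OP) rule quoted below.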

> Now I'm using this rules:
> @[a-z][a-zA-Z0-9]+ { BEGIN(ST_LANG_OP); return LANG_OP; }
> 
> 
> but now the operator isn't optional.

Something must follow in the grammar, so the switch may be set back in the 
grammar. Check in the .output file for clues.

> If I write in the language:
> 
> "@rx" that means that's an operator argument, without operator.

One can have a symbol table that stores all operator names. If a name is not 
there, return it as an identifier. This way, one can dynamically define new 
operators.

If, further, the table stores the token value, it can be used for other objects, 
like variables that may have different syntax depending on type.
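
A rough sketch of such a table in plain C, under a few assumptions: a small fixed-size array with linear search, and hypothetical names throughout (struct symtab, symtab_define, symtab_token_for, the SYM_* values). In a real parser the token values would come from the header Bison generates, not from a hand-written enum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token values; a Bison-generated header would normally define these. */
enum { SYM_IDENTIFIER = 258, SYM_LANG_OP = 259, SYM_VARIABLE = 260 };

#define SYM_MAX 64
#define SYM_NAME_MAX 32

struct symbol {
    char name[SYM_NAME_MAX];
    int token; /* token value the lexer should return for this name */
};

struct symtab {
    struct symbol entries[SYM_MAX];
    size_t count;
};

/* Registers a name at run time.
   Returns 0 on success, -1 if the table is full or the name is too long.
   Duplicates are not checked; on lookup the first definition wins. */
static int symtab_define(struct symtab *t, const char *name, int token)
{
    assert(t != NULL && name != NULL);
    size_t len = strlen(name);
    if (t->count >= SYM_MAX || len >= SYM_NAME_MAX)
        return -1;
    memcpy(t->entries[t->count].name, name, len + 1); /* copies the NUL too */
    t->entries[t->count].token = token;
    t->count++;
    return 0;
}

/* Token for a scanned word: the stored value if the name is known,
   otherwise the plain identifier token. */
static int symtab_token_for(const struct symtab *t, const char *word)
{
    assert(t != NULL && word != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, word) == 0)
            return t->entries[i].token;
    return SYM_IDENTIFIER;
}
```

A flex action could then end with something like return symtab_token_for(&ops, yytext); where ops is a table the program fills at start-up or while parsing. Because each entry carries its own token value, the same lookup serves operators, variables, and any other class of name.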





Re: Dynamic tokens

2020-02-03 Thread Ervin Hegedüs
Hi Hans,


thanks for your reply,

On Mon, Feb 03, 2020 at 03:00:20PM +0100, Hans Åberg wrote:
> 
> > On 2 Feb 2020, at 20:29, Ervin Hegedüs  wrote:
> > 
> > is there any way to make a parser with "dynamic" tokens?
> > I mean in compiling time I don't know the available tokens.
> 
> It is not possible to have dynamically created token values, …

ah,
 
> > Now I describe
> > the necessary token with regex, but I bumped into a problem.
> 
> … but it might be possible to use augmented methods.

ehm, you're right. I totally forgot that :).
 
> > Example from the language:
> > @eq 1
> > @lt 2
> > @streq foo
> > 
> > The problem is that the LANG_OP_ARGUMENT could be anything - for example,
> > that could be also starts with "@". So, the next expression is valid:
> > 
> > @streq @streq
> 
> So here you might have a context switch that is set when the operator token 
> comes, and that says that the next token, even if it is a valid operator name, 
> should be treated as an argument. When the argument is finished, set the 
> switch back.
> 
> > Now I'm using this rules:
> > @[a-z][a-zA-Z0-9]+ { BEGIN(ST_LANG_OP); return LANG_OP; }
> > 
> > 
> > but now the operator isn't optional.
> 
> Something must follow in the grammar, so the switch may be set back in the 
> grammar. Check in the .output file for clues.

so, you think (if I understand correctly) something like this:

@[a-z][a-zA-Z0-9]+  { BEGIN(ST_LANG_OP); if (op_valid(yytext)) { return 
LANG_OP; } else { ... } }


> > If I write in the language:
> > "@rx" that means that's an operator argument, without operator.
> 
> One can have a symbol table that stores all operator names. If a name is not 
> there, return it as an identifier. This way, one can dynamically define new 
> operators.
> 
> If, further, the table stores the token value, it can be used for other 
> objects, like variables that may have different syntax depending on type.

I think it's clear - many-many thanks for your help! :)


a.




Re: Dynamic tokens

2020-02-03 Thread Hans Åberg


> On 3 Feb 2020, at 16:33, Ervin Hegedüs  wrote:
…
>>> Example from the language:
>>> @eq 1
>>> @lt 2
>>> @streq foo
>>> 
>>> The problem is that the LANG_OP_ARGUMENT could be anything - for example,
>>> that could be also starts with "@". So, the next expression is valid:
>>> 
>>> @streq @streq
>> 
>> So here you might have a context switch that is set when the operator token 
>> comes, and that says that the next token, even if it is a valid operator name, 
>> should be treated as an argument. When the argument is finished, set the 
>> switch back.
>> 
>>> Now I'm using this rules:
>>> @[a-z][a-zA-Z0-9]+ { BEGIN(ST_LANG_OP); return LANG_OP; }
>>> 
>>> 
>>> but now the operator isn't optional.
>> 
>> Something must follow in the grammar, so the switch may be set back in the 
>> grammar. Check in the .output file for clues.
> 
> so, you think (if I understand correctly) something like this:
> 
> @[a-z][a-zA-Z0-9]+  { BEGIN(ST_LANG_OP); if (op_valid(yytext)) { return 
> LANG_OP; } else { ... } }
> ….

Something like that.