From what I read in antlr3string.c:
static pANTLR3_STRING
subString8 (pANTLR3_STRING string, ANTLR3_UINT32 startIndex, ANTLR3_UINT32
endIndex)
{
pANTLR3_STRING newStr;
if (endIndex > string->len)
{
endIndex = string->len + 1;
}
newStr = string->factory->newPtr(string->factory
It seems the charPosition() function of the token struct returns the character
position within the line, not the offset from the start of the input stream. So I
am wondering: is there any way to get the character index of a token from the start
of the input stream in the parser?
Thanks,
Best Regards,
chainone
Is any API available for Parser to do this?
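One way to picture the difference between a position within a line and an offset from the start of the input is a small conversion helper. This is only a sketch: offset_from_line_col is a hypothetical name, not a function of the ANTLR3 C runtime, and it assumes 1-based line numbers and 0-based columns. The runtime's token structure may already carry start and stop indexes, which would be worth checking in its headers before writing anything like this.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the ANTLR3 C runtime).
 * Converts a 1-based line number and a 0-based column into a 0-based
 * offset from the start of buf. Returns -1 if the position does not
 * exist in the buffer. */
long offset_from_line_col(const char *buf, size_t len,
                          unsigned line, unsigned col)
{
    size_t i = 0;
    unsigned cur = 1;

    /* Advance to the first character of the requested line. */
    while (cur < line && i < len) {
        if (buf[i] == '\n') {
            cur++;
        }
        i++;
    }
    if (cur != line) {
        return -1;
    }

    /* The column must not run past the end of the line or the buffer. */
    for (unsigned c = 0; c < col; c++) {
        if (i + c >= len || buf[i + c] == '\n') {
            return -1;
        }
    }
    return (long)(i + col);
}
```

For the buffer "ab\ncd;" (6 characters), line 2 and column 1 give offset 4, which is the 'd'.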
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups
"il-antlr-interest" group.
To post to this group, send email to il-antlr-interest@googlegroups.com
To unsubscribe from this g
void NUEDataManager::ParseOneEntry(const char* entry_start)
{
p_semi = strchr(entry_start,';');
assert(p_semi);
m_input =
antlr3NewAsciiStringInPlaceStream((pANTLR3_UINT8)entry_start,p_semi-entry_start+1,NULL);
m_lex= StepDataEntryLexerNew(m_input);
m_tokens = antlr3Commo
a). Add this line after calling antlr3LexerNewStream in the
function testlexLexerNewSSD:
"ctx->pLexer->rec->state->tokSource->nextToken = myNextToken;"
4. Modify one terminal lexer rule to look like:
SEMI : ';' {if_lexer_need_stop = TRUE;}
;
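The idea behind these steps (a replacement nextToken that reports end of input once a flag has been set by the SEMI rule) can be sketched without the ANTLR runtime. Everything below is a stand-in: TokenSource, raw_next_token, stopping_next_token and the token type values are hypothetical names for illustration, and the toy scanner only knows words and ';'.

```c
#include <stddef.h>

enum { TOK_EOF = -1, TOK_WORD = 1, TOK_SEMI = 2 };

typedef struct TokenSource TokenSource;
struct TokenSource {
    const char *input;
    size_t      pos;
    int         stop_requested;            /* plays the role of if_lexer_need_stop */
    int       (*nextToken)(TokenSource *self);
};

/* Plain scanner: returns one token type per call. */
static int raw_next_token(TokenSource *ts)
{
    while (ts->input[ts->pos] == ' ') {
        ts->pos++;
    }
    char c = ts->input[ts->pos];
    if (c == '\0') {
        return TOK_EOF;
    }
    if (c == ';') {
        ts->pos++;
        ts->stop_requested = 1;            /* the action attached to SEMI */
        return TOK_SEMI;
    }
    while (ts->input[ts->pos] != '\0' && ts->input[ts->pos] != ' ' &&
           ts->input[ts->pos] != ';') {
        ts->pos++;
    }
    return TOK_WORD;
}

/* Replacement nextToken: once the flag is set, pretend the input ended. */
static int stopping_next_token(TokenSource *ts)
{
    if (ts->stop_requested) {
        return TOK_EOF;
    }
    return raw_next_token(ts);
}

void token_source_init(TokenSource *ts, const char *input)
{
    ts->input = input;
    ts->pos = 0;
    ts->stop_requested = 0;
    ts->nextToken = stopping_next_token;   /* install the override */
}

/* Pulls tokens until the source reports EOF and counts them. */
size_t count_tokens(TokenSource *ts)
{
    size_t n = 0;
    while (ts->nextToken(ts) != TOK_EOF) {
        n++;
    }
    return n;
}
```

With the input "a b ; c d ; e", only three tokens are produced and the rest of the buffer is never scanned.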
On Fri,
While compiling the SEMI rule (SEMI : ';' .*;), the following error message
pops up:
" The following alternatives can never be matched: 1"
Seems it is necessary to put an end to the SEMI rule.
On Fri, Jan 16, 2009 at 1:17 AM, Jim Idle wrote:
> chain o
wrote:
> chain one wrote:
>>
>> I want to parse only the beginning of a file, and I don't want the
>> recognizer to break the whole file into a huge number of tokens, which
>> would make memory consumption very high.
>> So after the lexer having recognized one spe
the same problem as me? If
so, could you tell me how to solve it? I would very much appreciate your
help.
On Thu, Jan 15, 2009 at 4:45 PM, Gavin Lambert wrote:
> At 21:35 15/01/2009, chain one wrote:
>
>> I want to only parse the beginning of a file. And I don't want the
>
I want to parse only the beginning of a file, and I don't want the
recognizer to break the whole file into a huge number of tokens, which would make
memory consumption very high. So once the lexer has recognized one
specified token such as ";", I want to tell the lexer to stop, and pass the
tokens
/01/2009, chain one wrote:
>
>> I am still working on this. Searching for solutions that could make the
>> IDENT rule and FUNCTION_DECL work together.
>>
>
> Well, one thing you could try would be:
>
> IDENT
> : ('FUNCTION') => FUNCTION_DECL { $type =
I found that, for the same lexer rule, the generated .java file is much
smaller than the generated .c file. For example,
the generated .java file is 124K, while the .c file can reach
14M!!!
I am still working on this, searching for a solution that makes the
IDENT rule and FUNCTION_DECL work together. Could anybody help me with this?
Thanks.
On Wed, Jan 14, 2009 at 10:54 AM, chain one wrote:
> I know where the problem is. There is another rule named IDENT:
>
> IDENT
ENTITY_DECL
: 'BEGIN_ENTITY' ( options {greedy=false;} : . )* 'END_ENTITY' SEMI
;
PROCEDURE_DECL
: 'PROCEDURE' ( options {greedy=false;} : . )* 'END_PROCEDURE' SEMI
;
TYPE_DECL
: 'TYPE' ( options {greedy=false;} : . )* 'END_TYPE' SEMI
;
SUBTYPE_CONSTRAINT_DECL
: 'SUBTYPE_CONSTRAINT' ( options {gre
I know where the problem is. There is another rule named IDENT:
IDENT
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
seems this rule conflicts with the FUNCTION_D
It seems it still doesn't work. I am still working on this.
I learned a lot from your reply. Thanks, Gavin.
On Wed, Jan 14, 2009 at 4:21 AM, Gavin Lambert wrote:
> At 00:29 14/01/2009, chain one wrote:
>
>> I tried the lexer rule you gave me. But following error comes out:
>
'O''N'{'0'..'9',
'A'..'Z', '_',
'a'..'z'}'F''U''N''C''T''I''O''N''E''N''D''_''F
This does not work either :(
fragment
FUNCTION:
'FUNCTION'
;
fragment
END_FUNCTION
:'END_FUNCTION'
;
FUNCTION_DECL
:FUNCTION
{
SKIP();
}
( ~(FUNCTION|END_FUNCTION)
|
FUNCTION_DECL
)* END_FUNCTION SEMI
;
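The nesting requirement behind this rule can be illustrated outside ANTLR with a depth counter: each FUNCTION keyword increases the depth, each END_FUNCTION decreases it, and the block ends at the ';' after the END_FUNCTION that brings the depth back to zero. The sketch below is a hand-written pre-scan, not a lexer rule; skip_function_decl is a hypothetical name, and it assumes keywords are matched as whole identifier words.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: src is expected to begin (after optional
 * non-identifier characters) with the word FUNCTION. Returns the index
 * just past the ';' that follows the matching END_FUNCTION, or 0 if the
 * input does not start with FUNCTION or the block is unterminated. */
size_t skip_function_decl(const char *src)
{
    size_t i = 0;
    int depth = 0;

    while (src[i] != '\0') {
        if (!is_ident_char(src[i])) {
            i++;
            continue;
        }

        /* Collect one whole identifier-like word. */
        size_t start = i;
        while (is_ident_char(src[i])) {
            i++;
        }
        size_t n = i - start;

        if (n == 8 && strncmp(src + start, "FUNCTION", 8) == 0) {
            depth++;                        /* entering a (nested) function */
        } else if (n == 12 && strncmp(src + start, "END_FUNCTION", 12) == 0) {
            if (depth == 0) {
                return 0;                   /* END_FUNCTION without FUNCTION */
            }
            if (--depth == 0) {
                while (isspace((unsigned char)src[i])) {
                    i++;
                }
                return src[i] == ';' ? i + 1 : 0;
            }
        } else if (depth == 0) {
            return 0;                       /* first word is not FUNCTION */
        }
    }
    return 0;                               /* ran out of input while nested */
}
```

A caller could use the returned index to resume normal tokenizing right after the skipped block.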
I want to recognize a function definition and skip it before passing tokens
to the parser. The function definition starts with "FUNCTION" and ends with
"END_FUNCTION".
It can also be nested, for example:
FUNCTION value_range_aggregate_rep_item(agg
: AGGREGATE OF representation_item) : BOOLEAN;
F
enumeration_type
: DOT IDENT DOT
;
DOT
: '.'
;
fragment
DIGIT
: '0'..'9'
;
INT
: '-'? DIGIT+
;
FLOAT
: '.' DIGIT* EXPONENT?
;
fragment
EXPONENT: ('e' | 'E') ('+' | '-')? (DIGIT)+;
IDENT
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
when the input is ".EXTERNAL." ,
x->pLexer->setCharStream(m_lex->pLexer,m_input);
However, this makes the program crash at line 482 of antlrtokenstream.c.
I don't know how to make it work. I would appreciate it if you could take a look at
this and give some suggestions. : )
Best Regards,
Young
On Mon, Jan 5, 2009 at 1:16 PM, Jim
There are many pieces of input, all of which should be parsed by one
parser, such as:
Input 1:
Jack 100$
Input 2:
Tom 200$
Input ..
However, these inputs do not all arrive at once. They arrive at
different times. Once an input arrives, it needs to be parsed immediately.
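The shape of this requirement (one parse routine called once per arriving input, with nothing carried over between calls) can be sketched in plain C. This is not ANTLR code: Entry and parse_entry are hypothetical names, and the record format "<name> <amount>$" is inferred from the two sample inputs.

```c
#include <stdio.h>

typedef struct {
    char name[32];
    int  amount;
} Entry;

/* Hypothetical helper: parses one record such as "Jack 100$".
 * Returns 1 on success and 0 if the input does not have that shape.
 * Each call is independent, so it can run as soon as an input arrives. */
int parse_entry(const char *input, Entry *out)
{
    char dollar = '\0';
    int consumed = 0;

    /* %31s bounds the name to the buffer; %n records how far we got. */
    if (sscanf(input, "%31s %d%c%n",
               out->name, &out->amount, &dollar, &consumed) != 3) {
        return 0;
    }
    if (dollar != '$') {
        return 0;
    }
    /* Reject trailing characters after the '$'. */
    return input[consumed] == '\0';
}
```

Calling parse_entry("Jack 100$", &e) and later parse_entry("Tom 200$", &e) handles each input on its own.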
1.
INT
: '-'? DIGIT+
;
FLOAT
: ('-'|'+')? DIGIT+ '.' DIGIT* (('e' | 'E') ('+' | '-')? (DIGIT)+)?
;
2.
INT_OR_FLOAT
: '-'? DIGIT+ FLOAT?
;
fragment
FLOAT
: '.' DIGIT* EXPONENT?
;
fragment
EXPONENT: ('e' | 'E') ('+' | '-')? (DIGIT)+;
Which one is quicker?
I think the 2nd is
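What the second, left-factored form does can be shown with a hand-written scanner: the shared '-'? DIGIT+ prefix is read once, and only then does the scanner decide between an integer and a float. This is a sketch of the idea, not of the code ANTLR generates, and it makes no claim about which grammar is measurably faster. The names scan_number and NumKind are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

enum NumKind { NUM_NONE = 0, NUM_INT, NUM_FLOAT };

/* Mirrors  '-'? DIGIT+ ( '.' DIGIT* EXPONENT? )?
 * Stores the number of characters matched in *len. */
enum NumKind scan_number(const char *s, size_t *len)
{
    size_t i = 0;

    if (s[i] == '-') {
        i++;
    }
    size_t first_digit = i;
    while (isdigit((unsigned char)s[i])) {
        i++;
    }
    if (i == first_digit) {                 /* DIGIT+ needs at least one digit */
        *len = 0;
        return NUM_NONE;
    }

    enum NumKind kind = NUM_INT;
    if (s[i] == '.') {                      /* optional fraction makes it a float */
        kind = NUM_FLOAT;
        i++;
        while (isdigit((unsigned char)s[i])) {
            i++;
        }
        /* Optional exponent: consume it only if a digit really follows. */
        if (s[i] == 'e' || s[i] == 'E') {
            size_t j = i + 1;
            if (s[j] == '+' || s[j] == '-') {
                j++;
            }
            if (isdigit((unsigned char)s[j])) {
                while (isdigit((unsigned char)s[j])) {
                    j++;
                }
                i = j;
            }
        }
    }
    *len = i;
    return kind;
}
```

The digits before the '.' are never rescanned, which is the point of merging the two rules.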
, 16 Dec 2008 18:28:50 -0800, chain one wrote:
>
> Thanks Jim,
>
>
> Your suggestion is very helpful to me.
> I have checked my grammar according to your advice and now SKIP some useless
> tokens.
> After doing this, the peak memory usage decreased from 660M to 480M.
> However 4
Idle
> On Tue, 16 Dec 2008 15:54:58 -0800, chain one wrote:
>
> > Still waiting for help
> > I just wanna know if the C runtime target is suitable for large input?
>
> Yes.
>
>
> >
> > On 12/16/08, chain one wrote:
> >> Hi,
> >> These days
Still waiting for help.
I just wanna know if the C runtime target is suitable for large input?
On 12/16/08, chain one wrote:
> Hi,
> These days I am writing a parser for a kind of data file using C++. The
> format of the data file is simple, so the rules are simple.
> But when I feed
Hi,
These days I am writing a parser for a kind of data file using C++. The
format of the data file is simple, so the rules are simple.
But when I feed a data file of about 20M to the parser, the parser eats
600M+ of memory.
I am surprised by this result and I found most memory and time were
Hello. When the target language is Java, we use "skip()" to throw out
what was just matched.
But what about when the target language is C?
I could not find such a function in the ANTLR3 C Runtime API documentation.
Could anyone who is familiar with ANTLR3 C Runtime tell me which f
Is that possible? Because of memory consumption, in the first pass I
only want to grab some data to be used in the second pass, and in the second
pass the AST is built using the same grammar file.
So I need to turn on the option (output=AST) at the beginning of the second
pass.
Is that possib
I just found that ANTLR V3 first turns the whole input stream into tokens,
and only then does the parser start to work. Is there any way to set the token buffer
length?
Check the following Lexer rule:
IDENT
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
{
if (!parser.isFirst) $type=globalSearchI
I found that in ANTLR V3, before the parser rules begin to be applied, all the tokens
have already been recognized (that is to say, all the lexer rules have been
applied).
Take this rule, for example:
IDENT
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
{
$type=globalSearchId($text);
}
;
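The lookup this action relies on can be sketched as a small table from identifier text to token type. The names addId, globalSearchId, IDENT, ENTITY_IDENT and PROCEDURE_IDENT appear in these messages, but the bodies below (and the C spellings add_id, global_search_id, IdTable) are assumptions made for illustration. The sketch also makes the timing issue visible: the lookup only returns a special type for names that were added before the lexer reached them.

```c
#include <string.h>

/* Token type values are placeholders; a generated lexer defines its own. */
enum { IDENT = 1, ENTITY_IDENT = 2, PROCEDURE_IDENT = 3 };

#define MAX_IDS 64

typedef struct {
    const char *text;   /* not copied: the caller keeps the string alive */
    int         type;
} IdEntry;

typedef struct {
    IdEntry entries[MAX_IDS];
    int     count;
} IdTable;

void id_table_init(IdTable *t)
{
    t->count = 0;
}

/* Records that `text` should be lexed as `type`. Returns 0 if the table is full. */
int add_id(IdTable *t, const char *text, int type)
{
    if (t->count >= MAX_IDS) {
        return 0;
    }
    t->entries[t->count].text = text;
    t->entries[t->count].type = type;
    t->count++;
    return 1;
}

/* Returns the recorded type for `text`, or plain IDENT if it is unknown. */
int global_search_id(const IdTable *t, const char *text)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].text, text) == 0) {
            return t->entries[i].type;
        }
    }
    return IDENT;
}
```

If every token is produced before any parser action runs, a table filled by parser actions is still empty during lexing, so every name comes back as IDENT.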
globa
I noticed the first parse is OK. But after setting isFirst to false, this
kind of error message keeps coming out.
I don't know why the failure of a predicate would lead to an error.
2008/12/3 chain one <[EMAIL PROTECTED]>
> And there are a lot of error tags in the final printed
And there are a lot of error tags in the final printed tree, like this:
(TYPE_DECL
Is it caused by the failure of the predicate?
2008/12/3 chain one <[EMAIL PROTECTED]>
> procedure_id
> : { isFirst }? id=IDENT { addId($id.getText(),PROCEDURE_IDENT); }
>
procedure_id
: { isFirst }? id=IDENT { addId($id.getText(),PROCEDURE_IDENT); }
| nid=PROCEDURE_IDENT { $nid.setType(IDENT);} ->
^(PROCEDURE_ID[] PROCEDURE_IDENT)
;
In the first pass, isFirst is true, and in the second pass, isFirst is set
to be false.
When running t
Sorry for spamming.
I noticed that simply removing the '$' in front of getText() and setType()
works.
What a stupid question! : )
2008/12/3 chain one <[EMAIL PROTECTED]>
> The V2 Lexer rule is:
> IDENT
>
> : ('a'..'z'|'A'..'Z'
The V2 Lexer rule is:
IDENT
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
{
if (!parser.isFirst) $setType(globalSearchId($getText()));
}
;
I searched the ANTLR V3 book but could not find a way to get a lexer
rule's text and type directly from its alternatives.
Anyone who k
ather than in a
> parser rule. Then, it will have a name.
> -Jared
>
> chain one wrote:
>
>> Hello:
>> I have a question about the token value, the question is :
>> In the generated Java file, the token values are defined in this way:
>>
>> public static
Hello:
I have a question about the token value, the question is :
In the generated Java file, the token values are defined in this way:
public static final String[] tokenNames = new String[] {
"<invalid>", "<EOR>", "<DOWN>", "<UP>", "ENTITY_VAR_IDENT",
"RULE_ID","INTERVAL_ITEM","'ONEOF'"
};
public static
options
{
output=AST;
}
entity_id returns [String eid]
@init{ eid=null; }
: { isFirst }? id=IDENT { eid=id.getText();
/*addId(eid,ENTITY_IDENT);*/ } -> ^(ENTITY_ID[] $id)
| id2=ENTITY_IDENT { eid=id2.getText();$id2.setType(IDENT);} ->
^(ENTITY_IDENT[] $id2)
;
vari
I am reading the <> book
And I am confused about something discussed on p. 315 of Chapter 12.
Could someone tell me what the difference is between the following two
rules?
stat
options {backtrack=true;}
: declaration
| expression
;
stat
:(declaration)=> declaration
| expression
;
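What a syntactic predicate such as (declaration)=> does at run time can be sketched by hand: parse the alternative speculatively, rewind, and commit only if the speculation succeeded. As I understand it, backtrack=true asks ANTLR to add predicates of this kind automatically where its analysis needs them, so for a rule this small the two forms should behave the same. The toy token types and the two mini-rules below are assumptions for illustration, not taken from the book.

```c
#include <stddef.h>

enum { T_ID = 1, T_STAR, T_SEMI, T_EOF };

typedef struct {
    const int *toks;    /* token types, terminated by T_EOF */
    size_t     pos;
} Parser;

static int match(Parser *p, int type)
{
    if (p->toks[p->pos] == type) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* declaration : ID STAR* ID SEMI      e.g.  T * x ;  */
static int declaration(Parser *p)
{
    if (!match(p, T_ID)) {
        return 0;
    }
    while (match(p, T_STAR)) {
        /* consume any number of '*' */
    }
    return match(p, T_ID) && match(p, T_SEMI);
}

/* expression : ID (STAR ID)* SEMI     e.g.  a * b * c ;  */
static int expression(Parser *p)
{
    if (!match(p, T_ID)) {
        return 0;
    }
    while (match(p, T_STAR)) {
        if (!match(p, T_ID)) {
            return 0;
        }
    }
    return match(p, T_SEMI);
}

/* (declaration)=>  : try the rule, then rewind whatever happened. */
static int synpred_declaration(Parser *p)
{
    size_t mark = p->pos;
    int ok = declaration(p);
    p->pos = mark;
    return ok;
}

/* stat : (declaration)=> declaration | expression ;
 * Returns 1 for a declaration, 2 for an expression, 0 on a syntax error. */
int parse_stat(Parser *p)
{
    if (synpred_declaration(p)) {
        declaration(p);                 /* second pass: parse for real */
        return 1;
    }
    return expression(p) ? 2 : 0;
}
```

Note that "T * x ;" fits both mini-rules; the predicate resolves that in favor of the declaration because it is tried first.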
Thanks for your help
The rule is simple_expression: term ( add_like_op term )*
If I don't need to place an imaginary node as its root, I can simply write
the rule like this: simple_expression: term ( add_like_op^ term )*
But what if I want to put an imaginary node on top of it?
simple_expression: term ( a
Sorry for the wide distribution; I am a newbie to ANTLR.
I found a V2 AST rule.
logical_literal
: 'false' { $logical_literal = $([LOGICAL_LITERAL,
'LOGICAL_LITERAL'], $logical_literal); };
Is it correct to translate this rule to the following RULE in V3?
logical_literal
: 'false' -
I mean the Abstract Syntax Tree interpreters
Has anybody ever used ANTLR to write this kind of interpreter?
If yes, could you drop me some material related to this? Thanks.
I am currently using ANTLR V2 to write a parser, and I have run into a
problem that I don't know how to fix.
The problem is:
Lexer rule:
INT
:(DIGIT)+
;
FLOAT
:'.' (DIGIT)+ (('e' | 'E') ('+' | '-')? (DIGIT)+)?
|'.' ('e' | 'E') ('+' | '-')? (DIGIT)+
;
DIGIT
: '0'..'9'
;
I want to match t