[il-antlr-interest: 24090] Re: [antlr-interest] Customizing token separators without recompiling

J. Stephen Riley Silber Sun, 07 Jun 2009 15:03:02 -0700

Oh, I'm saying you wouldn't want to use a grammar at all.  The problem you've 
described is lexical, not grammatical.  If you simply want to break apart a 
line of text based on an arbitrary delimiter, it would be much easier to write 
a program in Perl, Python, Java, etc. that splits the text based on a 
configuration setting.
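
A minimal sketch of that idea in Python, assuming a hypothetical per-customer 
INI-style configuration with a `separator` key. The section names, the key 
name, and the helpers `load_separators` and `split_record` are illustrative 
assumptions, not part of ANTLR or any existing tool:

```python
import configparser

# Hypothetical per-customer settings. In practice this text would live in a
# file and be loaded with ConfigParser.read() instead of read_string().
CONFIG_TEXT = """
[customer_a]
separator = *

[customer_b]
separator = +

[customer_c]
separator = ~
"""


def load_separators(text):
    """Return a {customer: separator} mapping parsed from INI-style text."""
    # interpolation=None keeps characters such as '%' literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: parser[name]["separator"] for name in parser.sections()}


def split_record(line, separator):
    """Split one record into fields on a literal (non-regex) separator."""
    return line.rstrip("\r\n").split(separator)


separators = load_separators(CONFIG_TEXT)
fields = split_record("FST*4290*D*W*20070607", separators["customer_a"])
print(fields)
```

Supporting a new customer who uses '_' or '$' would then mean adding one more 
section to the configuration, with nothing to regenerate or recompile.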

If further parsing needs to happen on the newly-split fields, then you can 
attack that problem piecemeal on an individual basis.
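
As one hedged illustration of handling fields piecemeal: each field can get 
its own small converter once the line has been split. The sketch below assumes 
the second field of the sample record is an integer and the last field is a 
date in YYYYMMDD form. Both are guesses from the sample value `20070607`, not 
something stated in this thread, and the converter table and helper names are 
hypothetical:

```python
from datetime import datetime


def parse_yyyymmdd(text):
    """Convert an 8-digit YYYYMMDD string to a date (ValueError if invalid)."""
    return datetime.strptime(text, "%Y%m%d").date()


# Hypothetical per-position converters for an "FST" record. Positions
# without an entry are left as plain strings.
FST_CONVERTERS = {1: int, 4: parse_yyyymmdd}


def convert_fields(fields, converters):
    """Apply a converter to each field position that has one."""
    return [converters.get(i, str)(value) for i, value in enumerate(fields)]


record = convert_fields("FST~4290~D~W~20070607".split("~"), FST_CONVERTERS)
print(record)
```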

Make sense?

--- On Sun, 6/7/09, Dukie Banderjee <dukie_bander...@hotmail.com> wrote:

From: Dukie Banderjee <dukie_bander...@hotmail.com>
Subject: RE: [antlr-interest] Customizing token separators without recompiling
To: jsrs...@yahoo.com, antlr-inter...@antlr.org
Date: Sunday, June 7, 2009, 2:30 PM


Hi,

Sorry, I'm not following you. How would that work? E.g. A new customer comes 
along, they have their format that uses '_' (or whatever), and how do I get the 
lexer/parser to recognize their file format without re-generating/re-compiling 
the lexer/parser? What would Perl operate on? The grammar? Wouldn't that 
require re-generating/re-compiling the lexer?

Rob

Date: Sun, 7 Jun 2009 12:48:50 -0700
From: jsrs...@yahoo.com
Subject: Re: [antlr-interest] Customizing token separators without recompiling
To: antlr-inter...@antlr.org; dukie_bander...@hotmail.com

Howdy,

I'm guessing there's more to the problem than just supporting arbitrary field 
separation tokens, because if that's all there is, just use something like perl 
and store the separator(s) in a config file...?

--S

--- On Sun, 6/7/09, Dukie Banderjee <dukie_bander...@hotmail.com> wrote:

From: Dukie Banderjee <dukie_bander...@hotmail.com>
Subject: [antlr-interest] Customizing token separators without recompiling
To: antlr-inter...@antlr.org
Date: Sunday, June 7, 2009, 8:25 AM


Hi everyone,

I'm new to the list and new to ANTLR. I have a specific problem I need to solve 
and I hope ANTLR can help.

Our client has several end-customers who all have slightly different document 
formats used for data interchange.

All the documents are basically 'standard' EDI documents, meaning they have the 
same basic syntax. However, some customers will use a '+' to separate values, 
some will use '*', others will use '~', etc. (I'm reminded of the old saying, 
"The great thing about standards is that there are so many to choose from!")

So the following inputs are all the same, except for the character used to 
separate tokens:
FST*4290*D*W*20070607
FST+4290+D+W+20070607
FST~4290~D~W~20070607

The thing is, we don't know ahead of time which separator characters might be 
used in the future, and we need to be able to tweak each end-customer's file 
format without re-compiling the lexer/parser. For example, a year from now 
there might be a customer who decides to use '_' or '$' or whatever, and we 
need to provide our client with a simple way (e.g. a per-customer 
configuration file) to customize the lexer/parser for such situations, without 
re-generating/re-compiling.

So, is this possible with ANTLR? How would I do this? Would it require a custom 
Lexer subclass with constructor parameters (e.g. new CustomLexer('_')) or 
something? How would this mesh with the generated lexer code from ANTLR?

I'm quite new to tools such as ANTLR (and parsers in general), so any help 
would be much appreciated. I really don't know where to start with this 
problem. For a hand-coded parser it's fairly simple, but I don't know enough 
about the workings of ANTLR to see where I would need to tweak it.

Thanks,

Rob


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address


