On Dec 7, 2005, at 9:08 PM, Beady Geraghty wrote:
In general, do the rules in JavaCC work pretty well?

In general, all answers would be too general to be useful :)

JavaCC is great - I'm using it for a custom query parser myself. But it's not for the faint of heart. It may be more than you need; it all depends. The main thing StandardTokenizer does is keep e-mail addresses intact, plus a few other fiddly things.

If you provide us with some sample text and how you want that tokenized, I'm sure we could offer suggestions.

Since there may be more requests to include punctuation in the search terms, I will have to keep modifying this .jj file. I wonder whether there are things I should watch out for before the grammar gets overly complicated and I get stuck somewhere down the road?

There are many pitfalls with JavaCC grammars. It takes practice and unit tests to get this stuff right. The same could be said of any style of tokenization. Make lots of tests to ensure you don't break expected behavior as you tweak.
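
One way to keep such tests cheap is a table of inputs and the tokens you expect, re-run after every grammar tweak. The sketch below is only an illustration: TokenizerRegressionSketch, its tokenize method, and the sample cases are hypothetical, and the regex is a stand-in for whatever tokenizer your .jj file generates. It is not Lucene's StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the tokenizer generated from a .jj grammar.
// It exists only so the regression table below has something to run against.
class TokenizerRegressionSketch {

    // Assumed rule: an e-mail-like token first, otherwise a run of letters/digits.
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z0-9._]+@[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+|[A-Za-z0-9]+");

    // Swap this body for a call into your real tokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Input text mapped to the tokens we expect; add a row for every new request.
    static Map<String, List<String>> expectations() {
        Map<String, List<String>> cases = new LinkedHashMap<>();
        cases.put("mail erik@example.com now", List.of("mail", "erik@example.com", "now"));
        cases.put("hello, world!", List.of("hello", "world"));
        cases.put("", List.of());
        return cases;
    }

    // Returns one message per case whose actual tokens differ from the expected ones.
    static List<String> failures() {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, List<String>> c : expectations().entrySet()) {
            List<String> actual = tokenize(c.getKey());
            if (!actual.equals(c.getValue())) {
                failures.add("\"" + c.getKey() + "\": expected " + c.getValue() + " but got " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> failures = failures();
        failures.forEach(System.out::println);
        System.out.println(failures.isEmpty() ? "all cases pass" : failures.size() + " case(s) failed");
    }
}
```

Each new punctuation request then becomes one more row in the table, and a change to the grammar that breaks an older row shows up right away.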

        Erik

