On Dec 7, 2005, at 9:08 PM, Beady Geraghty wrote:
> In general, do the rules in JavaCC work pretty well?
In general, all answers would be too general to be useful :)
JavaCC is great - I'm using it for a custom query parser myself. But
it's not for the faint of heart. It may be more than you need; it
all depends. The main thing StandardTokenizer does is keep e-mail
addresses intact, plus a few other fiddly things.
If you provide us with some sample text and how you want that
tokenized, I'm sure we could offer suggestions.
> Since there may be more requests to include punctuation
> in the search terms, I will have to keep modifying this .jj file.
> I wonder if there are things that I should watch out for before
> it gets overly complicated and I get stuck somewhere down the
> road?
There are many pitfalls with JavaCC grammars. It takes practice and
unit tests to get this stuff right. The same could be said of any
style of tokenization. Make lots of tests to ensure you don't break
expected behavior as you tweak.
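To make that concrete, here is a rough sketch of the kind of regression
test I mean. It is not the StandardTokenizer grammar: the regex-based
tokenize() helper, the class name, and the sample inputs are all
hypothetical stand-ins for whatever your .jj file generates. The point is
only to pin down the current behavior (e-mail addresses stay whole, other
punctuation is dropped) so that a later grammar change that breaks it
fails loudly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizerRegressionSketch {
    // Hypothetical rule set: an e-mail address is kept as one token;
    // otherwise a token is a run of ASCII letters and digits.
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+|[A-Za-z0-9]+");

    // Stand-in for the generated tokenizer: returns lowercased tokens in order.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase());
        }
        return tokens;
    }

    // Fails with a readable message when the tokens differ from what we expect.
    static void check(String input, String... expected) {
        List<String> actual = tokenize(input);
        if (!actual.equals(Arrays.asList(expected))) {
            throw new AssertionError(
                "input=[" + input + "] expected=" + Arrays.asList(expected)
                + " actual=" + actual);
        }
    }

    public static void main(String[] args) {
        // E-mail addresses must survive intact, even before a trailing period.
        check("Mail erik@example.com today", "mail", "erik@example.com", "today");
        check("Write to erik@example.com.", "write", "to", "erik@example.com");
        // Current behavior for punctuation: it is dropped. If a new request
        // means "C++" should become searchable, change this expectation first,
        // then change the grammar.
        check("C++ and .NET", "c", "and", "net");
        // Edge case: empty input yields no tokens.
        check("");
        System.out.println("all tokenizer checks passed");
    }
}
```

With a real grammar you would swap tokenize() for a call into the generated
tokenizer and keep the list of check() lines growing, one per punctuation
rule you add.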
Erik