Thanks for the advice. It is hard to say whether the useability folks want to distinguish between "/usr/include" as oppose to "usr include". Actually, I am sure that they would, but whether they would accept "usr include" is the right question to ask :-) I'll have to sort it out with them :-(
Thanks again. On 12/8/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > On Dec 8, 2005, at 10:15 AM, Beady Geraghty wrote: > > Since someone suggested hyphen, the next requestion > > is underscore. I can see more and more of these requests. > > Also, people might like to search for "/usr/include/wchar.h" (hence, > > the slash) and apostrophe etc. There really isn't a set of > > requirements > > upfront. In fact people wants EVERYTHING if > > they could, and full flexibility (even though they don't know > > whether they will need it or not.) > > So it appears that doing something "general" is better. > > Keep in mind that even if you tokenize "/usr/include" as "usr" and > "include" that a query for "/usr/include" will still match if it is > analyzed in a compatible manner. It is important to realize that > just because bits and pieces get eaten during analysis that it > doesn't make things unfindable. Sure, if someone literally wants to > search for "/" only then it is important to keep upfront, but > generally this is not the case. > > > I have been using StandardAnalyzer for the things you mentioned, like > > email address, and www.google.com or i.b.m. Those are good things > > for me to have. Since I've used it now, if I change it now, I > > might break > > other > > people's dependencies. > > WhitespaceTokenizer followed by a LowerCaseFilter also catches the > cases you mention. > > Again, lots of tests to assert what the users expect is a strong > recommendation I have with tokenization. > > > If you do have a list of pitfalls from javaCC, could you point me > > to it, > > that way, I can think about some of the potential issues and decide > > whether I should just abandon using javaCC ? > > It all really depends. JavaCC is complex, that is my only > reservation for recommending it. If you can get by with simpler > analysis techniques, then I say go for those instead. > > But, JavaCC is powerful. It simply isn't necessary for the bulk of > analysis cases I've come across. Works great for parsing query > expressions though. > > Erik > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >