Well, one possibility is to do something simpler. Rather than modifying StandardAnalyzer, modify the input stream. That is, replace each punctuation character that is NOT followed by whitespace with a space, and then just let the analyzer handle the result.
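As a rough illustration of that preprocessing step, here is a minimal sketch using only java.util.regex. The class name PunctuationSplitter and the method preprocess are made up for this example, and treating "punctuation" as the ASCII \p{Punct} class is an assumption; you would probably want to adjust the character class for your data.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: rewrites the text before it
// is handed to whatever analyzer you use.
public class PunctuationSplitter {

    // An ASCII punctuation character that is immediately followed by a
    // non-whitespace character. Punctuation followed by whitespace, or
    // sitting at the very end of the input, is left alone.
    private static final Pattern PUNCT_BEFORE_NON_SPACE =
            Pattern.compile("\\p{Punct}(?=\\S)");

    // Replace each such punctuation character with a single space.
    public static String preprocess(String text) {
        return PUNCT_BEFORE_NON_SPACE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // prints "foo bar baz: qux."
        System.out.println(preprocess("foo.bar,baz: qux."));
    }
}
```

You would call preprocess() on the field text (or wrap the Reader accordingly) before giving it to the analyzer. Note that this also breaks up things like e-mail addresses, host names and decimal numbers, which may or may not be what you want.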
For that matter, if you're going to alter the input stream before giving it to the analyzer, you can then use pretty simple analyzers unless you need some of the other characteristics of StandardAnalyzer.... Or not....

Erick

On 5/29/07, Michael Böckling <[EMAIL PROTECTED]> wrote:
Hi folks!

The topic says it all: I want to modify the StandardAnalyzer so that it also splits words after punctuation characters (.,: etc.) that are NOT followed by a whitespace character, in addition to punctuation characters that ARE followed by whitespace. Of course I've looked at StandardTokenizer.jj, but I don't quite get it. The recursive nature of the grammar bends my mind. Can someone smarter than me help here? I'd be most thankful!

Regards,
Michael

--
Michael Böckling
Java Engineer
dmc digital media center GmbH