subject:"RE\: Modifying StandardAnalyzer"

Re: Modifying StandardAnalyzer so that it also splits words after punctuation characters that are not followed by whitespace

2007-05-29 Thread Steven Rowe

Hi Michael,

Michael Böckling wrote:
> Hi folks!
>
> The topic says it all: I want to modify the StandardAnalyzer so that it also
> splits words after punctuation characters (.,: etc.) that are NOT followed
> by a whitespace character, in addition to punctuation characters that ARE
> followed by w

Re: Modifying StandardAnalyzer so that it also splits words after punctuation characters that are not followed by whitespace

2007-05-29 Thread Erick Erickson

Well, one possibility is to do something simpler. Rather than modifying StandardAnalyzer, modify the input stream. That is, substitute spaces for punctuation NOT followed by whitespace and then just let the analyzer handle the result. For that matter, if you're going to alter the input stream bef
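The pre-processing idea in this message can be sketched with the JDK's regex classes alone. This is a minimal illustration, not code from the thread: the class name `PunctuationPreprocessor` and the method `splitOnTightPunctuation` are hypothetical, and the choice of `\p{Punct}` (ASCII punctuation only) is an assumption about which characters should count.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: rewrites the text before it
// is handed to an analyzer.
final class PunctuationPreprocessor {

    // An ASCII punctuation character immediately followed by a
    // non-whitespace character. The lookahead does not consume the
    // following character, so runs such as "a..b" are handled too.
    private static final Pattern TIGHT_PUNCTUATION =
            Pattern.compile("\\p{Punct}(?=\\S)");

    // Replaces each such punctuation character with a single space.
    static String splitOnTightPunctuation(String input) {
        return TIGHT_PUNCTUATION.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(splitOnTightPunctuation("foo.bar,baz end. Next"));
    }
}
```

Punctuation that is already followed by whitespace, or that ends the input, is left for the analyzer to handle. Note that a blanket rule like this also breaks up e-mail addresses, host names and decimal numbers, so the character class would likely need narrowing for real data.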

Re: Modifying StandardAnalyzer

2007-01-12 Thread Mark Miller

kenizing it as this: all one located 92226-4446 E-A-R -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Thursday, January 11, 2007 6:11 PM To: java-user@lucene.apache.org Subject: Re: Modifying StandardAnalyzer Would it be simpler just to modify the input with

Re: Modifying StandardAnalyzer

2007-01-12 Thread Mark Miller

It won't do what I need. I may have something like: "All-In-One is located in 92226-4446 and has an E-A-R" I want it to be tokenized as follows: all one located 92226 4446 E-A-R Right now... it is tokenizing it as this: all one located 92226-4446 E-A-R That's the type of information you
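For the ZIP+4 case specifically, a narrowly targeted regex applied to the input is one way to get 92226 and 4446 as separate tokens without touching the other hyphenated terms. The sketch below is an assumption-laden illustration rather than anything proposed verbatim in the thread: the class `ZipCodePreprocessor` and method `splitZipPlusFour` are hypothetical names, and the pattern assumes ZIP+4 codes always appear as five digits, a hyphen, and four digits.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: splits ZIP+4 codes in the
// raw text so the analyzer sees two separate numbers.
final class ZipCodePreprocessor {

    // Five digits, a hyphen, four digits, delimited by word boundaries
    // so that longer digit runs are not matched.
    private static final Pattern ZIP_PLUS_FOUR =
            Pattern.compile("\\b(\\d{5})-(\\d{4})\\b");

    // Replaces the hyphen inside a ZIP+4 code with a space; all other
    // hyphens are left exactly as they were.
    static String splitZipPlusFour(String input) {
        return ZIP_PLUS_FOUR.matcher(input).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(splitZipPlusFour(
                "All-In-One is located in 92226-4446 and has an E-A-R"));
    }
}
```

Because only the digit pattern is rewritten, "All-In-One" and "E-A-R" reach the analyzer unchanged, and whatever it already does with them still applies.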

RE: Modifying StandardAnalyzer

2007-01-12 Thread Van Nguyen

ssage- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Thursday, January 11, 2007 6:11 PM To: java-user@lucene.apache.org Subject: Re: Modifying StandardAnalyzer Would it be simpler just to modify the input with a regex rather than risk messing with StandardAnalyzer? Or wouldn't that do

Re: Modifying StandardAnalyzer

2007-01-11 Thread Erick Erickson

Would it be simpler just to modify the input with a regex rather than risk messing with StandardAnalyzer? Or wouldn't that do what you need? On 1/11/07, Van Nguyen <[EMAIL PROTECTED]> wrote: Hi, I need to modify the StandardAnalyzer so that it will tokenize zip codes that look like this:

Re: Modifying StandardAnalyzer

2007-01-11 Thread Mark Miller

I would try adding this (or your regex) | (("-" )|()) between the EMAIL and HOST line or something, And change this: org.apache.lucene.analysis.Token next() throws IOException : { Token token = null; } { ( token = | token = | token = | token = | token = | token = |


7 matches
