Re: Splitting of words

2005-09-27 Thread Erik Hatcher
On Sep 27, 2005, at 6:29 AM, Endre Stølsvik wrote: On Thu, 22 Sep 2005, Erik Hatcher wrote: | On Sep 22, 2005, at 4:36 AM, Endre Stølsvik wrote: | > | The StandardTokenizer is the most sophisticated one built into Lucene. You can see the types of tokens it emits by looking

Re: Splitting of words

2005-09-27 Thread Endre Stølsvik
On Thu, 22 Sep 2005, Erik Hatcher wrote: | On Sep 22, 2005, at 4:36 AM, Endre Stølsvik wrote: | > | The StandardTokenizer is the most sophisticated one built into Lucene. You can see the types of tokens it emits by looking at the javadoc here:

Re: Splitting of words

2005-09-22 Thread Erik Hatcher
On Sep 22, 2005, at 4:36 AM, Endre Stølsvik wrote: | The StandardTokenizer is the most sophisticated one built into Lucene. You can see the types of tokens it emits by looking at the javadoc here:

Re: Splitting of words

2005-09-22 Thread Endre Stølsvik
| The StandardTokenizer is the most sophisticated one built into Lucene. You can see the types of tokens it emits by looking at the javadoc here: | It recognizes e-mail addresses, interi
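
The token types referred to above can be printed directly. A minimal sketch, assuming the Lucene 1.4-era analysis API that was current when this thread was written (TokenStream.next() returning a Token with termText() and type()); the class name ShowTokenTypes and the sample text are made up for illustration:

    import java.io.StringReader;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    // Print every token StandardTokenizer emits together with its type
    // (e.g. <ALPHANUM>, <EMAIL>, <HOST>, <NUM>).
    public class ShowTokenTypes {
        public static void main(String[] args) throws Exception {
            String text = "Contact erik@example.com about lucene.apache.org 1.4.3";
            StandardTokenizer tokenizer = new StandardTokenizer(new StringReader(text));
            for (Token t = tokenizer.next(); t != null; t = tokenizer.next()) {
                System.out.println(t.termText() + " -> " + t.type());
            }
        }
    }

On input like the sample text, the e-mail address and the hostname each come through as a single token with types along the lines of <EMAIL> and <HOST>, while plain words are tagged <ALPHANUM>.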

Re: Splitting of words

2005-09-13 Thread Erik Hatcher
On Sep 13, 2005, at 7:24 AM, Madhu Satyanarayana Panitini wrote: Hi Paul, I agree with you that "Analyzer is the magic word". Let's look at it in depth and make it clear; I would consider three parts in the analyzer: 1. Tokenization (splitting of words) 2. Stopword removal (depends on the l

RE: Splitting of words

2005-09-13 Thread Madhu Satyanarayana Panitini
Hi Paul, I agree with you that "Analyzer is the magic word". Let's look at it in depth and make it clear; I would consider three parts in the analyzer: 1. Tokenization (splitting of words) 2. Stopword removal (depends on the language) 3. Stemming of the words (depends on the language) Firs
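
The three parts described here map directly onto a Lucene analyzer: a Tokenizer followed by a chain of TokenFilters. A sketch of such an analyzer, assuming the Lucene 1.4-era API (Analyzer.tokenStream(String, Reader), StopFilter taking a String[] of stop words); the class name ThreePartAnalyzer is made up, and PorterStemFilter stands in for whichever stemmer suits the language:

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    // Wires the three stages together: tokenization, stop-word removal, stemming.
    public class ThreePartAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream stream = new StandardTokenizer(reader);   // 1. split text into words
            stream = new LowerCaseFilter(stream);                 //    normalize case first
            stream = new StopFilter(stream,
                    StopAnalyzer.ENGLISH_STOP_WORDS);             // 2. drop common stop words
            stream = new PorterStemFilter(stream);                // 3. reduce words to their stems
            return stream;
        }
    }

Steps 2 and 3 are the language-dependent parts: the stop word list and the stemmer both have to be swapped out per language, while the tokenizer can often stay the same.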