We have a list of keywords with aliases (example: keyword = "ms access";
aliases = "microsoft access", "msaccess", "m.s. access").
We would like to intercept the aliases before they are indexed and have
the keyword indexed instead. We can do this with a CustomFilter for
single-word aliases
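
For multi-word aliases, a filter that looks at one token at a time never sees the whole alias. One possible workaround, sketched below in plain Java with no Lucene dependency, is to rewrite the aliases in the raw text before it reaches the analyzer. The class name AliasNormalizer and its methods are hypothetical, and the sketch assumes aliases should match case-insensitively and only on word boundaries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites known aliases to their canonical keyword
// in the raw text, before the text is handed to an analyzer.
class AliasNormalizer {
    private final Map<String, String> aliasToKeyword = new LinkedHashMap<>();
    private Pattern pattern; // rebuilt lazily whenever aliases change

    void addAliases(String keyword, String... aliases) {
        for (String alias : aliases) {
            aliasToKeyword.put(alias.toLowerCase(Locale.ROOT), keyword);
        }
        pattern = null;
    }

    String normalize(String text) {
        if (aliasToKeyword.isEmpty()) {
            return text;
        }
        if (pattern == null) {
            pattern = buildPattern();
        }
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String keyword = aliasToKeyword.get(m.group().toLowerCase(Locale.ROOT));
            m.appendReplacement(out, Matcher.quoteReplacement(keyword));
        }
        m.appendTail(out);
        return out.toString();
    }

    private Pattern buildPattern() {
        // Longest aliases first, so a longer alias wins over a shorter one
        // that starts at the same position.
        List<String> aliases = new ArrayList<>(aliasToKeyword.keySet());
        aliases.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String alias : aliases) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(alias));
        }
        // The lookarounds keep an alias from matching inside a longer word.
        return Pattern.compile(
                "(?<![\\p{L}\\p{N}])(?:" + alternation + ")(?![\\p{L}\\p{N}])",
                Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        AliasNormalizer normalizer = new AliasNormalizer();
        normalizer.addAliases("ms access", "microsoft access", "msaccess", "m.s. access");
        // prints: We use ms access and ms access daily.
        System.out.println(normalizer.normalize("We use Microsoft Access and MSAccess daily."));
    }
}
```

The same rewrite would have to be applied to query strings as well, otherwise a search for an alias would not find the indexed keyword.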
I was looking for an option for text extraction from a Word doc.
Currently I am using POI; however, when there is a table in the doc, for
each column POI brings back a . The whitespace analyzer is not filtering
out this character. So whatever word or phrase that is the last word or
phrase wi
That is awesome. Just one thing, and forgive me if I sound ignorant: what is
"FastZemberek zemberek"?
Ahmet Arslan wrote:
>
>
>> public class CustomFilter extends TokenFilter
>> {
>> protected CustomFilter(TokenStream
>> tokenStream)
>> {
>> super(tokenStream);
>> }
>>
In the current version of Lucene, 3.0, the following methods are no longer
available:
- TokenStream.next()
- TokenStream.next(Token)
- Token.setTermText()
- Token.termText()
The newer version says to use incrementToken() and the AttributeSource APIs
instead.
But I cannot find much hel
(nextToken != null)
{
nextToken.setTermText(nextToken.termText().replaceAll(":|,|\\(|\\)|“|~|;|&|\\.",""));
}
return nextToken;
}
}
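
To see what that replaceAll call does to a single term, here is a small standalone sketch in plain Java, with no Lucene classes. The class name PunctuationStripper is hypothetical. It assumes the curly quote in the pattern above is meant literally (written here as \u201C), and it uses a precompiled character class instead of the alternation, which should match the same characters without recompiling the regex for every token.

```java
import java.util.regex.Pattern;

// Hypothetical standalone version of the character stripping shown above.
class PunctuationStripper {
    // Same characters as the alternation in the filter:
    // colon, comma, parentheses, left curly quote, tilde, semicolon, ampersand, period.
    private static final Pattern STRIP = Pattern.compile("[:,()\u201C~;&.]");

    static String strip(String term) {
        return STRIP.matcher(term).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: ms-access v20 ab
        System.out.println(strip("ms-access, (v2.0): a&b;"));
    }
}
```

Note that "+" and "#" are not in the set, so terms such as C++ and C# pass through this step unchanged.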
maxSchlein wrote:
>
> Can someone please point me in the right direction.
>
> We are creating
Can someone please point me in the right direction.
We are creating an application that needs to be able to search on C++ and get
back docs that have C++ in them. The StandardAnalyzer does not seem to index
the "+", so a search for "C++" will bring back docs that contain C++, C,
C#, etc. The