Hello,
I figured out how to solve this. I just added stopTypes.add("");
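For reference: UAX29URLEmailTokenizer labels each token with a type such as <ALPHANUM>, <NUM>, <URL> or <EMAIL>. TypeTokenFilter removes every token whose type is in the stop set, so the <EMAIL> type is the one that has to go into stopTypes. The sketch below does not use Lucene. TypedToken and TypeFilterSketch are hypothetical classes that model the same blacklist logic on a hand-built token list, so it runs without any JARs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a typed token: the (term, type) pair a Lucene
// tokenizer attaches to each token.
final class TypedToken {
    final String term;
    final String type;

    TypedToken(String term, String type) {
        this.term = term;
        this.type = type;
    }
}

final class TypeFilterSketch {
    // Keeps only tokens whose type is NOT in stopTypes (blacklist mode).
    static List<String> keep(List<TypedToken> tokens, Set<String> stopTypes) {
        List<String> kept = new ArrayList<String>();
        for (TypedToken t : tokens) {
            if (!stopTypes.contains(t.type)) {
                kept.add(t.term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Type labels as UAX29URLEmailTokenizer assigns them.
        List<TypedToken> tokens = Arrays.asList(
            new TypedToken("contact", "<ALPHANUM>"),
            new TypedToken("me", "<ALPHANUM>"),
            new TypedToken("at", "<ALPHANUM>"),
            new TypedToken("someone@example.com", "<EMAIL>"),
            new TypedToken("or", "<ALPHANUM>"),
            new TypedToken("http://example.com", "<URL>"),
            new TypedToken("42", "<NUM>"));

        Set<String> stopTypes = new HashSet<String>(
            Arrays.asList("<NUM>", "<URL>", "<EMAIL>"));
        System.out.println(keep(tokens, stopTypes)); // prints [contact, me, at, or]
    }
}
```

With Lucene on the classpath, the real chain wraps the tokenizer in a TypeTokenFilter built from the same stopTypes set. Check the javadoc for your exact 4.x release for the constructor, because its signature changed during the 4.x series.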
On Wed, Jun 12, 2013 at 8:39 PM, Gucko Gucko wrote:
Hello all,
Is there a filter I can use to remove emails from a TokenStream?
So far I'm using this to remove numbers and URLs, and I would like to remove
emails too:
Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
    new StringReader(text));
Set<String> stopTypes = new HashSet<String>();
st
ath carefully and make sure
> all JAR files of Lucene have the same version and no duplicate JARs with
> different versions are in it!
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
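Part of Uwe's advice can be automated. The sketch below uses only the JDK's ClassLoader.getResources, which returns every classpath entry that contains a given resource. If a Lucene class file turns up in more than one JAR, you have duplicates. DuplicateJarCheck is a hypothetical helper name. org/apache/lucene/analysis/Analyzer.class is just an example resource from lucene-core.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class DuplicateJarCheck {
    // Returns every classpath location (JAR or directory) that provides
    // the given resource. More than one hit means duplicate copies.
    static List<URL> locationsOf(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Example resource from lucene-core; any class from the JAR in question works.
        String resource = "org/apache/lucene/analysis/Analyzer.class";
        List<URL> hits = locationsOf(resource);
        if (hits.size() > 1) {
            System.out.println("Duplicate copies of " + resource + ":");
        }
        for (URL url : hits) {
            System.out.println(url);
        }
    }
}
```

This only catches the same class shipped twice. Different Lucene modules from different releases (say, lucene-core and lucene-analyzers-common) contain different classes. Those mismatches still need the manual check of JAR names and versions.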
Hello all,
I'm trying the following code (I'm experimenting with Tokenizers in order to
create my own Analyzer), but I'm getting an exception:
public class TokenizerTest {
    public static void main(String[] args) throws IOException {
        String text = "A #revolution http://hi.com in t...@test.com softwa
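Without the stack trace, the cause can't be pinned down. One common source of exceptions when experimenting with Tokenizers is breaking the TokenStream consumer workflow described in the TokenStream javadoc. That workflow is: call reset(), call incrementToken() until it returns false, then call end(), then close(). The sketch below models only that call order in plain Java. ContractStream and ConsumeTokens are hypothetical stand-ins, not Lucene classes. The real API exposes the current term through a CharTermAttribute, not a term() method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a TokenStream that only enforces call order.
final class ContractStream {
    private final List<String> terms;
    private int pos = 0;
    private boolean readyToRead = false; // true only between reset() and end()/close()
    private String current;

    ContractStream(List<String> terms) {
        this.terms = terms;
    }

    void reset() {
        pos = 0;
        readyToRead = true;
    }

    boolean incrementToken() {
        if (!readyToRead) {
            throw new IllegalStateException("call reset() before incrementToken()");
        }
        if (pos >= terms.size()) {
            return false;
        }
        current = terms.get(pos++);
        return true;
    }

    String term() {
        return current;
    }

    void end() {
        readyToRead = false;
    }

    void close() {
        readyToRead = false;
    }
}

final class ConsumeTokens {
    // Consumer order: reset(), incrementToken() until false, end(), close().
    static List<String> consume(ContractStream ts) {
        List<String> out = new ArrayList<String>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close(); // always release the stream, even if consuming failed
        }
        return out;
    }

    public static void main(String[] args) {
        ContractStream ts = new ContractStream(Arrays.asList("hello", "world"));
        System.out.println(consume(ts)); // prints [hello, world]
    }
}
```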
Hello all,
I'm trying to cluster documents that were indexed using Lucene 4.3. The
result of the clustering algorithm is a set of clusters, where each cluster
contains the most similar documents (I only store their docIDs in each
cluster). What I want is to get the most frequent words for each clu
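One straightforward way to get the most frequent words per cluster is to add up each term's frequency over the cluster's documents and keep the top k. The sketch assumes you already have per-document term counts in a map keyed by docID. In Lucene 4.x those could come from term vectors via IndexReader.getTermVector(docID, field), which requires the field to be indexed with term vectors. ClusterTopTerms and its input layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ClusterTopTerms {
    // Sums term frequencies over the cluster's documents and returns the
    // k terms with the highest totals (ties broken alphabetically).
    // termFreqsByDoc (docID -> term -> frequency) is a hypothetical input.
    static List<String> topTerms(Map<Integer, Map<String, Integer>> termFreqsByDoc,
                                 List<Integer> clusterDocIds, int k) {
        Map<String, Integer> totals = new HashMap<>();
        for (Integer docId : clusterDocIds) {
            Map<String, Integer> freqs = termFreqsByDoc.get(docId);
            if (freqs == null) {
                continue; // no term data for this document
            }
            for (Map.Entry<String, Integer> e : freqs.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> terms = new ArrayList<>(totals.keySet());
        terms.sort((a, b) -> {
            int byCount = Integer.compare(totals.get(b), totals.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.max(0, Math.min(k, terms.size()))));
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = new HashMap<>();
        d1.put("lucene", 3);
        d1.put("index", 1);
        Map<String, Integer> d2 = new HashMap<>();
        d2.put("lucene", 1);
        d2.put("cluster", 2);
        Map<Integer, Map<String, Integer>> tf = new HashMap<>();
        tf.put(1, d1);
        tf.put(2, d2);
        System.out.println(topTerms(tf, Arrays.asList(1, 2), 2)); // prints [lucene, cluster]
    }
}
```

Raw counts tend to favor very common words. You may want to drop stop words or weight terms by idf before ranking.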