Hello all,
I'm trying to cluster documents that were indexed using Lucene 4.3. The
result of the clustering algorithm is a set of clusters, where each cluster
contains the most similar documents (I only store their docIDs in each
cluster). What I want is to get the most frequent words for each clu
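One way to get the most frequent words per cluster is to sum per-document term counts over the docIDs in the cluster and keep the top k. This is a minimal sketch that assumes you already have those per-document counts. If the field was indexed with term vectors, Lucene 4.x can return them per document via IndexReader.getTermVector(docID, field). The class ClusterTopTerms, the method topTerms and the in-memory map of counts are hypothetical stand-ins, not Lucene API.

```java
import java.util.*;

public class ClusterTopTerms {

    // Sums per-document term counts over the docIDs of one cluster and returns
    // the k most frequent terms (highest count first, ties broken alphabetically).
    // docTermFreqs is a hypothetical in-memory stand-in for per-document counts,
    // e.g. values copied out of Lucene term vectors.
    public static List<String> topTerms(Map<Integer, Map<String, Integer>> docTermFreqs,
                                        Collection<Integer> clusterDocIds, int k) {
        Map<String, Integer> totals = new HashMap<>();
        for (Integer docId : clusterDocIds) {
            Map<String, Integer> freqs = docTermFreqs.get(docId);
            if (freqs == null) continue; // no counts stored for this docID
            freqs.forEach((term, count) -> totals.merge(term, count, Integer::sum));
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> d0 = new HashMap<>();
        d0.put("lucene", 3);
        d0.put("index", 1);
        Map<String, Integer> d1 = new HashMap<>();
        d1.put("lucene", 2);
        d1.put("cluster", 2);
        Map<String, Integer> d2 = new HashMap<>();
        d2.put("cooking", 5);
        Map<Integer, Map<String, Integer>> docs = new HashMap<>();
        docs.put(0, d0);
        docs.put(1, d1);
        docs.put(2, d2);
        // The cluster holds docIDs 0 and 1; doc 2 belongs to another cluster.
        System.out.println(topTerms(docs, Arrays.asList(0, 1), 2)); // [lucene, cluster]
    }
}
```

Summing raw counts favours long documents; normalizing each document's counts first is an alternative if that matters for your clusters.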
Hello all,
I'm trying the following code (experimenting with Tokenizers in order to
create my own Analyzer), but I'm getting an exception:
public class TokenizerTest {
    public static void main(String[] args) throws IOException {
        String text = "A #revolution http://hi.com in t...@test.com softwa
ath carefully and make sure
> all JAR files of Lucene have the same version and no duplicate JARs with
> different versions are in it!
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
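Following that advice, one quick check is to scan the classpath for Lucene JARs and flag mismatches. This is a hypothetical helper, not part of Lucene (LuceneJarCheck and findProblems are illustrative names). It assumes the JAR files keep their Maven-style names, e.g. lucene-core-4.3.0.jar. Renamed, shaded or -SNAPSHOT JARs are not recognized.

```java
import java.io.File;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LuceneJarCheck {

    // Matches Maven-style names such as "lucene-core-4.3.0.jar" or
    // "lucene-analyzers-common-4.3.0.jar" (an assumption about file naming).
    private static final Pattern LUCENE_JAR =
        Pattern.compile("(lucene-[a-z0-9-]+?)-(\\d+(?:\\.\\d+)*)\\.jar");

    // Returns one message per problem: a module present in more than one
    // version, or Lucene modules of different versions mixed together.
    public static List<String> findProblems(List<String> classpathEntries) {
        Map<String, Set<String>> versionsByModule = new TreeMap<>();
        Set<String> allVersions = new TreeSet<>();
        for (String entry : classpathEntries) {
            String name = new File(entry).getName();
            Matcher m = LUCENE_JAR.matcher(name);
            if (!m.matches()) continue; // not a (recognizable) Lucene JAR
            versionsByModule.computeIfAbsent(m.group(1), x -> new TreeSet<>()).add(m.group(2));
            allVersions.add(m.group(2));
        }
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : versionsByModule.entrySet()) {
            if (e.getValue().size() > 1) {
                problems.add("duplicate " + e.getKey() + " versions: " + e.getValue());
            }
        }
        if (allVersions.size() > 1) {
            problems.add("mixed Lucene versions: " + allVersions);
        }
        return problems;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        List<String> problems = findProblems(Arrays.asList(cp.split(File.pathSeparator)));
        if (problems.isEmpty()) {
            System.out.println("Lucene JARs look consistent");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

In a web container or IDE the effective classpath may differ from java.class.path, so it is worth also inspecting the deployment's lib directories by hand.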
Hello all,
Is there a filter I can use to remove emails from a TokenStream?
So far I'm using this to remove numbers and URLs, and I would like to remove
emails too:
Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
new StringReader(text));
Set<String> stopTypes = new HashSet<String>();
st
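For reference, UAX29URLEmailTokenizer labels every token with a type string such as "<ALPHANUM>", "<NUM>", "<URL>" or "<EMAIL>". Removing emails therefore comes down to having "<EMAIL>" in the set of stop types. This plain-Java sketch is not Lucene code (TypeFilterSketch, Token and filterByType are hypothetical names). It only illustrates type-based filtering on a hand-written token list. It also ignores token positions, which a real Lucene TokenFilter has to maintain.

```java
import java.util.*;

public class TypeFilterSketch {

    // Minimal stand-in for a token: its text plus the type label a tokenizer
    // assigned to it (UAX29URLEmailTokenizer uses labels like "<EMAIL>").
    public static final class Token {
        final String text;
        final String type;

        public Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Keeps the text of every token whose type is NOT in stopTypes, in order.
    public static List<String> filterByType(List<Token> tokens, Set<String> stopTypes) {
        List<String> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (!stopTypes.contains(t.type)) kept.add(t.text);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-written tokens; a real tokenizer would produce these from text.
        List<Token> tokens = Arrays.asList(
            new Token("revolution", "<ALPHANUM>"),
            new Token("http://hi.com", "<URL>"),
            new Token("someone@example.com", "<EMAIL>"),
            new Token("2013", "<NUM>"),
            new Token("software", "<ALPHANUM>"));
        Set<String> stopTypes = new HashSet<>(Arrays.asList("<NUM>", "<URL>", "<EMAIL>"));
        System.out.println(filterByType(tokens, stopTypes)); // [revolution, software]
    }
}
```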
Hello,
I figured out how to solve this. I just added stopTypes.add("<EMAIL>");
On Wed, Jun 12, 2013 at 8:39 PM, Gucko Gucko wrote: