On 30 Nov 2005, at 10:53, Combs, Craig wrote:
I wrote my own analyzer:  When I view the tokens returned from the
StemFilter the words I'm searching for are returned from the "Token Next()" function. My terms are under 10,000. I have used luke before how can I make
it use a custom analyzer?  Thank you for the information.

.... <general code and default stop words> ....
        public final TokenStream tokenStream(String fieldName, Reader
reader) {
                TokenStream result = new StandardTokenizer( reader );
                result = new LowerCaseFilter( result );
                result = new StopFilter( result, stoptable );
                result = new ISOLatin1AccentFilter( result );
                result = new ItalianStemFilter( result, excltable );
                return result;
        }
}


If the tokens you're searching for are returned from your analyzer, then they will be searchable. If you're using QueryParser for searching, then you need to be sure your Query is being constructed as you'd like.

To get Luke to use a custom Analyzer, it should have a no-arg constructor, and be put in the classpath that you launch Luke from and it should appear in the list.

Hope this helps.

        Erik




-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 30, 2005 8:47 AM
To: java-user@lucene.apache.org
Subject: Re: Why are tokens not being indexed?

What Analyzer are you using?   Have a look at the Analyzer demo with
Lucene in Action's code, or from my java.net article so you can
analyze your analyzer.  Also try out Luke, it really is handy for
seeing inside your index.

And is your text really long (> 10,000 terms)?   If so, you'll need
to set the max. field length higher on IndexWriter (I always use
Integer.MAX_VALUE).

        Erik



On 30 Nov 2005, at 08:04, Combs, Craig wrote:

I have a body of text which is being added to a document as
unstored.  All
the words in the body text are coming through in the token stream for
analyzing.  For some reason I can search on some of tokens and
others I can
not.

Take the following string:

"L'amministrazione di Uniface View consente un elevato grado di
flessibilità
e può essere realizzata in base ai requisiti del proprio sito. Nel
manuale
Guida all'amministrazione di Uniface View  vengono descritte le
procedure di
amministrazione del sistema. Utilizzare le informazioni riportate
di seguito
per creare il portale più adatto alle proprie esigenze."

If I search for "consente" or "elevato" no results are found.  If I
search
for "view" or "grado" results are found.  I find an explanation for
this
behavior.  These tokens are being returned to the reader but don't
seem to
be making it into the index.

Any ideas on how to debugs or an explanation why this might be
happening?

Regards,
Craig Combs




The contents of this e-mail are intended for the named addressee
only. It
contains information that may be confidential. Unless you are the
named
addressee or an authorized designee, you may not copy or use it, or
disclose
it to anyone else. If you received it in error please notify us
immediately
and then destroy it.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee, you may not copy or use it, or disclose it to anyone else. If you received it in error please notify us immediately
and then destroy it.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to