The problem with this approach is that the Analyzer you use for indexing 
will be *very* different from the one used for searching.

The way I see it, the Document objects passed to Lucene should contain fields 
that are as text-based as possible, comparable to what a user would type 
while searching. It's the task of the Analyzer then to break the text up into 
terms, remove capitals, etc... This should be kept as similar as possible 
for indexing and searching.

IMHO, only fields that are not Tokenized (like dates or keywords) or fields 
that are UnIndexed should contain 'raw' data.
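To make that concrete, here is a minimal sketch in the Lucene 1.4-era API (field names and paths are made up for illustration): the same Analyzer instance is used both when adding documents and when parsing the user's query, while an untokenized Keyword field carries raw data untouched.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// One Analyzer, shared by indexing and searching.
Analyzer analyzer = new StandardAnalyzer();

// Indexing: tokenized text fields go through the analyzer.
IndexWriter writer = new IndexWriter("/tmp/index", analyzer, true);
Document doc = new Document();
doc.add(Field.Text("title", "Passing XML objects to the analyzer"));
doc.add(Field.Keyword("date", "2005-04-19")); // not tokenized: 'raw' data is fine here
writer.addDocument(doc);
writer.close();

// Searching: the *same* analyzer breaks the user's input into terms,
// so query terms match the terms that were indexed.
Query query = QueryParser.parse("XML Analyzer", "title", analyzer);
```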

Luc


-----Original Message-----
From: Paul Libbrecht [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 19, 2005 11:44 PM
To: java-user@lucene.apache.org
Subject: Re: Passing XML objects to the analyzer ?


Le 19 avr. 05, à 22:50, Erik Hatcher a écrit :
> The only catch that I know of is that an Analyzer is invoked on a 
> per-field basis.  I can't tell exactly what you have in mind, but a 
> Lucene Analyzer cannot split data into separate fields itself - it has 
> to have been split prior.

That's an easy one... ok, yes, I was clearly aware of this.

> I'm indexing a lot of XML myself, with JDOM in the middle, and using 
> XPath to extract data per field before building the Document.

So wouldn't Field.Unstored(Object) actually make sense ?
That object, instead of being a reader, would be passed around till the 
analyzer call which would then decide to accept, say, JDOM objects...
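For comparison, the per-field extraction Erik describes might look roughly like this sketch using JDOM 1.x and the Lucene 1.4-era API (the file name, XPath expressions, and field names are assumptions for illustration): each field's text is pulled out of the XML *before* the Document is built, so the Analyzer only ever sees plain text.

```java
import java.io.File;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Parse the XML with JDOM.
org.jdom.Document xml = new SAXBuilder().build(new File("entry.xml"));

// Extract one string per field via XPath, prior to indexing.
String title = XPath.newInstance("/entry/title").valueOf(xml);
String body  = XPath.newInstance("/entry/body").valueOf(xml);

// Build the Lucene Document from plain text only.
Document doc = new Document();
doc.add(Field.Text("title", title));
doc.add(Field.Text("body", body));
```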

paul


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


