Re: Passing XML objects to the analyzer ?

Paul Libbrecht Wed, 20 Apr 2005 12:50:58 -0700

I don't agree with this if the query is expected to contain the same "text-encoding" as the content being analyzed.

So one example would be matching f is continuous since it is the product of g and x |-> x^2 (in "email notation", we work with a semantic encoding) This combination of text and formula (which we represent as OpenMath) would be both something that could be queried as a phrase and something that would be a content. Of course, you're left with the query input which is a separate problem closer to the controls for content authoring anyways.

One natural effect of XML is the mark of a region as being a reference to a named entity such as could be done in HTML with <a href="def_continuous">continuous function</a> Here one would like to match both the "symbol" continuous (that is the reference def_continuous as well as the words continuous function. What I could read thus far about position-increment to offer alternatives seems to be limited to one word.

paul


Le 20 avr. 05, à 15:32, Vanlerberghe, Luc a écrit :

The problem with this approach is that the Analyser you will use for indexing will be *very* different from the one used for searching.

The way I see it, the Document objects pqssed to Lucene should contain fields that are as much text based as possible, comparable to what a user would type while searching. It's the task of the Analyzer then to break the text up in terms, remove capitals, etc, etc... This should be kept as similar as possible for indexing and searching.

IMHO, only fields that are not Tokenized (like dates or keywords) or fields that are UnIndexed should contain 'raw' data.
Luc
-----Original Message-----
From: Paul Libbrecht [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 19, 2005 11:44 PM
To: java-user@lucene.apache.org
Subject: Re: Passing XML objects to the analyzer ?
Le 19 avr. 05, à 22:50, Erik Hatcher a écrit :
The only catch that I know if is that an Analyzer is invoked on a
per-field basis.  I can't tell exactly what you have in mind, but a
Lucene Analyzer cannot split data into separate fields itself - it has
to have been split prior.
That's an easy one... ok, yes, I was clearly aware of this.
I'm indexing a lot of XML myself, with JDOM in the middle, and using
XPath to extract data per field before building the Document.
So wouldn't Field.Unstored(Object) actually make sense ? That object, instead of being a reader, would be passed around till the analyzer call which would then decide to accept, say, JDOM objects...
paul
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Passing XML objects to the analyzer ?

Reply via email to