On Apr 19, 2005, at 3:55 PM, Paul Libbrecht wrote:
Hi,
I am working on an index to search XML data in a fixed format that I know well...
The idea is that the XML content (which I have as a JDOM object) actually carries semantics that would best be converted directly into tokens by something like an analyzer. However, fields are not added from the result of the analysis (or a stream thereof) but from readers or strings.
I have two choices and would like to know what's the best:
- make the text passed to the analyzer a simple "instruction" which will fetch the XML objects and do the analysis there
- make a pre-analysis step which converts it into tokens of text which then my analyzer catches again.
I'd be more inclined toward the first solution but I fear there's a catch.
Is there one?
The only catch that I know of is that an Analyzer is invoked on a per-field basis. I can't tell exactly what you have in mind, but a Lucene Analyzer cannot split data into separate fields itself - the data has to be split into fields beforehand.
I'm indexing a lot of XML myself, with JDOM in the middle, and using XPath to extract data per field before building the Document.
Erik
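
For illustration, here is a minimal sketch of the "split first, analyze later" approach described above. It is not Erik's actual code: it uses the JDK's built-in DOM and XPath classes as a stand-in for JDOM, and the element names, field names, and the FieldExtractor class are all hypothetical. Each name/value pair it produces would then be added to a Lucene Document as its own field, so the Analyzer only ever sees the text of one field at a time.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class FieldExtractor {

    // Hypothetical mapping: index field name -> XPath expression selecting its text.
    private static final Map<String, String> FIELD_PATHS = new LinkedHashMap<>();
    static {
        FIELD_PATHS.put("title", "/entry/title");
        FIELD_PATHS.put("body", "/entry/body");
    }

    // Splits one XML record into per-field strings BEFORE any analysis happens.
    public static Map<String, String> extractFields(String xml) throws Exception {
        org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : FIELD_PATHS.entrySet()) {
            // evaluate(...) returns the string value of the first matching node,
            // or an empty string if nothing matches.
            fields.put(e.getKey(), xpath.evaluate(e.getValue(), dom).trim());
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entry><title>Indexing XML</title>"
                   + "<body>Split first, analyze later.</body></entry>";
        // Each pair printed here corresponds to one field of the index document.
        for (Map.Entry<String, String> e : extractFields(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

The point of the design is only that the field boundaries are decided by the extraction step, not by the analyzer; any format-specific tokenization can still live in a custom Analyzer applied to each field's text.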