Re: Indexing sections of TEI XML files

2008-08-13 Thread Tricia Williams
Hi,

Take a look at what I've done with SOLR-380 (https://issues.apache.org/jira/browse/SOLR-380). The part you might find particularly useful is the Tokenizer.

Tricia

[EMAIL PROTECTED] wrote:
> Dear users, Question on approaches to indexing TEI XML or similar section/subsectioned files.

Re: Indexing sections of TEI XML files

2008-08-13 Thread Karsten F.
>> ch,much more.
>>
>> Erik

--
View this message in context: http://www.nabble.com/Indexing-sections-of-TEI-XML-files-tp18958644p18964569.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Indexing sections of TEI XML files

2008-08-13 Thread ao1
Thanks, Erik, but I'm developing this system from scratch because it has specific use cases, including handling multiple languages and multiple forms of one minority language (Irish). I'm going to look at XTF anyway, just to see how they managed it!

Thanks, A.

> Have you looked at XTF

Re: Indexing sections of TEI XML files

2008-08-13 Thread Erik Hatcher
Have you looked at XTF? It does what you're after and much, much more.

Erik

On Aug 13, 2008, at 4:03 AM, [EMAIL PROTECTED] wrote:
> Dear users, Question on approaches to indexing TEI XML or similar section/subsectioned files. I'm indexi

Indexing sections of TEI XML files

2008-08-13 Thread ao1
Dear users,

Question on approaches to indexing TEI XML or similar section/subsectioned files.

I'm indexing TEI P4 XML files using Lucene 2.x. Currently, each TEI XML file corresponds to a Lucene document. I extract the data from each XML file using XPath expressions, e.g. for the body text: "/TEI
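
For readers following the thread, here is a minimal sketch of the XPath extraction step described in the message above, using only the JDK's `javax.xml.xpath` API. The XPath expression in the original message is cut off, so `SECTION_XPATH` below is a hypothetical stand-in (TEI P4 uses `TEI.2` as its root element); the class name, method name, and sample XML are likewise illustrative assumptions, not the poster's code. The sketch returns one string per matched section, and each string could then become its own Lucene document instead of one document per file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TeiSectionExtractor {

    // Hypothetical expression: the one in the original message is truncated.
    // It selects each top-level division of a TEI P4 body.
    static final String SECTION_XPATH = "/TEI.2/text/body/div1";

    // Evaluates the XPath expression against the XML string and returns
    // the concatenated text content of every matching node.
    static List<String> extractSections(String xml, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);

        List<String> sections = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            sections.add(nodes.item(i).getTextContent().trim());
        }
        return sections;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document for illustration only.
        String xml = "<TEI.2><text><body>"
                + "<div1><head>One</head> <p>First section.</p></div1>"
                + "<div1><head>Two</head> <p>Second section.</p></div1>"
                + "</body></text></TEI.2>";

        for (String section : extractSections(xml, SECTION_XPATH)) {
            System.out.println(section);
        }
    }
}
```

Each returned string would need to be added to its own index document along with a field that identifies the source file, so that hits can be traced back to the section's parent text; how that is done depends on the Lucene version in use.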