> What is the best format/markup/ebook standard/document standard/other to use 
> for easiest and best text search support?

Apache Tika can extract the text from a wide range of formats, and you can then 
index that text with Lucene, so I think the real question is which format is 
best when you want to display the document.

It seems you first need to ask what a "document" is as far as Lucene is 
concerned. Possibly the answer is each sentence (not the chapter), because I 
suspect that fundamentally the user wants to see each line and its references 
to other lines in this or other documents, but also to view the whole document 
when needed.
So then you need:
1. A nicely viewable version of each file (chapter).
2. Table(s) (in an RDBMS) that can cross-link every verse/sentence/line to every 
other. Isn't that how cross-references work, at the sentence level?
3. Table(s) (in an RDBMS) that link each sentence to its chapter, book, and work 
(or alternatively some field(s) in Lucene that can be used to look up that 
context).
4. A Lucene index that indexes the "sentences" (the fundamental 
cross-referenceable subunit of the text).
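
To make items 2 to 4 a bit more concrete, here is a rough sketch in plain Java. 
It is only an in-memory stand-in, not Lucene or a real database: the maps play 
the role of the RDBMS tables and of the index, and every name in it (Sentence, 
SentenceStore, link, search, crossReferences) is something I made up for 
illustration. The tokenizing is deliberately naive compared to a real analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** One indexed "document": a single sentence plus the context of item 3. */
final class Sentence {
    final int id;
    final String work;
    final String book;
    final int chapter;
    final int number;
    final String text;

    Sentence(int id, String work, String book, int chapter, int number, String text) {
        this.id = id;
        this.work = work;
        this.book = book;
        this.chapter = chapter;
        this.number = number;
        this.text = text;
    }

    /** Enough context to open the viewable chapter file of item 1. */
    String location() {
        return work + ", " + book + " " + chapter + ":" + number;
    }
}

/** Hypothetical in-memory stand-in for the tables and the sentence index. */
final class SentenceStore {
    private final Map<Integer, Sentence> byId = new HashMap<>();
    private final Map<Integer, Set<Integer>> crossRefs = new HashMap<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Item 4: index each sentence as its own unit (naive lowercase word split). */
    void add(Sentence s) {
        byId.put(s.id, s);
        for (String term : s.text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(s.id);
            }
        }
    }

    /** Item 2: record a cross-reference between two sentences, in both directions. */
    void link(int a, int b) {
        crossRefs.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        crossRefs.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    /** Sentences containing the given single word, in id order. */
    List<Sentence> search(String term) {
        return resolve(postings.get(term.toLowerCase(Locale.ROOT)));
    }

    /** Sentences cross-referenced from the given sentence id, in id order. */
    List<Sentence> crossReferences(int id) {
        return resolve(crossRefs.get(id));
    }

    private List<Sentence> resolve(Set<Integer> ids) {
        List<Sentence> out = new ArrayList<>();
        if (ids != null) {
            for (int id : ids) {
                out.add(byId.get(id));
            }
        }
        return out;
    }
}
```

The idea is that a search hit is already a sentence, so you can show it, call 
crossReferences(hit.id) for the related lines, and use location() to jump to 
the full chapter. With Lucene you would presumably store the id and the context 
as fields on each sentence document instead.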

Maybe someone else has ideas about mapping from text in a document to a 
particular verse and its cross-references, but that sounds like a lot of 
mapping to me, so I would do the work up front and build the index of 
verses/sentences.
Just my beginner's 2 cents' worth.

-Paul


