It all depends upon how you index it <G>.... There are at least three approaches.
1> Each paragraph is a distinct Lucene document. You'd also index some data with each paragraph that allows you to reconstruct which book it came from. What relevance means on a per-book basis is a question you need to give some thought to, but searching on the text field will search all your paragraphs. This assumes that each Lucene document indexes the paragraph in a field labeled "text".

2> You index all the paragraphs of a single book together in one Lucene document. This really has two variants:

2a> Each paragraph gets its own field in the document, i.e. fields like paragraph1, paragraph2. I wouldn't do this since the queries you have to construct are ugly.

2b> You index all the paragraphs into a single field per book, call it "text". Now, searching against the text field will search all the paragraphs you put in there.

BTW, the following are equivalent (and note it's pseudo code):

    doc = new doc();
    doc.add("text", "some text");
    doc.add("text", "more stuff");
    writer.add(doc);

and

    doc = new doc();
    doc.add("text", "some text more stuff");
    writer.add(doc);

You can play some games with the position of the first word of each paragraph if you need to know which paragraph the data was in (in the first form above only). Search the mail archive for PositionIncrementGap (?). The idea is that the position of the last token in, say, paragraph 1 is 129. You can cause the first word of the next paragraph to have a position of 1,000, say. This has some interesting ramifications about whether you want, say, phrase searches to span pages or not, but that's another discussion....

And be aware that by default Lucene only indexes the first 10,000 tokens in a field. You can set this as high as you need to, but you have to do it intentionally...

Best
Erick

On 12/28/06, moraleslos <[EMAIL PROTECTED]> wrote:
I currently have a book containing content that is stored in the database by paragraph. For example, a book contains content with 5 paragraphs. Therefore each paragraph is stored as a distinct record in a database. In the object domain, I have a Book object which holds a java.util.List of Paragraph objects. In the relational world, this would be a one-to-many for book-paragraph.

Now, if I search for specific words against the Book's contents, will it retrieve all of the paragraphs, combine them and then do the search, or will it only search on a paragraph? For example, a "Guitar" book contains two paragraphs like this:

paragraph 1: This is the first paragraph for learn
paragraph 2: guitar and other musical instruments.

Therefore there will be a record in the Book table linked with two records in the Paragraph table. Now say I index the book and paragraph fields as is and then have a Lucene query that looks like this: [book:Guitar paragraph:"learn guitar"]. Will this query return a hit?

Thanks in advance!

-los

--
View this message in context: http://www.nabble.com/newbie-lucene-indexing-search-question-tf2892417.html#a8080965
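
For the "learn guitar" example in the question, here is a minimal sketch of the position-gap idea from the reply, in plain Java. It deliberately does not call Lucene: the class ParagraphGapSketch, the Tok type, and the index and matchesPhrase methods are all made up for illustration, and the tokenizer (lowercase, split on anything that is not a letter or digit) is an assumption rather than what any particular Lucene analyzer does. It only shows how a gap between the values of one field decides whether a two-word phrase can match across a paragraph boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ParagraphGapSketch {

    // Hypothetical helper type (not a Lucene class): a term and its position in the field.
    public static final class Tok {
        final String term;
        final int position;

        Tok(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Assigns consecutive positions to the tokens of each paragraph and
    // skips `gap` extra positions between one paragraph and the next.
    public static List<Tok> index(List<String> paragraphs, int gap) {
        List<Tok> tokens = new ArrayList<>();
        int next = 0; // position the next token will receive
        for (String paragraph : paragraphs) {
            for (String word : paragraph.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (word.isEmpty()) {
                    continue; // split() can yield a leading empty string
                }
                tokens.add(new Tok(word, next++));
            }
            next += gap; // leave a hole before the following paragraph
        }
        return tokens;
    }

    // True if `second` occurs at the position immediately after `first`,
    // which is what an exact two-word phrase query requires.
    public static boolean matchesPhrase(List<Tok> tokens, String first, String second) {
        for (Tok a : tokens) {
            if (!a.term.equals(first)) {
                continue;
            }
            for (Tok b : tokens) {
                if (b.term.equals(second) && b.position == a.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList(
                "This is the first paragraph for learn",
                "guitar and other musical instruments.");

        // Gap 0: the two values behave like one concatenated string.
        System.out.println(matchesPhrase(index(paragraphs, 0), "learn", "guitar"));    // prints true
        // Gap 1000: "guitar" lands far away from "learn", so the phrase cannot match.
        System.out.println(matchesPhrase(index(paragraphs, 1000), "learn", "guitar")); // prints false
    }
}
```

With a gap of 0 the phrase matches across the paragraph boundary; with a gap of 1000 it does not, while single-word searches on "learn" or "guitar" would still find the book either way. In Lucene itself the corresponding knob is the analyzer's position increment gap, so check the javadoc of the version you are using before relying on this.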