No problem. Partly, it's helping me clarify my current problem <G>....
Yes, you must delete and re-add a document to change it. You might want to look at the IndexModifier class.

Be aware of some things:

1> Lucene doc IDs may change when the index is changed, I think after optimization. So, in order to find specific docs you'll want to add a unique ID field.

2> Deleting a document only marks it as deleted. It's still actually in the index (although you won't be able to see it). Optimizing an index actually physically removes it (and shrinks the index size).

3> After changing an index, you must open a new IndexReader before you see the changes.

4> The only information you can reliably get out of an index is data you've stored (e.g. Field.Store.YES). This only matters if you're trying to read the doc *from* the index and change it. If you're just replacing it with a fresh copy from somewhere else (e.g. re-reading the doc from the database), it's not a problem.

5> Lucene has no notion of constraints. That is, Lucene is perfectly happy with two identical documents in the index, so be sure to delete before you add <G>....

Erick

On 10/11/06, Eugeny N Dzhurinsky <[EMAIL PROTECTED]> wrote:
On Wed, Oct 11, 2006 at 11:30:03AM -0400, Erick Erickson wrote:
> There's no real group_by functionality in Lucene. I'd have to ask, though,
> "why organize your index this way"? I'm guessing that you're approaching
> this from a database perspective, and if that's so, you may want to re-think
> some things. Although see below for my contradicting myself.....
>
> Lucene excels as a text search engine, NOT an RDBMS. It's almost a sure bet
> that when you find yourself trying to do DB-like things in Lucene, you
> should either
> 1> re-think how you use Lucene
> 2> use a database or
> 3> use a hybrid solution, using Lucene for your text searches and a DB for
> the DB-like things you want to do.
>
> In your example (and I understand that you've perhaps simplified it enough
> for brevity that the following is inapplicable), instead of indexing these
> records, why not put all the text in a single field for each doc ID? e.g.
> Document doc = new Document();
> doc.add(new Field("id", "10", ....));
> doc.add(new Field("some_text", "some text here",.....));
> // NOTE: the field name is exactly the same as on the previous line.
> doc.add(new Field("some_text", "some another text here",......));
> writer.addDocument(doc);
>
> This will create one Lucene document, with an id of 10, and text "some text
> here some another text here". (I left out the storage and indexing flags
> above).
>
> Now, when you search, your hits object will have one and only one entry for
> doc ID 10. It'll have relevance scores, and should fix you right up. This
> assumes that you're breaking your some_text up into tokens using the
> appropriate tokenizer.
>
> Note: it didn't occur to me until I'd used Lucene for some time, but
> according to a discussion a while back, the above is exactly equivalent to
> doc.add(new Field("id", "10", ....));
> doc.add(new Field("some_text", "some text here some another text here", ......));
> writer.addDocument(doc);
>
> Of course, how this applies to your paging issues is another story. I'm also
> dealing with trying to get a mapping between offsets into a document and the
> corresponding pages. It's interesting, especially when it comes to wildcard
> queries, and I haven't found a satisfactory solution yet. One "interesting"
> issue if you choose to consider each page (record) as a Lucene document is
> how you deal with relevancy. That is, how do 10 hits on 3 pages of a 100
> page book rank compared to 25 hits on 15 pages of a 900 page book? Which is
> "more relevant"? This may be completely irrelevant to your problem, but I'm
> inferring that your records correspond to a page......
>
> Erik Hatcher suggested re-casting all the queries into Span queries and then
> using a Spans object. This, together with perhaps bumping the offsets of the
> first term of each page by, say, 10.000 might work for me. I'll know more in
> a day or two....
>
> Hope this helps
> Erick

Thank you very much for a prompt response!

I am really trying to apply full-text search to an existing database, so I might be thinking in an RDBMS way. In general, I didn't think of adding several fields with the same name to a Document object that is to be indexed. This might help me, but:

* if something (for instance, a comment on a document) is changed, I need to change the index somehow
* if a new comment is added, I need to add it to an index record
* if a comment is deleted, I need to remove it from an index record

As far as I remember, Lucene doesn't allow modifying index records, so I need to delete the record and add it again, am I correct?
--
Eugene N Dzhurinsky
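
For anyone reading this thread later: the delete-then-re-add behaviour discussed above (deletes that only mark a document, optimization that physically removes it and shifts internal positions, and no uniqueness constraint) can be pictured with a small toy model. This is only a sketch in plain Java and deliberately does not use Lucene's API. The class `ToyIndex` and all of its methods are hypothetical names made up for illustration, and the list position merely stands in for a Lucene doc ID.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only. NOT Lucene's API; every name here is hypothetical. */
final class ToyIndex {
    /** One physical slot; 'deleted' is a tombstone flag. */
    private static final class Slot {
        final String id;   // application-level unique ID field
        final String text;
        boolean deleted;
        Slot(String id, String text) { this.id = id; this.text = text; }
    }

    private final List<Slot> slots = new ArrayList<>();

    /** Appends a document. No uniqueness check is made, so duplicates are possible. */
    void add(String id, String text) {
        slots.add(new Slot(id, text));
    }

    /** Marks every live document with this id as deleted; returns how many were marked. */
    int deleteById(String id) {
        int marked = 0;
        for (Slot s : slots) {
            if (!s.deleted && s.id.equals(id)) {
                s.deleted = true;
                marked++;
            }
        }
        return marked;
    }

    /** "Update" is delete-then-add, which leaves one live copy for this id. */
    void update(String id, String text) {
        deleteById(id);
        add(id, text);
    }

    /** Physically drops tombstoned slots; positions of the survivors may shift. */
    void optimize() {
        slots.removeIf(s -> s.deleted);
    }

    /** Number of slots physically held, tombstones included. */
    int physicalSize() {
        return slots.size();
    }

    /** Number of live (visible) documents. */
    int liveCount() {
        int live = 0;
        for (Slot s : slots) {
            if (!s.deleted) live++;
        }
        return live;
    }

    /** Internal position of the first live document with this id, or -1 if none. */
    int positionOf(String id) {
        for (int i = 0; i < slots.size(); i++) {
            Slot s = slots.get(i);
            if (!s.deleted && s.id.equals(id)) return i;
        }
        return -1;
    }

    /** Text of the first live document with this id, or null if none. */
    String textOf(String id) {
        int p = positionOf(id);
        return p < 0 ? null : slots.get(p).text;
    }
}
```

In this model, calling `update("10", ...)` leaves the old copy in place as a tombstone until `optimize()` runs, and after `optimize()` the surviving documents sit at different positions. That is why looking documents up by your own ID field is safer than remembering internal positions.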