RE: Reusing indexed and analyzed documents

2008-01-21 Thread Ard Schrijvers
Hello, > On 21 Jan 2008 at 16:37, Ard Schrijvers wrote: > > is there a way to reuse a Lucene document which was indexed and analyzed before, but only one single Field has changed? > Karl Wettin wrote: > I don't think you can reuse document instances like that, you could however pre-token

Re: Reusing indexed and analyzed documents

2008-01-21 Thread Karl Wettin
Forget all I said! I managed to answer a question that was not there! :) If you have the term vectors stored it is fairly quick to re-assemble a token stream from the document using a TermVectorMapper. Otherwise it will be really slow. -- karl On 22 Jan 2008 at 08:04, Karl Wettin wrote: 2

Re: Reusing indexed and analyzed documents

2008-01-21 Thread Karl Wettin
On 21 Jan 2008 at 16:37, Ard Schrijvers wrote: is there a way to reuse a Lucene document which was indexed and analyzed before, but only one single Field has changed? I don't think you can reuse document instances like that, you could however pre-tokenize the fields that will stay the same

Re: Matching w/in X% ?

2008-01-21 Thread markharw00d
See BooleanQuery.setMinimumNumberShouldMatch. Add the addresses as "SHOULD" TermQuery clauses and set minimumNumberShouldMatch to the required value. Cheers Mark - Original Message From: Michael Prichard <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, January 21, 200
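For the "roughly 75%" case in the original question, the value handed to BooleanQuery.setMinimumNumberShouldMatch has to be derived from the number of SHOULD clauses. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, and rounding up is an assumption (rounding down would be the more lenient reading of "roughly").

```java
// Sketch only: MinShouldMatch and forFraction are hypothetical helper names,
// not part of Lucene. The result is the integer one would pass to
// BooleanQuery.setMinimumNumberShouldMatch after adding the SHOULD clauses.
final class MinShouldMatch {
    private MinShouldMatch() {}

    /**
     * Returns how many of clauseCount optional (SHOULD) clauses must match
     * so that at least the given fraction of them match, rounding up.
     */
    static int forFraction(int clauseCount, double fraction) {
        if (clauseCount < 0 || fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException(
                "clauseCount must be >= 0 and fraction must be in [0, 1]");
        }
        // Assumption: round up, so 75% of 10 addresses requires 8 matches.
        return (int) Math.ceil(clauseCount * fraction);
    }
}
```

With 10 To addresses and a fraction of 0.75 this yields 8, so a hit would need to share at least 8 of the 10 addresses.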

Re: Archiving Index using partitions

2008-01-21 Thread Otis Gospodnetic
Why not just design your system to roll over to a new index on a weekly basis (new IndexWriter on a new index dir, roughly speaking)? You can't partition a single Document, if that is what you are asking. But you can create multiple smaller (e.g. weekly) indices instead of one large one, and th
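One way to picture the weekly roll-over is to derive the index directory name from the date of the document being indexed, and open a new IndexWriter whenever that name changes. The sketch below covers only the naming step. The class name, the "index-YYYY-Www" pattern and the use of ISO week numbering are all assumptions made for illustration, not something the thread prescribes.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.Locale;

// Sketch only: WeeklyIndexName and the directory naming pattern are
// hypothetical. ISO-8601 weeks (Monday to Sunday) are assumed.
final class WeeklyIndexName {
    private WeeklyIndexName() {}

    /** Returns a directory name such as "index-2008-W04" for the given date. */
    static String forDate(LocalDate date) {
        // The week-based year can differ from the calendar year around New Year.
        int year = date.get(IsoFields.WEEK_BASED_YEAR);
        int week = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return String.format(Locale.ROOT, "index-%d-W%02d", year, week);
    }
}
```

Because the names sort chronologically, indices older than two weeks could be picked out for archiving by comparing names against the one computed for the cutoff date.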

Re: Matching w/in X% ?

2008-01-21 Thread Otis Gospodnetic
I think you'll have to go with MoreLikeThis (assuming your emails are tokenized suitably) and go through matches yourself to check for the % match. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Michael Prichard <[EMAIL PROTECTED]> To: java-use

Re: Using RangeFilter

2008-01-21 Thread Antony Bowesman
vivek sar wrote: I need to be able to sort on optime as well, thus need to store it. Lucene's default sorting does not need the field to be stored, only indexed as untokenized. Antony

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-21 Thread Daniel Naber
On Monday, 21 January 2008, Fabrice Robini wrote: > I've tried the "fair" similarity described here > (http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739) > with lucene 2.2 but it does not seem to work. What exactly doesn't work, don't you see an effect? At least the scores s

Matching w/in X% ?

2008-01-21 Thread Michael Prichard
Say I have a field of To addresses from an email archive. I do a search and I get 10 To addresses for a single hit. Then I want to find similar emails whose To addresses contain roughly 75% of those email addresses as well. How would I do this? In other words: I get a result with: To:

Archiving Index using partitions

2008-01-21 Thread vivek sar
Hi, As a requirement I need to be able to archive any indexes older than 2 weeks (due to space and performance reasons). That means I would need to maintain weekly indexes. Here are my questions, 1) What's the best way to partition indexes using Lucene? 2) Is there a way I can partition document

Lucene, HTML and Hebrew

2008-01-21 Thread Itamar Syn-Hershko
Hi all, I'm starting the process of creating Hebrew support for Lucene. Specifically I'm using CLucene (which is an awesome and strong port), but that shouldn't matter for my questions. Please, if you know of any info or similar project let me know, it can save me loads of time and headaches.

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-21 Thread Michael Busch
Hi Toke, what kind of queries are you using for your tests? (num query terms, boolean clauses, phrases, wildcards?) -Michael Yonik Seeley wrote: > On Jan 21, 2008 10:32 AM, Toke Eskildsen <[EMAIL PROTECTED]> wrote: >> If we >> only look at the first 50.000 queries, the difference in speed for

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-21 Thread Yonik Seeley
On Jan 21, 2008 10:32 AM, Toke Eskildsen <[EMAIL PROTECTED]> wrote: > If we > only look at the first 50.000 queries, the difference in speed for > Lucene versions using harddisks is negligible. For SSDs it's quite > visible: Hmmm, I have a hard time thinking what could have slowed down searching..

Compass

2008-01-21 Thread spring
Hi, Compass (http://www.opensymphony.com/compass/content/lucene.html) promises many nice things in my opinion. Does anybody have production experience with it? Especially Jdbc Directory and Updates? Thank you.

RE: IndexWriter#addIndexes

2008-01-21 Thread spring
> Exactly! Indices are simply merged on disk, their content is > not re-analyzed. Thank you!

Is Fair Similarity working with lucene 2.2 ?

2008-01-21 Thread Fabrice Robini
Hi, I've tried the "fair" similarity described here (http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739) with lucene 2.2 but it does not seem to work. I've attached the custom "MyFair" similarity to both IndexWriter and IndexSearcher. Do you have any idea? Thanks a lot, F

Reusing indexed and analyzed documents

2008-01-21 Thread Ard Schrijvers
Hello, is there a way to reuse a Lucene document which was indexed and analyzed before, but only one single Field has changed? The use case (Jackrabbit indexing) is when a *lot* of documents have a common field which changes, and the rest of the document is unchanged. I would guess that there is

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-21 Thread Toke Eskildsen
On Mon, 2008-01-21 at 08:32 -0500, Michael McCandless wrote: > Well that is not good news!! From your results below, it looks like > 2.3 searching is 13.6% slower with hard disks and 8.9% slower with SSD. As can be seen, it depends on the configuration. But the overall picture is very consisten

Re: delete a document from indexwriter

2008-01-21 Thread Cam Bazz
Yes, I noticed http://www.archivum.info/[EMAIL PROTECTED]/2006-09/msg00065.html Somehow I gotta do my delete within the same writer. I could use another field that combines both src and dst field, and use this field without storing but still a waste of resources. I wonder if IndexWriter can be mo

Re: Clarification about IndexWriter.deleteDocuments and flush.

2008-01-21 Thread Cam Bazz
Thanks Michael, > Right, if you disable it (as above), it won't flush by count but > rather by RAM. I made a test case monitoring RAM usage and never flushing manually (with autoflush disabled), and I think it won't flush itself when it reaches a certain amount of buffered RAM. Having read the source

Re: delete a document from indexwriter

2008-01-21 Thread Michael McCandless
You will have to close the IndexWriter. Only one "writer" may be open at once on an index, where "writer" includes an IndexReader that has done some deletes (the first time you delete a document using a reader, it will acquire the write.lock, which will fail if you have another writer open

Re: delete a document from indexwriter

2008-01-21 Thread Cam Bazz
Hello Michael; how can I construct a chain where both reader and writer are at the same state? You can call the getIndexReader method of the IndexSearcher. But when I delete documents through the reader, how will this interact with the writer? I have disabled autoflush and am using my own logic to do flus

Re: Clarification about IndexWriter.deleteDocuments and flush.

2008-01-21 Thread Michael McCandless
Cam Bazz wrote: Hello, When we delete documents from the index, will it autoflush when the count of deleted documents reaches a certain value? I am controlling my own flush operation, and I have disabled autoflush by: writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH); By default (in 2.3) the

Clarification about IndexWriter.deleteDocuments and flush.

2008-01-21 Thread Cam Bazz
Hello, When we delete documents from the index, will it autoflush when the count of deleted documents reaches a certain value? I am controlling my own flush operation, and I have disabled autoflush by: writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH); But I have taken a peek at the IndexWriter

Re: delete a document from indexwriter

2008-01-21 Thread Michael McCandless
For this case, too, you will need to use an IndexReader, or use IndexSearcher to run that particular search and then delete the docIDs returned using the IndexReader. Though, be sure to first iterate through all hits, gathering all docIDs. And then in 2nd pass, do the deletions. Otherwi

Re: delete a document from indexwriter

2008-01-21 Thread Cam Bazz
Hello Mike; How about deleting by a compound term? For example, if I have a document with two fields srcId and dstId and I want to delete the document where srcId=1 and dstId=2. Right now there exists an IndexWriter.deleteDocuments(Term t), but with that I can only delete, let's say, where srcId=someth

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-21 Thread Michael McCandless
Toke Eskildsen wrote: On Sun, 2008-01-20 at 05:44 -0500, Michael McCandless wrote: These results are very interesting. With 3 threads on SSD your searches run 87% faster if you use 3 IndexSearchers instead of sharing a single one. That is my observation, yes. Please note that this is with L

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-21 Thread Toke Eskildsen
On Sun, 2008-01-20 at 05:44 -0500, Michael McCandless wrote: > These results are very interesting. With 3 threads on SSD your > searches run 87% faster if you use 3 IndexSearchers instead of > sharing a single one. That is my observation, yes. Please note that this is with Lucene 2.1. I've tr

Re: a "fair" similarity

2008-01-21 Thread Fabrice Robini
Hi, I've tried this "fair" similarity with lucene 2.2 but it does not seem to work. I've attached the custom "MyFair" similarity to both IndexWriter and IndexSearcher. Do you have any idea? Thanks a lot, Fabrice Daniel Naber-5 wrote: > > Hi, > > as some of you may have noticed, Lucene p