Hello,
> 21 Jan 2008 at 16:37, Ard Schrijvers wrote:
>
> > is there a way to reuse a Lucene document which was indexed and
> > analyzed before, but only one single Field has changed?
> Karl Wettin wrote:
> I don't think you can reuse document instances like that, you
> could however pre-token
Forget all I said! I managed to answer a question that was not there! :)
If you have the term vectors stored it is fairly quick to re-assemble
a token stream from the document using a TermVectorMapper. Otherwise
it will be really slow.
--
karl
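A minimal sketch of the re-assembly idea, in plain Java rather than Lucene code. It assumes the term vector was stored with positions and has already been read into a map from term to positions; the class name `TermVectorReplay` and the map layout are hypothetical stand-ins, not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model, not Lucene code: the map stands in for a stored term
// vector with positions (term -> positions at which the term occurred).
final class TermVectorReplay {

    // Rebuilds the original token order by sorting every (position, term)
    // pair by position. If two terms share a position (e.g. synonyms),
    // only one of them survives in this simplified model.
    static List<String> rebuildTokens(Map<String, int[]> termPositions) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : termPositions.entrySet()) {
            for (int position : entry.getValue()) {
                byPosition.put(position, entry.getKey());
            }
        }
        return new ArrayList<>(byPosition.values());
    }

    public static void main(String[] args) {
        Map<String, int[]> vector = Map.of(
                "lucene", new int[] {0, 3},
                "reuses", new int[] {1},
                "tokens", new int[] {2});
        System.out.println(rebuildTokens(vector));
    }
}
```

The rebuilt list could then be fed to the index as a pre-tokenized field, so only the changed field would need to go through the analyzer again.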
22 Jan 2008 at 08:04, Karl Wettin wrote:
21 Jan 2008 at 16:37, Ard Schrijvers wrote:
is there a way to reuse a Lucene document which was indexed and
analyzed
before, but only one single Field has changed?
I don't think you can reuse document instances like that, you could
however pre-tokenize the fields that will stay the same
See BooleanQuery.setMinimumNumberShouldMatch.
Add the addresses as "SHOULD" termQuery clauses and set
minimumNumberShouldMatch to the required value.
Cheers
Mark
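A rough model of what the minimum-should-match setting does, written in plain Java rather than against the Lucene API. The class and method names are hypothetical; in Lucene itself each address would be a "SHOULD" TermQuery clause and the computed count would be passed to BooleanQuery.setMinimumNumberShouldMatch.

```java
import java.util.Set;

// Plain-Java model, not Lucene code: shows how a "roughly 75%" requirement
// turns into a minimum number of optional clauses that must match.
final class MinShouldMatchModel {

    // Number of clauses that must match, rounded up so that the
    // requested fraction is never undershot.
    static int requiredMatches(int clauseCount, double fraction) {
        return (int) Math.ceil(clauseCount * fraction);
    }

    // True if the candidate contains at least `required` of the wanted values.
    static boolean matches(Set<String> wanted, Set<String> candidate, int required) {
        int found = 0;
        for (String value : wanted) {
            if (candidate.contains(value)) {
                found++;
            }
        }
        return found >= required;
    }
}
```

For 10 addresses and a 75% threshold, `requiredMatches(10, 0.75)` gives 8, which is the value to set as the minimum.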
- Original Message
From: Michael Prichard <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, January 21, 200
Why not just design your system to roll over to a new index on a weekly basis
(new IndexWriter on a new index dir, roughly speaking)? You can't partition a
single Document, if that is what you are asking. But you can create multiple
smaller (e.g. weekly) indices instead of one large one, and th
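One way the weekly rollover could be organized is sketched below in plain Java. The directory naming scheme, the class name `WeeklyIndexDirs`, and the two-week archive rule (taken from the original question) are assumptions for illustration; opening an IndexWriter on the chosen directory is left out.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.Locale;

// Hypothetical helper: picks one index directory per ISO week and decides
// when a week's index is old enough to archive.
final class WeeklyIndexDirs {

    // Directory name for the ISO week containing the given date,
    // e.g. "index-2008-W04" for 2008-01-21.
    static String dirNameFor(LocalDate date) {
        int year = date.get(IsoFields.WEEK_BASED_YEAR);
        int week = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return String.format(Locale.ROOT, "index-%04d-W%02d", year, week);
    }

    // A week's index is archivable once the week ended (start + 7 days)
    // at least 14 days before today.
    static boolean isArchivable(LocalDate weekStart, LocalDate today) {
        return !weekStart.plusDays(7 + 14).isAfter(today);
    }
}
```

Searching across the live weeks would then mean opening one searcher per directory and combining them, while archived directories can simply be moved away.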
I think you'll have to go with MoreLikeThis (assuming your emails are tokenized
suitably) and go through matches yourself to check for the % match.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Michael Prichard <[EMAIL PROTECTED]>
To: java-use
vivek sar wrote:
I need to be able to sort on optime as well, thus need to store it.
Lucene's default sorting does not need the field to be stored, only indexed as
untokenized.
Antony
-
To unsubscribe, e-mail: [EMAIL PRO
On Monday, 21 January 2008, Fabrice Robini wrote:
> I've tried the "fair" similarity described here
> (http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739)
> with lucene 2.2 but it does not seem to work.
What exactly doesn't work, don't you see an effect? At least the scores
s
Say I have a field of To addresses from an email archive. I do a search and I
get 10 To addresses for a single hit. Then I want to find similar email with
the To addresses containing roughly 75% of those email addresses as well. How
would I do this?
In other words:
I get a result with:
To:
Hi,
As a requirement I need to be able to archive any indexes older than
2 weeks (due to space and performance reasons). That means I would
need to maintain weekly indexes. Here are my questions,
1) What's the best way to partition indexes using Lucene?
2) Is there a way I can partition document
Hi all,
I'm starting in the process of creating Hebrew support for Lucene.
Specifically I'm using Clucene (which is an awesome and strong port), but
that shouldn't matter for my questions. Please, if you know of any info or
similar project let me know, it can save me loads of time and headaches.
Hi Toke,
what kind of queries are you using for your tests? (number of query terms,
boolean clauses, phrases, wildcards?)
-Michael
Yonik Seeley wrote:
> On Jan 21, 2008 10:32 AM, Toke Eskildsen <[EMAIL PROTECTED]> wrote:
>> If we
>> only look at the first 50,000 queries, the difference in speed for
On Jan 21, 2008 10:32 AM, Toke Eskildsen <[EMAIL PROTECTED]> wrote:
> If we
> only look at the first 50,000 queries, the difference in speed for
> Lucene versions using harddisks is negligible. For SSDs it's quite
> visible:
Hmmm, I have a hard time thinking what could have slowed down
searching..
Hi,
Compass (http://www.opensymphony.com/compass/content/lucene.html) promises
many nice things, in my opinion.
Does anybody have production experience with it?
Especially with the Jdbc Directory and updates?
Thank you.
> Exactly! Indices are simply merged on disk, their content is
> not re-analyzed.
Thank you!
Hi,
I've tried the "fair" similarity described here
(http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739) with
lucene 2.2 but it does not seem to work.
I've attached the custom "MyFair" similarity to both IndexWriter and
IndexSearcher.
Do you have any idea?
Thanks a lot,
F
Hello,
is there a way to reuse a Lucene document which was indexed and analyzed
before, but only one single Field has changed? The use case (Jackrabbit
indexing) is when a *lot* of documents have a common field which
changes, and the rest of the document is unchanged. I would guess that
there is
On Mon, 2008-01-21 at 08:32 -0500, Michael McCandless wrote:
> Well that is not good news!! From your results below, it looks like
> 2.3 searching is 13.6% slower with hard disks and 8.9% slower with SSD.
As can be seen, it depends on the configuration. But the overall picture
is very consisten
Yes, I noticed
http://www.archivum.info/[EMAIL PROTECTED]/2006-09/msg00065.html
Somehow I gotta do my delete within the same writer. I could use another
field that combines both src and dst field, and use this field without
storing but still a waste of resources.
I wonder if IndexWriter can be mo
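The combined-field idea mentioned above could look roughly like this. It is a sketch under assumptions: the class name `CompositeKey` and the `_` separator are hypothetical, and the resulting string would be indexed as a single untokenized, unstored field so that IndexWriter.deleteDocuments(Term) can target exactly one (src, dst) pair.

```java
// Hypothetical helper: builds one key out of the two id fields so that a
// single Term can identify the (srcId, dstId) pair.
final class CompositeKey {

    // The separator keeps pairs such as (1, 23) and (12, 3) distinct,
    // which plain concatenation ("123") would not.
    static String of(long srcId, long dstId) {
        return srcId + "_" + dstId;
    }
}
```

The cost is the one noted in the message: an extra indexed field per document, traded for being able to delete through the writer alone.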
Thanks Michael,
> Right, if you disable it (as above), it won't flush by count but
> rather by RAM.
I had made a test case monitoring ram usage and never flushing manually -
(with disabled autoflush)
and I think it won't flush itself when it reaches a certain amount of buffered RAM.
Having read the source
You will have to close the IndexWriter.
Only one "writer" may be open at once on an index, where "writer"
includes an IndexReader that has done some deletes (the first time
you delete a document using a reader, it will acquire the write.lock,
which will fail if you have another writer open
Hello Michael;
how can I construct a chain where both reader and writer are in the same state?
You can call getIndexReader method of the IndexSearcher. But when I delete
documents through the reader, how will this interact with the writer?
I have disabled autoflush and am using my own logic to do flus
Cam Bazz wrote:
Hello,
When we delete documents from the index, will it autoflush when the count of
deleted documents reaches a certain value? I am controlling my own flush
operation, and I have disabled autoflush by:
writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
By default (in 2.3) the
Hello,
When we delete documents from the index, will it autoflush when the count of
deleted documents reaches a certain value? I am controlling my own flush
operation, and I have disabled autoflush by:
writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
But I have taken a peek at the IndexWriter
For this case, too, you will need to use an IndexReader, or use
IndexSearcher to run that particular search and then delete the
docIDs returned using the IndexReader.
Though, be sure to first iterate through all hits, gathering all
docIDs. And then in 2nd pass, do the deletions. Otherwi
Hello Mike;
How about deleting by a compound term?
For example, if I have a document with two fields, srcId and dstId,
and I want to delete the document where srcId=1 and dstId=2:
right now there exists an IndexWriter.deleteDocuments(Term t), but with that I
can only delete, let's say, where srcId=someth
Toke Eskildsen wrote:
On Sun, 2008-01-20 at 05:44 -0500, Michael McCandless wrote:
These results are very interesting. With 3 threads on SSD your
searches run 87% faster if you use 3 IndexSearchers instead of
sharing a single one.
That is my observation, yes. Please note that this is with L
On Sun, 2008-01-20 at 05:44 -0500, Michael McCandless wrote:
> These results are very interesting. With 3 threads on SSD your
> searches run 87% faster if you use 3 IndexSearchers instead of
> sharing a single one.
That is my observation, yes. Please note that this is with Lucene 2.1.
I've tr
Hi,
I've tried this "fair" similarity with lucene 2.2 but it does not seem to
work.
I've attached the custom "MyFair" similarity to both IndexWriter and
IndexSearcher.
Do you have any idea?
Thanks a lot,
Fabrice
Daniel Naber-5 wrote:
>
> Hi,
>
> as some of you may have noticed, Lucene p