Hi Matt,
Thanks for your answer.
I'm new to Lucene, so I don't know what I should know about that.
I found a reference discussing substring searching, and it works well
for me.
I'm not sure which analyzer we used; I'll check it out and make sure I
understand why it works for us.
Thank you very much
a code snippet is worth 1000 words :)

private static final Term UID_TERM = new Term("uid_payload", "_UID");

private static class SinglePayloadTokenStream extends TokenStream {
    // one reusable token; the uid travels in its 4-byte payload buffer
    private Token token = new Token(UID_TERM.text(), 0, 0);
    private byte[] buffer = new byte[4];
    private boolean returnToken = false; // field name assumed; the archive truncates the snippet here
}
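(For anyone following along: a rough sketch of how a stream like this can be attached to a document at indexing time, under the Lucene 2.3 API. The setUID helper and the open writer are assumptions, not part of the snippet above.)

// Hedged usage sketch, Lucene 2.3.x.
SinglePayloadTokenStream payloadStream = new SinglePayloadTokenStream();
payloadStream.setUID(uid);                           // hypothetical helper that fills the buffer
Document doc = new Document();
doc.add(new Field(UID_TERM.field(), payloadStream)); // Field(String, TokenStream) constructor
writer.addDocument(doc);                             // 'writer' is an open IndexWriter (assumed)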
All:
We are using Java Lucene 2.3.2 to index a fairly large number of documents
(roughly 400,000 per day). We have divided the time history into various depths:
our first stage covers 8 days and our next stage covers 22 days. The index
directory for the first stage is approximately 20G when fully optimized
On Wed, Apr 1, 2009 at 5:22 PM, John Wang wrote:
> Hi Michael:
>
> 1) Yes, we use TermDocs, exactly what IndexWriter.deleteDocuments(Term)
> is doing under the covers.
This part I understand :)
> 2) We iterate the docid->uid mapping: for each docid, get the
> corresponding uid and check that
Hi Michael:
1) Yes, we use TermDocs, exactly what IndexWriter.deleteDocuments(Term)
is doing under the covers.
2) We iterate the docid->uid mapping: for each docid, get the
corresponding uid and check whether it is in the deleted set. If so,
add the docid to the list. There is no ui
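(A minimal sketch of the scan described in 2), assuming the mapping is an int[] indexed by docid and the deleted uids sit in a Set; the names are illustrative, not John's actual code.)

import java.util.*;

static List<Integer> findDeletedDocids(int[] docToUid, Set<Integer> deletedUids) {
    // walk every docid, look up its uid, and keep the docids whose
    // uid is in the deleted set
    List<Integer> docids = new ArrayList<Integer>();
    for (int docid = 0; docid < docToUid.length; docid++) {
        if (deletedUids.contains(docToUid[docid])) {
            docids.add(docid);
        }
    }
    return docids;
}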
On Wed, Apr 1, 2009 at 2:04 PM, John Wang wrote:
> My test is essentially this. I took out the reader.deleteDocuments call from
> both scenarios. I took an index of 5m docs and a batch of 1 randomly
> generated uids.
>
> Compared the following scenarios:
> 1)
> * open index reader
> * for each uid i
Thanks, Michael, for the info.
I do guarantee there are no modifications between when
"MySpecialIndexReader" is loaded and when I iterate and find the deleted
docids. I was, however, not aware that docids move when IndexWriter is
opened; I thought they move only when docs are added and when changes are committed.
John,
We looked at implementing delete-by-docid for LUCENE-1516; however, it
seemed to be something that, if enough people wanted it, we could implement
as a later patch.
The implementation involves maintaining a genealogy of SegmentReaders within
IndexWriter so that deletes to a reader that has
Think about putting this query in Luke and doing an "explain" for details,
but I'm surprised this is working at all without throwing TooManyClauses errors.
Under the covers, Lucene expands your wildcards to all terms in the field
that match. For instance, assume your document field has the following
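(To make the expansion concrete, a hedged sketch against the 2.x API; the "content" field comes from the thread below, while the open searcher is assumed.)

// Each wildcard rewrites into a BooleanQuery over every matching term
// in the field, so a broad prefix like content:a* can blow past the
// default 1024-clause limit.
Query q = new WildcardQuery(new Term("content", "a*"));
try {
    Hits hits = searcher.search(q);          // the rewrite happens here
} catch (BooleanQuery.TooManyClauses e) {
    // one workaround: raise the limit, at a memory/CPU cost
    BooleanQuery.setMaxClauseCount(8192);
}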
Hi All,
I have the following query on a 1GB index with about 12 million docs:
As you can see, the terms consist of wildcards...
query.toString()=+(+content:g* +content:h* +content:d* +content:s* +content:a*
+content:w* +content:b* +content:c* +content:m* +content:e*) +((+sender:cpuser9
+viewer
> For me at least, IndexWriter.deleteDocument(int) would be useful.
I completely agree: delete-by-docID in IndexWriter would be a great
feature. Long ago I became convinced of that.
Where this feature always gets stuck (search the lists -- it's gotten
stuck a lot) is how to implement it. At any
On Wed, Apr 1, 2009 at 4:02 AM, Michael McCandless wrote:
> I think this has the same problem as exposing delete by docID, ie, how
> would you produce that docIdSet?
Whoops, right. I was going by memory that there was a
get(IndexReader) type method there... but that's on Filter of course.
-Yon
Hi Michael:
Let me first share what I am doing w.r.t. deleting by docid:
I have a customized index reader that stores a mapping of docid -> uid in
the payload (something Michael Busch and Ning Li suggested a while back), and
that mapping is loaded at IndexReader load time and is shared by searchers
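(For readers who haven't seen the payload trick: a rough sketch of loading such a mapping at reader load time, using the Lucene 2.3 TermPositions API; the 4-byte big-endian packing is an assumption.)

// Build docid -> uid from the postings of UID_TERM, whose per-document
// payload carries the uid written at index time.
int[] docToUid = new int[reader.maxDoc()];       // 'reader' is the open IndexReader
TermPositions tp = reader.termPositions(UID_TERM);
byte[] buf = new byte[4];
while (tp.next()) {
    tp.nextPosition();                           // single position per doc
    tp.getPayload(buf, 0);                       // read the 4-byte payload
    docToUid[tp.doc()] = ((buf[0] & 0xFF) << 24) | ((buf[1] & 0xFF) << 16)
                       | ((buf[2] & 0xFF) << 8) | (buf[3] & 0xFF);
}
tp.close();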
Which analyzer are you using here? Depending on your choice, the
comma-separated values might be kept together in your index rather than
tokenized as you expected.
Secondly, you should get Luke and take a look into your index; this
should give you a much better idea of what's going on
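(A quick way to check, sketched with the 2.x analysis API; the field name and sample text here are made up.)

// Print the tokens an analyzer produces for a comma-separated value.
// WhitespaceAnalyzer keeps "red,green,blue" as a single token, while
// StandardAnalyzer splits it into three -- so the analyzer you indexed
// with decides whether your matches can work.
Analyzer analyzer = new WhitespaceAnalyzer();    // swap in the analyzer you actually use
TokenStream ts = analyzer.tokenStream("f", new StringReader("red,green,blue"));
Token t;
while ((t = ts.next()) != null) {
    System.out.println(t.termText());
}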
On Fri, 2009-03-27 at 12:07 +0100, Paul Taylor wrote:
[2GB index, 7 million documents(?)]
> I ran the test a number of times with 30 threads and max memory of
> 3500MB. I was processing 10,000 records in about 43 seconds (233
> queries/second); the index was stored on a solid state drive runn
John,
I think this has the same problem as exposing delete by docID, i.e., how
would you produce that docIdSet?
We could consider delete-by-Filter instead, since that exposes the
necessary getDocIdSet(IndexReader) method.
Or, with near-real-time search, we could enhance it to allow deletions
via t
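(Until something like that exists, delete-by-Filter can be approximated on the reader side; a sketch against the 2.4 DocIdSet API, where 'reader' and 'filter' are assumed and none of this is an IndexWriter feature.)

// Approximate delete-by-Filter with IndexReader.deleteDocument(int).
// Uses the 2.4-era DocIdSetIterator methods next()/doc().
DocIdSet docIdSet = filter.getDocIdSet(reader);
DocIdSetIterator it = docIdSet.iterator();
while (it.next()) {
    reader.deleteDocument(it.doc());             // marks the docid deleted
}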