entire segment files are rewritten every time. So it looks like our only
option
is to bail out when there's not enough space to duplicate the existing
index.
----- Original Message -----
From: "Beard, Brian"
To: java-user@lucene.apache.org
Sent: Tue, August 24, 2010 8:19:52 AM
Sub
We had a situation where our index size was inflated to roughly double.
It took a couple of months, but the size eventually dropped back
down, so it does seem to eventually get rid of the deleted documents.
With that said, in the future expungeDeletes will get called once a day
to better manage
e this metaData information while inside the TokenFilter. I guess
this would be similar to adding column stride fields, but with multiple
ones at different positions in the document.
-----Original Message-----
From: Beard, Brian [mailto:brian.be...@mybir.com]
Sent: Thursday, August 19, 2010 2:02 PM
I'm using lucene 2.9.1.
I'm indexing documents which correspond to an ID.
Each field in the ID document is made up of data from all subId's.
(It's a requirement that searches must work across all subId's within an
ID).
They will be indexed and stored in some format similar to:
subId0Value0 subId0
Since FieldSortedHitQueue was deprecated in 3.0, I'm converting to the
new FieldValueHitQueue.
The trouble I'm having is coming up with a way to use FieldValueHitQueue
in a Collector so it is decoupled from a TopDocsCollector.
What I'd like to do is have a custom Collector that can add objects
ex
Thought I would report a performance increase noticed in migrating from
2.3.2 to 2.4.0.
Performing an iterated loop using termDocs & termEnums like below is
about 30% faster.
The example test set I'm running has about 70K documents to go through
and process (on a dual processor windows machine) w
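For reference, the iteration pattern being described looks roughly like this against the Lucene 2.x API (a sketch only; the per-document processing is a placeholder, not from the original post):

    // Walk every term of one field and visit each document containing it.
    public static void walkField(IndexReader reader, String field) throws IOException {
        TermEnum terms = reader.terms(new Term(field, ""));   // positioned at the first term of the field
        TermDocs termDocs = reader.termDocs();
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) {
                    break;                                     // ran past the last term of this field
                }
                termDocs.seek(t);                              // point TermDocs at the current term
                while (termDocs.next()) {
                    int doc = termDocs.doc();                  // process the matching document here
                }
            } while (terms.next());
        } finally {
            termDocs.close();
            terms.close();
        }
    }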
A while ago someone posted a link to a project called XTF which does
this:
http://xtf.wiki.sourceforge.net/
The one problem with this approach still lurking for me (or maybe I
don't understand how to get around it) is how to handle multiple terms
which "must" appear in the query, but are in non-overl
We index some documents which have an "all" field containing all of the
data which can be searched on.
One of the problems we're having is that when this field is, say, 10 MB the
highlighter takes about a second to calculate the best fragments. The
search only takes 30 milliseconds. I've accommodated t
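Whatever the accommodation was here, a common mitigation (sketched against the contrib Highlighter as of the 2.9/3.x line; the field name, limit, and fragment count are made up) is to cap how much of the big "all" field the highlighter tokenizes:

    // Assumes query, analyzer, and allFieldText are already in scope.
    QueryScorer scorer = new QueryScorer(query);
    Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), scorer);
    // Only tokenize the first 64K characters of the stored text; older contrib
    // releases named this setter setMaxDocBytesToAnalyze instead.
    highlighter.setMaxDocCharsToAnalyze(64 * 1024);

    TokenStream tokens = analyzer.tokenStream("all", new StringReader(allFieldText));
    String[] fragments = highlighter.getBestFragments(tokens, allFieldText, 3);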
I played around with GC quite a bit in our app and found the following
java settings to help a lot (used with JBoss, but should be good for any
JVM).
set JAVA_OPTS=%JAVA_OPTS% -XX:MaxPermSize=512M -XX:+UseConcMarkSweepGC
-XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
While these set
35 AM, Beard, Brian <[EMAIL PROTECTED]>
wrote:
> I will try tweaking RAM, and check about autoCommit=false. It's on the
> future agenda to multi-thread through the index writer. The indexing
> time I quoted includes the document creation time which would
definitely
> improve w
performance feedback
This is great to hear!
If you tweak things a bit (increase RAM buffer size, use
autoCommit=false, use threads, etc) you should be able to eke out some
more gains...
Are you storing fields & using term vectors on any of your fields?
Mike
Beard, Brian wrote:
>
> I
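For reference, those tweaks map onto the 2.3/2.4 IndexWriter API roughly as below (a sketch; the directory, analyzer, and the 64 MB figure are illustrative):

    // autoCommit=false (second constructor argument) defers commits until close(),
    // and a larger RAM buffer means fewer flushes while indexing.
    IndexWriter writer = new IndexWriter(directory, false, analyzer, false);
    writer.setRAMBufferSizeMB(64.0);                            // default is 16 MB
    writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);  // flush by RAM usage only
    // ... addDocument() calls, possibly from several indexing threads ...
    writer.close();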
I just did an update from lucene 2.2.0 to 2.3.2 and thought I'd give
some kudos for the indexing performance enhancements.
The Lucene indexing portion is about 6-8 times faster. Previously we
were doing ~60-120 documents per second; now we're between 400-1000,
depending on the type of document, s
I'm using lucene 2.2.0 & have two questions:
1) Should search times be linear wrt number of queries hitting a single
searcher? I've run multiple search threads against a single searcher,
and the search times are very linear - 10x slower for 10 threads vs 1
thread, etc. I'm using a parallel multi-
You can use your approach w/ or w/o the filter.
>td = indexSearcher.search(query, filter, maxnumhits);
You need to use a filter for the wildcards, which is built into the
query.
1) Extend QueryParser to override the getWildcardQuery method.
(Or even if you don't use QueryParser, j
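A minimal sketch of step 1 against the 2.x QueryParser (the parser subclass name is made up, and WildcardFilter is the custom filter posted later in this thread, not a stock Lucene class):

    // Wildcard terms become a filter wrapped in ConstantScoreQuery, so they are
    // never expanded into BooleanClauses and cannot hit maxBooleanClauses.
    public class FilteredWildcardQueryParser extends QueryParser {

        public FilteredWildcardQueryParser(String defaultField, Analyzer analyzer) {
            super(defaultField, analyzer);
        }

        protected Query getWildcardQuery(String field, String termStr) throws ParseException {
            Filter filter = new WildcardFilter(new Term(field, termStr));
            return new ConstantScoreQuery(filter);
        }
    }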
AHA! That is consistent with what is happening now, and explains the
discrepancy.
The original post had parens around each term because I was adding
them as separate boolean queries, but now, using just the clause, the
parens are around the entire clause with the boost.
-Original Message
Thanks for all replies.
Today when I printed out the query that's generated, it does not have the
extra parens. And query.rewrite(reader).toString() now gives the same
result as query.toString(). All I can figure is I must have changed
something between starting the email and sending it out. The o
I'm using lucene 2.2.0.
I'm in the process of re-writing some queries to build BooleanQueries
instead of using query parser.
Bypassing query parser provides almost an order of magnitude improvement
for very large queries, but then the search itself takes 20-30%
longer. I'm adding boost valu
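For context, building such a query by hand looks roughly like this in 2.2 (field names, terms, and boost values here are invented):

    BooleanQuery query = new BooleanQuery();

    TermQuery name = new TermQuery(new Term("name", "smith"));
    name.setBoost(4.0f);                                    // favour matches on the name field
    query.add(name, BooleanClause.Occur.SHOULD);

    query.add(new TermQuery(new Term("all", "smith")), BooleanClause.Occur.SHOULD);

    TopDocs td = indexSearcher.search(query, null, 1024);   // no filter, top 1024 hits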
-
From: Antony Bowesman [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 10, 2008 3:19 PM
To: java-user@lucene.apache.org
Subject: Re: how do I get my own TopDocHitCollector?
Beard, Brian wrote:
> Ok, I've been thinking about this some more. Is the cache mechanism
> pulling from the cac
-----
From: Beard, Brian [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 10, 2008 10:08 AM
To: java-user@lucene.apache.org
Subject: RE: how do I get my own TopDocHitCollector?
Thanks for the post. So you're using the doc id as the key into the
cache to retrieve the external id. Then what mechan
Wednesday, January 09, 2008 7:19 PM
To: java-user@lucene.apache.org
Subject: Re: how do I get my own TopDocHitCollector?
Beard, Brian wrote:
> Question:
>
> The documents that I index have two id's - a unique document id and a
> record_id that can link multiple documents together that
Question:
The documents that I index have two id's - a unique document id and a
record_id that can link multiple documents together that belong to a
common record.
I'd like to use something like TopDocs to return the first 1024 results
that have unique record_id's, but I will want to skip some o
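One way to sketch this with the 2.2-era HitCollector API plus FieldCache (simplified: it keeps hits in docid order and ignores scores; the field name record_id is taken from the description above, everything else is illustrative):

    // Collects at most `limit` documents, never more than one per record_id.
    public class UniqueRecordCollector extends HitCollector {
        private final String[] recordIds;            // docid -> record_id via FieldCache
        private final Set seen = new HashSet();
        private final List docs = new ArrayList();
        private final int limit;

        public UniqueRecordCollector(IndexReader reader, int limit) throws IOException {
            this.recordIds = FieldCache.DEFAULT.getStrings(reader, "record_id");
            this.limit = limit;
        }

        public void collect(int doc, float score) {
            if (docs.size() >= limit) {
                return;                               // already have enough unique records
            }
            if (seen.add(recordIds[doc])) {           // first document seen for this record
                docs.add(new Integer(doc));
            }
        }

        public List getDocs() {
            return docs;
        }
    }

It would be run with something like searcher.search(query, new UniqueRecordCollector(reader, 1024)) and the surviving doc ids resolved afterwards.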
I had a similar problem (I think). Look at using a WildcardFilter
(below), possibly wrapped in a CachingWrapperFilter, depending if you
want to re-use it. I overrode the method QueryParser.getWildcardQuery
to customize it. In your case you would probably have to specifically
detect for the presenc
Replace:
bits = new BitSet();
with:
bits = new BitSet(reader.maxDoc());
Beard, Brian wrote:
> Mark,
>
> Thanks so much.
>
> -Original Message-
> From: Mark Miller [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 12, 2007 1:54 PM
> To: java-user@lucene.apache.org
> Subject: Re: Wildcard
      do {
        Term term = enumerator.term();
        if (term != null) {
          termDocs.seek(term);                 // position TermDocs on the current matching term
          while (termDocs.next()) {
            bits.set(termDocs.doc());          // mark every doc containing the term
          }
        } else {
          break;
        }
      } while (enumerator.next());
    } finally {
      termDocs.close();
      enumerator.close();
    }
    return bits;
  }
}
- Mark
Beard, Brian wrote:
> I'm trying to over-ride QueryParser.getW
I'm trying to override QueryParser.getWildcardQuery to use filtering.
I'm missing something, because the following still gets the
maxBooleanClauses limit.
I guess the terms are still expanded even though the query is wrapped in
a filter. How do I avoid the term expansion altogether? Is there a
b
idable to provide a hook for
you to return a Query object of your choosing (e.g. ConstantScoreQuery
wrapping your choice of filter)
Cheers
Mark
- Original Message
From: "Beard, Brian" <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, 9 October, 2007 3:2
I'm currently using RangeFilters and QueryWrapperFilters to get around
the max boolean clause limit.
A couple of questions concerning this:
1) Is it good design practice to substitute every term containing a
wildcard with a QueryWrapperFilter, and a RangeQuery with a RangeFilter
and ChainedFilt
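For question 1, the substitution would look something like this (contrib ChainedFilter plus core RangeFilter from 2.x; field names and values are invented, and WildcardFilter is the custom filter from earlier in the thread):

    // Both the range and the wildcard become filters, so neither one expands
    // into clauses that count against maxBooleanClauses.
    Filter dateRange = new RangeFilter("date", "20070101", "20071231", true, true);
    Filter wildcard  = new WildcardFilter(new Term("name", "smi*"));
    Filter combined  = new ChainedFilter(new Filter[] { dateRange, wildcard },
                                         ChainedFilter.AND);

    TopDocs results = indexSearcher.search(query, combined, 1024);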
l now work fine, although this has
not been heavily tested yet. Also note that performance over NFS is
generally not great. If you do go down this route please report back
on any success or failure! Thanks.
Mike
"Beard, Brian" <[EMAIL PROTECTED]> wrote:
>
> http://issues.apache.org/j
http://issues.apache.org/jira/browse/LUCENE-673
This says the NFS mount problem is still open; is this the case?
Has anyone been able to deal with this adequately?
// QueryParser has no no-arg constructor; pass the default field and an Analyzer
parser = new QueryParser(defaultField, analyzer);
parser.setAllowLeadingWildcard(true);
-Original Message-
From: Martin Spamer [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 21, 2007 7:06 AM
To: java-user@lucene.apache.org
Subject: All keys for a field
I need to return all of the keys for a
That works, thanks.
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Tuesday, June 19, 2007 9:57 AM
To: java-user@lucene.apache.org
Subject: Re: MultiSearcher holds on to index - optimization not one segment
On 6/19/07, Beard, Brian
: Tuesday, June 19, 2007 9:06 AM
To: java-user@lucene.apache.org
Subject: Re: MultiSearcher holds on to index - optimization not one segment
On 6/19/07, Beard, Brian <[EMAIL PROTECTED]> wrote:
> The problem I'm having is once the MultiSearcher is open, it holds on to
> t
We're using a MultiSearcher, which runs inside a web application in
JBoss 4.0.4, to search against multiple Lucene indexes.
We're also using a standalone app running in a different jboss server
which gets periodic updates from an oracle database and updates the
lucene index.
Both the searcher a
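Not necessarily the fix this thread settled on, but the usual pattern (sketched against the 2.x API; index paths are invented) is to open a fresh searcher after each update cycle and only then close the old one, so the old segment files that optimize() superseded can actually be deleted:

    // Swap in a new MultiSearcher, then close the stale one to release its file
    // handles (in a real app you would also let in-flight searches finish first).
    Searcher fresh = new MultiSearcher(new Searchable[] {
            new IndexSearcher("/indexes/main"),
            new IndexSearcher("/indexes/updates")
    });
    Searcher stale = currentSearcher;   // field shared with the request threads
    currentSearcher = fresh;
    if (stale != null) {
        stale.close();
    }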
I noticed in a previous discussion some index integrity detection
classes that were around in version 1.4 (NoOpDirectory or
NullDirectory). Does anyone know if this is in the 2.1.0 release? I didn't
see it in 2.1.0 or the contrib folders.
Brian Beard