On Thu, 2015-02-05 at 04:00 +0100, Heeheeya wrote:
> I am recently puzzled by a performance problem using Lucene when the
> search result set is large. Do you have any advice?
Without any information, how are we to help you?
Start by reading
https://wiki.apache.org/solr/SolrPerformancePr
Hi,
I am a fan of Lucene. I am recently puzzled by a performance problem using
Lucene when the search result set is large. Do you have any advice? For a web
application, a response time exceeding 120 seconds is not acceptable.
Looking forward to your reply. Thanks so much.
Sent from my iPhone
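Without details it is hard to say more, but a common culprit with very
large result sets is asking Lucene for all hits at once instead of paging.
A minimal sketch of cursor-style paging with IndexSearcher.searchAfter
(the 20-hit page size and the variable names are illustrative assumptions):

  // Hypothetical sketch: walk a huge result set page by page.
  // searchAfter() resumes from the last ScoreDoc of the previous page,
  // so no single request ever collects millions of hits in one go.
  IndexSearcher searcher = new IndexSearcher(reader);
  ScoreDoc after = null;
  while (true) {
      TopDocs page = (after == null)
          ? searcher.search(query, 20)
          : searcher.searchAfter(after, query, 20);
      if (page.scoreDocs.length == 0) break;
      for (ScoreDoc sd : page.scoreDocs) {
          // process searcher.doc(sd.doc) ...
      }
      after = page.scoreDocs[page.scoreDocs.length - 1];
  }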
On Thu, Mar 7, 2013 at 6:44 PM, Michael McCandless wrote:
> This sounds reasonable (500 M docs / 50 GB index), though you'll need
> to test resulting search perf for what you want to do with it.
>
> To reduce merging time, maximize your IndexWriter RAM buffer
> (setRAMBufferSizeMB). You could also increase the
> TieredMergePolicy.setSegmentsPerTier ...
On Thu, Mar 7, 2013 at 7:06 PM, Jan Stette wrote:
> Thanks for your suggestions, Mike, I'll experiment with the RAM buffer size
> and segments-per-tier settings and see what that does.
>
> The time spent merging seems to be so great though, that I'm wondering if
> I'm actually better off doing the indexing single-threaded. ...
Thanks for your suggestions, Mike, I'll experiment with the RAM buffer size
and segments-per-tier settings and see what that does.
The time spent merging seems to be so great though, that I'm wondering if
I'm actually better off doing the indexing single-threaded. Am I right in
thinking that no merging ...
This sounds reasonable (500 M docs / 50 GB index), though you'll need
to test resulting search perf for what you want to do with it.
To reduce merging time, maximize your IndexWriter RAM buffer
(setRAMBufferSizeMB). You could also increase the
TieredMergePolicy.setSegmentsPerTier to allow more segments ...
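A minimal sketch of those two knobs together (assuming a recent Lucene
where IndexWriterConfig takes just the Analyzer; the 512 MB and
20-segments values are illustrative, not recommendations from this thread):

  // Sketch: flush less often (bigger RAM buffer) and let more segments
  // accumulate per tier before a merge is triggered.
  TieredMergePolicy tmp = new TieredMergePolicy();
  tmp.setSegmentsPerTier(20.0);        // default is 10

  IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
  iwc.setRAMBufferSizeMB(512.0);       // default is 16 MB
  iwc.setMergePolicy(tmp);

  IndexWriter writer = new IndexWriter(dir, iwc);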
I'm seeing performance problems when indexing a certain set of data, and
I'm looking for pointers on how to improve the situation. I've read the
very helpful performance advice on the Wiki and I am carrying on
experimenting based on that, but I'd also ask for comments as to whether I'm
heading in the right direction.
Hmmm, something is wrong: range queries over many terms should
definitely be faster.
There are some other oddities in your results...
- the "consolidated index" shows as slower (295ms vs 602ms)... but
patch 1596 doesn't touch that code path (a single-segment index).
- TEST2 (using searcher.search ...
I am sorry,
but after applying this patch, the performance in my tests is worse than
on lucene-2.9-dev trunk.
TEST1: using filter.getDocIdSet(reader)
Test results (num docs = 2,940,738) using lucene-core-2.9-dev trunk:
1. Original index (12 collections * 6 months = 72 indexes) ...
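(As Uwe and Yonik point out below, raw filter iteration and real searches
exercise different code paths, so it is worth timing both. A minimal
sketch on the 2.9-era API; the reader/searcher/filter/query variables are
assumed to exist:)

  // Sketch (Lucene 2.9-era APIs): time raw DocIdSet iteration vs. a
  // real search; only the latter reflects end-user latency.
  long t0 = System.currentTimeMillis();
  DocIdSetIterator it = filter.getDocIdSet(reader).iterator();
  int matches = 0;
  while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      matches++;
  }
  long filterMs = System.currentTimeMillis() - t0;

  t0 = System.currentTimeMillis();
  TopDocs top = searcher.search(query, filter, 10);
  long searchMs = System.currentTimeMillis() - t0;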
OK, I think this will improve the situation:
https://issues.apache.org/jira/browse/LUCENE-1596
-Yonik
http://www.lucidimagination.com
On Fri, Apr 10, 2009 at 1:47 PM, Michael McCandless wrote:
> We never fully explained it, but we have some ideas...
>
> It's only if you iterate each term, and ...
> Subject: Re: RangeFilter performance problem using MultiReader
>
> OK, I scanned all the e-mails in this thread so I may be way off base, but
> has anyone yet asked the basic question of whether the granularity of the
> dates is really necessary?
>
> Raf and Roberto:
>
> It ...
Ahhh, OK, perhaps that explains the sizable perf difference you're
seeing w/ optimized vs not. I'm curious to see the results of your
"merge each month into 1 index" test...
Mike
On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini wrote:
> On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless wrote: ...
On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless wrote:
> Hmm then I'm a bit baffled again.
>
> Because, each of your "by month" indexes presumably has a unique
> subset of terms for the "date_doc" field? Meaning, a given "by month"
> index will have all date_doc corresponding to that month, and a ...
Hmm then I'm a bit baffled again.
Because, each of your "by month" indexes presumably has a unique
subset of terms for the "date_doc" field? Meaning, a given "by month"
index will have all date_doc corresponding to that month, and a
different "by month" index would presumably have no overlap in t
On Sat, Apr 11, 2009 at 11:48 AM, Michael McCandless wrote:
> On Sat, Apr 11, 2009 at 5:27 AM, Raf wrote:
[cut]
> You have readers from 72 different directories, but is each directory
> an optimized or unoptimized index?
Hi,
I'm Raffaella's colleague, and I'm the "indexer" while she is the "searcher" ...
On Sat, Apr 11, 2009 at 5:27 AM, Raf wrote:
> I have repeated my tests using a searcher and now the performance on 2.9 is
> much better than on 2.4.1, especially when the filter extracts a lot
> of docs.
OK, phew!
> However the same search on the consolidated index is even faster
This is ...
> ... index. This is not
> faster in 2.9.
>
> To compare speed, please use real search code (Searcher.search())!
>
> Uwe
> Thanks Uwe,
> I had already read about TrieRangeFilter on this mailing list and I thought
> it could be useful to solve my problem.
> I think I will try it for test purposes.
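(For reference, the trie idea discussed here later shipped in Lucene 2.9
core as NumericField/NumericRangeQuery. A minimal sketch; the field name
and the millisecond encoding are illustrative assumptions:)

  // Sketch (Lucene 2.9-era API): index the date as a trie-encoded long,
  // then range-query it without visiting one term per distinct value.
  doc.add(new NumericField("date_doc", Field.Store.NO, true)
          .setLongValue(dateMillis));

  Query range = NumericRangeQuery.newLongRange(
          "date_doc", lowerMillis, upperMillis, true, true);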
No, it is a MultiReader that contains 72 (I am sorry, I wrote a wrong number
last time) "single" readers.
Raf
On Fri, Apr 10, 2009 at 9:14 PM, Mark Miller wrote:
> Raf wrote:
>> We have more or less 3M documents in 24 indexes and we read all of them
>> using a MultiReader.
>
> Is this a multireader containing multireaders?
Ok, here you can find some details about my tests:
*MultiReader creation*
// One read-only reader per index directory, wrapped in a MultiReader.
List<IndexReader> subReaders = new ArrayList<IndexReader>();
for (Directory dir : this.directories) {
    try {
        subReaders.add(IndexReader.open(dir, true)); // true = read-only
    } catch (IOException e) {
        // skip/log the unreadable index directory
    }
}
IndexReader multiReader =
    new MultiReader(subReaders.toArray(new IndexReader[subReaders.size()]));
On Fri, Apr 10, 2009 at 3:06 PM, Mark Miller wrote:
> 24 segments is bound to be quite a bit slower than an optimized index for
> most things
I'd be curious just how true this really is (in general)... my guess
is the "long tail of tiny segments" gets into the OS's IO cache (as
long as the syste
On Fri, Apr 10, 2009 at 3:11 PM, Mark Miller wrote:
> Mark Miller wrote:
>> Michael McCandless wrote:
>>> which is why I'm baffled that Raf didn't see a speedup on
>>> upgrading.
>>>
>>> Mike
>>
>> Another point is that he may not have such a nasty set of segments - Raf
>> says he has 24 ...
On Fri, Apr 10, 2009 at 3:14 PM, Mark Miller wrote:
> Raf wrote:
>> We have more or less 3M documents in 24 indexes and we read all of them
>> using a MultiReader.
>
> Is this a multireader containing multireaders?
Let's hear Raf's answer, but I think likely "yes". But this shouldn't
be a ...
Raf wrote:
> We have more or less 3M documents in 24 indexes and we read all of them
> using a MultiReader.
Is this a multireader containing multireaders?
--
- Mark
http://www.lucidimagination.com
Mark Miller wrote:
> Michael McCandless wrote:
>> which is why I'm baffled that Raf didn't see a speedup on
>> upgrading.
>>
>> Mike
> Another point is that he may not have such a nasty set of segments -
> Raf says he has 24 indexes, which sounds like he may not have the
> logarithmic sizing you normally see ...
Michael McCandless wrote:
> which is why I'm baffled that Raf didn't see a speedup on
> upgrading.
>
> Mike
Another point is that he may not have such a nasty set of segments - Raf
says he has 24 indexes, which sounds like he may not have the
logarithmic sizing you normally see. If you have ...
Michael McCandless wrote:
> On Fri, Apr 10, 2009 at 2:32 PM, Mark Miller wrote:
>> I had thought we would also see the advantage with multi-term queries - you
>> rewrite against each segment and avoid extra seeks (though not nearly as
>> many as when enumerating every term). As Mike pointed out to me ...
On Fri, Apr 10, 2009 at 2:32 PM, Mark Miller wrote:
> I had thought we would also see the advantage with multi-term queries - you
> rewrite against each segment and avoid extra seeks (though not nearly as
> many as when enumerating every term). As Mike pointed out to me back when,
> though: we ...
When I did some profiling I saw that the slow down came from tons of
extra seeks (single segment vs multisegment). What was happening was,
the first couple segments would have thousands of terms for the field,
but as the segments logarithmically shrank in size, the number of terms
for the segments ...
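(For context, a minimal sketch of the term walk being described, on the
2.x-era API; the "date_doc" field is borrowed from this thread, the bounds
are illustrative. Against a MultiReader the TermEnum below is a
MultiTermEnum that merges per-segment enums through a priority queue, so
every next() can trigger seeks in many underlying segments:)

  // Sketch (Lucene 2.x-era API): the enumeration a RangeFilter performs.
  TermEnum te = reader.terms(new Term("date_doc", lowerValue));
  try {
      while (te.term() != null
              && te.term().field().equals("date_doc")
              && te.term().text().compareTo(upperValue) <= 0) {
          // ... mark matching docs via reader.termDocs(te.term()) ...
          if (!te.next()) break;
      }
  } finally {
      te.close();
  }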
On Fri, Apr 10, 2009 at 1:20 PM, Raf wrote:
> Hi Mike,
> thank you for your answer.
>
> I have downloaded lucene-core-2.9-dev and I have executed my tests (both on
> multireader and on consolidated index) using this new version, but the
> performance is very similar to the previous ones.
> The big ...
On Fri, Apr 10, 2009 at 11:03 AM, Yonik Seeley wrote:
> On Fri, Apr 10, 2009 at 10:48 AM, Michael McCandless wrote:
>> Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms
>> (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers.
>
> Do we know why this is, and if it's fixable ...
Hi Mike,
thank you for your answer.
I have downloaded lucene-core-2.9-dev and I have executed my tests (both on
multireader and on consolidated index) using this new version, but the
performance is very similar to the previous ones.
The big index is 7/8 times faster than the multireader version.
Raf
On Fri, Apr 10, 2009 at 10:48 AM, Michael McCandless wrote:
> Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms
> (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers.
Do we know why this is, and if it's fixable (the MultiTermEnum, not
the higher-level query ...
Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms
(Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers.
I think the only workaround is to merge your indexes down to a single
index.
But, Lucene trunk (not yet released) has fixed this, so that searching
through ...
Hi,
we are experiencing some problems using RangeFilters and we think there are
some performance issues caused by MultiReader.
We have more or less 3M documents in 24 indexes and we read all of them
using a MultiReader.
If we do a search using only terms, there are no problems, but if we add
to ...
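(For context, the construct under discussion is roughly the following; a
minimal sketch on the 2.4-era API, with field name and bounds as
illustrative assumptions:)

  // Sketch (Lucene 2.4-era API): a term-walking range filter over a date
  // field, applied to a query that runs against the MultiReader.
  Filter dateFilter =
      new RangeFilter("date_doc", "20090101", "20090131", true, true);
  Hits hits = searcher.search(query, dateFilter);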
Hello Erick,
it's an average, definitely.
Best
Hans-Peter
Is this just the first document or is it an average? I can imagine that
initialization happens at different times under different machines...
Best
Erick
On Jan 12, 2008 6:34 AM, Hans-Peter Stricker <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have a strange problem: the very same call executes 10 times ...
Hello,
I have a strange problem: the very same call executes 10 times faster under
Windows than under Linux: The line
writer.addDocument(doc)
takes (with the very same documents) < 1ms under Windows, but > 10ms under
Linux. maxBufferedDocs = 1, number of documents to index < 1,
flush ...
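(Worth noting: maxBufferedDocs = 1 forces a flush to the filesystem on
every addDocument(), which magnifies any OS-level I/O difference. A
minimal sketch of buffering more aggressively on the 2.3-era API; the
32 MB figure is an illustrative assumption:)

  // Sketch (Lucene 2.3-era API): keep addDocument() in RAM and flush by
  // buffer size rather than after every single document.
  IndexWriter writer = new IndexWriter(directory, analyzer, true);
  writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
  writer.setRAMBufferSizeMB(32.0);  // flush when the RAM buffer fills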
Good suggestion, I tried watching the GCs in YourKit while testing but
unfortunately they don't seem to line up with the searches that ...
... not a GC-related issue, but generally does NOT guarantee it. (A GC
log is the safest and fastest way to sort this kind of problem out.)
Vlad
Hi,
I'm trying to figure out a way to troubleshoot a performance problem
we're seeing when searching against a memory-based index. What happens
is we will run a search against the index and it generally returns in
1 second or less. But every once in a while it takes 15-20 seconds for ...
It is indeed a lot faster ...
Will use that one now ..
hits = searcher.search(query, new Sort(new
SortField(null, SortField.DOC, true)));
That is completing in under a sec for pretty much all the queries ..
On 8/22/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 8/21/06, M A <[EMAIL PROTECTED] ...
On 8/21/06, M A <[EMAIL PROTECTED]> wrote:
> I still don't get this. How would I do this, so I can try it out ..
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/SortField.html#SortField(java.lang.String,%20int,%20boolean)
new Sort(new SortField(null, SortField.DOC, true))
-Yonik
I still don't get this. How would I do this, so I can try it out ..
Is
searcher.search(query, new Sort(SortField.DOC))
.. correct? This would return stuff in the order of the documents, so how
would I reverse this, I mean the later documents appearing first ..
searcher.search(query, new Sort(???
Yeah I tried looking this up.
If I wanted to do it by document id (highest docs first), does this mean
doing something like
hits = searcher.search(query, new Sort(new SortField(DOC, true))); // or
something like that,
Is this way of sorting any different performance-wise to what I was doing
before ...
public void search(Weight weight,
        org.apache.lucene.search.Filter filter, final HitCollector results)
        throws IOException {
    HitCollector collector = new HitCollector() {
        public final void collect(int doc, float score) {
            try {
Ok this is what I have done so far ->
static class MyIndexSearcher extends IndexSearcher {
    IndexReader reader = null;
    public MyIndexSearcher(IndexReader r) {
        super(r);
        reader = r;
    }
    public void search(Weight weight,
            org.apache.lucene.search. ...
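(The two fragments above look like halves of the same wrapper. A
hypothetical completion against the HitCollector API of that era; the
buffer-and-replay body is my assumption, the thread never shows it. It
emits hits in reverse doc-id order so the newest documents come first:)

  // Hypothetical reconstruction: collect hits in natural (ascending
  // doc id) order, then replay them newest-first, since doc ids here
  // ascend in indexing (date) order.
  static class MyIndexSearcher extends IndexSearcher {
      public MyIndexSearcher(IndexReader r) {
          super(r);
      }
      public void search(Weight weight, Filter filter,
                         final HitCollector results) throws IOException {
          final ArrayList<Integer> docs = new ArrayList<Integer>();
          final ArrayList<Float> scores = new ArrayList<Float>();
          super.search(weight, filter, new HitCollector() {
              public final void collect(int doc, float score) {
                  docs.add(doc);
                  scores.add(score);
              }
          });
          for (int i = docs.size() - 1; i >= 0; i--) {
              results.collect(docs.get(i), scores.get(i));
          }
      }
  }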
The index is already built in date order, i.e. the older documents appear
first in the index; what I am trying to achieve is however the latest
documents appearing first in the search results .. without the sort .. I
think they appear by relevance .. well that's what it looked like ..
I am looking ...
Talk about mails crossing in the aether.. wrote my response before seeing
the last two...
Sounds like you're on track.
Erick
On 8/20/06, Erick Erickson <[EMAIL PROTECTED]> wrote:
> About luke... I don't know about command-line interfaces, but you can copy
> your index to a different machine and ...
About luke... I don't know about command-line interfaces, but you can copy
your index to a different machine and use Luke there. I do this between
Linux and Windows boxes all the time. Or, if you can mount the remote drive
so you can see it, you can just use Luke to browse to it and open it up. You ...
Just ran some tests .. it appears that the problem is in the sorting ..
i.e.
// hits = searcher.search(query, new Sort("sid", true));  -> 17 secs
// hits = searcher.search(query, new Sort("sid", false)); -> 17 secs
hits = searcher.search(query); -> less than 1 sec ..
am trying something out
This is why a warming strategy like the one Solr takes is very valuable. The
searchable index is always serving up requests as fast as Lucene
works, which is achieved by warming a new IndexSearcher with searches/
sorts/filter creating/etc before it is swapped into use.
Erik
On Aug 20, 2006 ...
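(A minimal sketch of that warming pattern; the volatile 'current' field,
warmupQuery, and the "sid" sort are illustrative assumptions:)

  // Sketch: warm a new searcher before publishing it, so no user query
  // pays the one-time FieldCache population cost for the sort field.
  IndexSearcher fresh = new IndexSearcher(IndexReader.open(directory));
  fresh.search(warmupQuery,
          new Sort(new SortField("sid", SortField.STRING, true)));
  IndexSearcher retired = current;  // 'current' serves live traffic
  current = fresh;                  // swap in the warmed searcher
  retired.close();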
Ok I get your point, this still however means the first search on the new
searcher will take a huge amount of time .. given that this is happening now
..
i.e. new search -> new query -> get hits -> 20+ secs .. this happens every 5
mins or so ..
although subsequent searches may be quicker ..
Am ...
: This is because the index is updated every 5 mins or so, due to the incoming
: feed of stories ..
:
: When you say iteration, i take it you mean, search request, well for each
: search that is conducted I create a new one .. search reader that is ..
yeah ... i meant iteration of your test. don't ...
yes there is a new searcher opened each time a search is conducted,
This is because the index is updated every 5 mins or so, due to the incoming
feed of stories ..
When you say iteration, I take it you mean, search request, well for each
search that is conducted I create a new one .. search reader that is ..
: hits = searcher.search(query, new Sort("sid", true));
you don't show where searcher is initialized, and you don't clarify how
you are timing your multiple iterations -- i'm going to guess that you are
opening a new searcher every iteration right?
sorting on a field requires pre-computing a ...
what I am measuring is this:
Analyzer analyzer = new StandardAnalyzer(new String[]{});
if (fldArray.length > 1)
{
    BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
        BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD,
        BooleanClause.Occur.SHOULD};
    query = MultiFieldQueryParser. ...
This is a looong time, I think you're right, it's excessive.
What are you timing? The time to complete the search (i.e. get a Hits object
back) or the total time to assemble the response? Why I ask is that the Hits
object is designed to return the first 100 or so docs efficiently. Every 10 ...
Hi there,
I have an index with about 250K documents, indexed full text.
There are 2 types of searches carried out: one using 1 field, the other using
4 .. for a query string ...
Given the nature of the queries required, all stop words are maintained in
the index, thereby allowing for phrasal ...