On Fri, 2014-06-27 at 12:33 +0200, Sandeep Khanzode wrote:
> I have an index that runs into 200-300GB. It is not frequently updated.
"not frequently" means different things for different people. Could you
give an approximate time span? If it is updated monthly, you might
consider a full optimizati
Some points based on my experience.
You can consider a SolrCloud implementation if you want to distribute your
index over multiple servers.
Use MMapDirectory locally for each Solr instance in cluster.
Run a warm-up query on server start-up, so that most of the documents are
cached and you start serving queries from memory sooner.
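A minimal sketch of those two suggestions (Lucene's Java API, 3.x era; the directory path and the choice of warm-up query are illustrative assumptions, not from the original post):

```java
// Sketch: open the index with MMapDirectory and run a warm-up query at
// start-up so the OS page cache is populated before real traffic arrives.
Directory dir = new MMapDirectory(new File("/data/solr/index"));
IndexReader reader = IndexReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);

// Warm-up: a broad query that touches many postings; any query that
// resembles production traffic works at least as well.
searcher.search(new MatchAllDocsQuery(), 100);
```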
Hi,
I have an index that runs into 200-300GB. It is not frequently updated.
What are the best strategies to query on this index?
1.] Should I, at index time, split the content with a hash-based partition
into multiple separate smaller indexes and aggregate the results
programmatically?
2.] Sh
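Option 1's hash-based partitioning could route each document to a shard with something as simple as the following (pure-Java sketch; the class and method names are invented for illustration):

```java
// Hypothetical router: pick a shard for a document key via hash partitioning.
class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    int shardFor(String docKey) {
        // Math.floorMod keeps the result non-negative even when
        // hashCode() is negative, so the shard index is always valid.
        return Math.floorMod(docKey.hashCode(), numShards);
    }
}
```

The same function must be used at index time and at query time (if you route queries), so that a document's shard is stable.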
On Mon, Aug 1, 2011 at 8:04 AM, Simon Willnauer
wrote:
> On Mon, Aug 1, 2011 at 12:57 AM, kiwi clive wrote:
>> Hi Mike,
>>
>> The problem was due to close(). A shutdown was calling close(), which seems
>> to cause Lucene to perform a merge. For a busy, very large index (with lots
>> of deletes a
flushed segment(s).
simon
>
> Clive
>
>
>
> - Original Message -
> From: Michael McCandless
> To: java-user@lucene.apache.org
> Cc:
> Sent: Tuesday, July 26, 2011 5:30 PM
> Subject: Re: Closing IndexWriter can be very slow on large indexes
>
Which method (abort or close) do you see taking so much time?
It's odd, because IW.abort should quickly stop any running BG merges.
Can you get a dump of the thread stacks during this long abort/close
and post that back?
Can't answer if Lucene 3.x will improve this situation until we find
the source of the slowness.
Hi
I think I must be doing something wrong, but I am not sure what.
I have some long-running indexing code which sometimes needs to be shut down in
a hurry. To achieve this, I set a shutdown flag which causes it to break from
the loop and call first abort() and then close(). The problem is that with a
large index, closing can still take a very long time.
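The loop-plus-flag pattern described above looks roughly like this (pure-Java sketch; the class name and the counter standing in for the real addDocument/abort/close calls are illustrative):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the shutdown pattern from the post: an atomic flag checked
// inside the indexing loop. Names are invented for illustration.
class IndexingLoop implements Runnable {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    volatile int docsIndexed = 0;

    void requestShutdown() {
        shutdown.set(true);
    }

    @Override
    public void run() {
        while (!shutdown.get()) {
            docsIndexed++;                  // stand-in for writer.addDocument(doc)
            if (docsIndexed >= 1000) break; // demo bound only
        }
        // On a real IndexWriter: abort()/rollback() discards pending changes
        // quickly, while close() may trigger a final flush and can be slow.
    }
}
```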
Hmm, can you double-check your Lucene version? SerialMergeScheduler
wasn't added until 2.3, so you are at least at that version.
It looks like you are using SerialMergeScheduler, which, by design,
can only do one merge at a time (this is why you see the threads
BLOCKED). You can try switching to ConcurrentMergeScheduler.
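Switching schedulers is a one-line configuration change. A hedged sketch against the 2.x/3.0-era API, where the setter lives on IndexWriter (adjust for your version; dir and analyzer are placeholders):

```java
// Sketch: replace the serial scheduler with the concurrent one so merges
// run in background threads instead of blocking the indexing thread.
IndexWriter writer = new IndexWriter(dir, analyzer,
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setMergeScheduler(new ConcurrentMergeScheduler());
```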
Simply breaking up your index into separate pieces on the same machine
buys you nothing; in fact it costs you considerably. Have you put
a profiler on the system to see what's happening? I expect you're swapping
all over the place and are memory-constrained.
Have you considered sharding your index across multiple machines?
On Fri, Jul 8, 2011 at 4:50 PM, Ian Lea wrote:
There are lots of general tips at
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
What version of lucene? Recent releases should be faster. Have you
tried with one big index? If everything is running on the same server
that may well be faster.
Even on single indexes, response of a few
Hi
I was wondering how to improve search performance over a set of indexes like
this:
27G   K1-1/index
19G   K1-2/index
24G   K1-3/index
15G   K1-4/index
19G   K1-5/index
31G   K2-1/index
16G   K2-2/index
8.1G  K2-3/index
12G   K2-4/index
15G   K2-5/index
In total it is
Super!
Mike
On Fri, Oct 16, 2009 at 4:06 AM, Shaun Senecal wrote:
Thanks Mike. The queries are now running faster than they ever were before,
and are returning the expected results!
On Fri, Oct 16, 2009 at 7:39 AM, Shaun Senecal wrote:
Ah! I thought that the ConstantScoreQuery would also be rewritten into a
BooleanQuery, resulting in the same exception. If that's the case, then
this should work. I'll give that a try when I get into the office this
morning.
On Fri, Oct 16, 2009 at 6:46 AM, Michael McCandless wrote:
Well, you could wrap the C | D filter as a Query (using
ConstantScoreQuery), and then add that as a SHOULD clause on your
top-level BooleanQuery?
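In code, that suggestion looks roughly like this (Lucene 2.9-era API; queryA, queryB, and cOrDFilter are placeholders for the poster's actual clauses):

```java
// Wrap the (C | D) filter as a constant-scoring query and OR it in,
// so the whole query behaves like A | B | C | D.
BooleanQuery top = new BooleanQuery();
top.add(queryA, BooleanClause.Occur.SHOULD);
top.add(queryB, BooleanClause.Occur.SHOULD);
top.add(new ConstantScoreQuery(cOrDFilter), BooleanClause.Occur.SHOULD);
```

Documents matching only the filter clause get a constant score rather than a relevance score, which is the trade-off of this approach.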
Mike
On Thu, Oct 15, 2009 at 5:42 PM, Shaun Senecal wrote:
At first I thought so, yes, but then I realised that the query I wanted to
execute was A | B | C | D and in reality I was executing (A | B) & (C | D).
I guess my unit tests were missing some cases and don't currently catch
this.
On Thu, Oct 15, 2009 at 11:59 PM, Michael McCandless wrote:
You should be able to do exactly what you were doing on 2.4, right?
(By setting the rewrite method).
Mike
On Thu, Oct 15, 2009 at 8:30 AM, Shaun Senecal wrote:
Thanks for the explanation Mike. It looks like I have no choice but to move
any queries which throw TooManyClauses to be Filters. Sadly, this means a
max query time of 6s under load unless I can find a way to rewrite the query
to span a Query and a Filter.
Thanks again
On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal wrote:
> Up to Lucene 2.4, this has been working out for us. However, in
> Lucene 2.9 this breaks since rewrite() now returns a
> ConstantScoreQuery.
You can get back to the 2.4 behavior by calling
prefixQuery.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE).
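Spelled out (Lucene 2.9 API; the field name and prefix are illustrative):

```java
// Force the 2.4-style rewrite into a scoring BooleanQuery. Note this
// reintroduces the TooManyClauses risk on high-cardinality prefixes.
PrefixQuery prefixQuery = new PrefixQuery(new Term("title", "luc"));
prefixQuery.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
```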
Sorry for the double post, but I think I can clarify the problem a little
more.
We want to execute:
query: A | B | C | D
filter: null
However, C and D cause TooManyClauses, so instead we execute:
query: A | B
filter: C | D
My understanding is that Lucene will apply the Filter (C | D) as an AND
against the query, so we effectively execute (A | B) & (C | D) rather than
A | B | C | D.
I know this has been discussed to great length, but I still have not found a
satisfactory solution and I am hoping someone on the list has some ideas...
We have a large index (4M+ documents) with a handful of fields. We need to
perform PrefixQueries on multiple fields. The problem is that when the
prefix expands to too many terms, the query throws TooManyClauses.
Mike, thanks very much for your comments! I won't have time to try these
ideas for a little while but when I do I'll definitely post the results.
On Fri, Aug 7, 2009 at 12:15 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
On Thu, Aug 6, 2009 at 5:30 PM, Nigel wrote:
>> Actually IndexWriter must periodically flush, which will always
>> create new segments, which will then always require merging. Ie
>> there's no way to just add everything to only one segment in one
>> shot.
>>
>
> Hmm, that makes sense now that you
On Wed, Aug 5, 2009 at 3:50 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
On Wed, Aug 5, 2009 at 12:08 PM, Nigel wrote:
We periodically optimize large indexes (100 - 200gb) by calling
IndexWriter.optimize(). It takes a heck of a long time, and I'm wondering
if a more efficient solution might be the following:
- Create a new empty index on a different filesystem
- Set a merge policy for the new index so it
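One way to sketch the copy-style optimize being proposed (hedged: 2.9/3.0-era API, where addIndexesNoOptimize pulls in another directory's segments; dir paths and analyzer are placeholders):

```java
// Sketch of the proposed alternative: build a fresh index on a second
// filesystem and pull the old index's segments into it, then merge there.
Directory oldDir = FSDirectory.open(new File("/disk1/index"));
Directory newDir = FSDirectory.open(new File("/disk2/index"));

IndexWriter writer = new IndexWriter(newDir, analyzer,
        true /* create */, IndexWriter.MaxFieldLength.UNLIMITED);
writer.addIndexesNoOptimize(new Directory[] { oldDir });
writer.optimize();   // merges down on the new filesystem
writer.close();
```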
Shai,
Thanks for the tip. I'll start with it.
Alex
This might be irrelevant, but have you considered using ZFS? That file
system is designed to do what you need. Assuming you can trigger
events right after you have updated the index, you would take a
new ZFS snapshot and place it elsewhere.
This might have some side effects though (
Hi Alex,
You can start with this article:
http://www.manning.com/free/green_HotBackupsLucene.html (you'll need to
register w/ your email). It describes how one can write Hot Backups w/
Lucene, and capture just the "delta" since the last backup.
I'm about to try it myself, so if you get to do it before me, please share
how it goes.
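The core of the approach in that article is Lucene's SnapshotDeletionPolicy, roughly as follows (2.9/3.0-era API; the copy step and the bookkeeping for what the previous backup already holds are left as comments):

```java
// Pin the current commit so its files are not deleted while we copy them;
// a delta backup copies only files missing from the previous backup.
SnapshotDeletionPolicy snapshotter =
        new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
IndexWriter writer = new IndexWriter(dir, analyzer, snapshotter,
        IndexWriter.MaxFieldLength.UNLIMITED);
try {
    IndexCommit commit = snapshotter.snapshot();
    for (String fileName : commit.getFileNames()) {
        // copy fileName to backup storage, unless already present there
    }
} finally {
    snapshotter.release();   // allow the pinned files to be deleted again
}
```

Segment files are write-once, so any file name already present in the previous backup is guaranteed unchanged; only new files need copying.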
Hi All,
We have a system with a Lucene index at 100GB and growing fast. I
wonder whether there is an efficient way to back it up taking into
account only the changes between the old and new versions of the index,
since after an optimization the names of the main index files change.
Regards,
Alex
To: java-user@lucene.apache.org
Subject: Re: Improving Search Performance on Large Indexes
Hi Scott,
I met the same situation as you (indexing ~100M documents). If the computer
has only one CPU and one disk, ParallelMultiSearcher is slower than
MultiSearcher.
I wrote an email "Wh
From: Scott Sellman
To: java-user@lucene.apache.org
Sent: Thursday, May 24, 2007 1:31:49 PM
Subject: Improving Search Performance on Large Indexes
Hello,
Currently we are attempting to optimize the search time against an index
that is 26 GB in size (~35 million docs) and I was wondering what
experiences others have had in similar attempts. Simple searches
against the index are still fast even at 26GB, but the problem is our
application
On Nov 13, 2005, at 8:19 PM, Friedland, Zachary (EDS - Strategy) wrote:
Largest index? Who knows! :)
Lucene's internal limit is the size of the doc ID (a Java int, so roughly
2.1 billion documents per index).
People typically roll their indices when they reach a certain size, but
if you don't need your queries to be fast and always need all the data,
then this may not make sense for you (well, it still may, a
What is the largest lucene index that has been built? We're looking to
build a sort of data warehouse that will hold transaction log files as
long as possible. This index would grow at the rate of 10 million
documents per month indefinitely. Is there a limit where lucene will
fail? What should
Well, by changing your query you are changing your criteria, so I
assume you also got different (fewer) results. That's one reason why
your query got faster.
If index size is the issue, and that Field1 consumes most of it, and
you are not using it in search (I don't see it in your sample query),
e
Hi,
I am testing the speed of searching Lucene indexes. The index is on
the larger side: it has about 500,000 documents and about 60 fields, with
one field (Field1) containing the body of the document. Total index
size is currently about 20GB.
Testing the search I get this behaviour:
(Field2:1) AND (F
Scott Smith wrote:
I have the need to create an index which will potentially have a
million+ documents. I know Lucene can accomplish this. However, the
other requirement is that I need to be continually updating it during
the day (adding 1-30 documents/minute).
Have a look at this thread:
http:/
I have the need to create an index which will potentially have a
million+ documents. I know Lucene can accomplish this. However, the
other requirement is that I need to be continually updating it during
the day (adding 1-30 documents/minute). I guess I had thought that I
might try to have an ac