Re: Searching on Large Indexes

2014-06-27 Thread Toke Eskildsen
On Fri, 2014-06-27 at 12:33 +0200, Sandeep Khanzode wrote:
> I have an index that runs into 200-300GB. It is not frequently updated.
"Not frequently" means different things for different people. Could you give an approximate time span? If it is updated monthly, you might consider a full optimization…

Re: Searching on Large Indexes

2014-06-27 Thread Jigar Shah
Some points based on my experience. You can consider a SolrCloud implementation if you want to distribute your index over multiple servers. Use MMapDirectory locally for each Solr instance in the cluster. Hit a warm-up query on server start-up, so most of the documents will be cached; you will start saving…

Searching on Large Indexes

2014-06-27 Thread Sandeep Khanzode
Hi, I have an index that runs into 200-300GB. It is not frequently updated. What are the best strategies to query on this index? 1.] Should I, at index time, split the content, like a hash-based partition, into multiple separate smaller indexes and aggregate the results programmatically? 2.] Sh…
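Sandeep's option 1.] (hash-based partitioning with programmatic aggregation) can be sketched in plain Java. This is an illustration only, not Lucene API; `ShardRouter`, the doc-id scheme, and the string "hits" are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    // Stable routing: the same docId always maps to the same shard.
    // Math.floorMod keeps the bucket non-negative even when hashCode() is negative.
    public int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Query side: search each shard and concatenate the hits. A real
    // implementation would merge by score/rank instead of simply appending.
    public static List<String> aggregate(List<List<String>> perShardHits) {
        List<String> merged = new ArrayList<>();
        for (List<String> hits : perShardHits) {
            merged.addAll(hits);
        }
        return merged;
    }
}
```

The routing function must stay fixed for the life of the index set; changing the shard count means re-partitioning every document.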

Re: Closing IndexWriter can be very slow on large indexes

2011-08-01 Thread Michael McCandless
On Mon, Aug 1, 2011 at 8:04 AM, Simon Willnauer wrote:
> On Mon, Aug 1, 2011 at 12:57 AM, kiwi clive wrote:
>> Hi Mike,
>>
>> The problem was due to close(). A shutdown was calling close(), which seems
>> to cause Lucene to perform a merge. For a busy, very large index (with lots
>> of deletes a…

Re: Closing IndexWriter can be very slow on large indexes

2011-08-01 Thread Simon Willnauer
…flushed segment(s). Simon
> - Original Message -
> From: Michael McCandless
> To: java-user@lucene.apache.org
> Sent: Tuesday, July 26, 2011 5:30 PM
> Subject: Re: Closing IndexWriter can be very slow on large indexes
>
> Which me…

Re: Closing IndexWriter can be very slow on large indexes

2011-07-31 Thread kiwi clive
> Which method (abort or close) do you see taking so much time? It's odd, because IW.abort should quickly stop any running BG merges. Can you get a dump of the thread stacks during this long abort/close and post that back? Can't answer if Lucene 3.x will improve this…

Re: Closing IndexWriter can be very slow on large indexes

2011-07-26 Thread Michael McCandless
Which method (abort or close) do you see taking so much time? It's odd, because IW.abort should quickly stop any running BG merges. Can you get a dump of the thread stacks during this long abort/close and post that back? Can't answer if Lucene 3.x will improve this situation until we find the so

Closing IndexWriter can be very slow on large indexes

2011-07-26 Thread Chris Bamford
Hi I think I must be doing something wrong, but not sure what. I have some long running indexing code which sometimes needs to be shutdown in a hurry. To achieve this, I set a shutdown flag which causes it to break from the loop and call first abort() and then close(). The problem is that w

Re: FW: Indexer Threads Getting Into BLOCKED State While Optimization Taking Place On Large Indexes Of Size > 2GB

2011-07-20 Thread Michael McCandless
Hmm can you double-check your Lucene version? SerialMergeScheduler wasn't added until 2.3, so you are at least at that version. It looks like you are using SerialMergeScheduler, which, by design, can only do one merge at a time (this is why you see the threads BLOCKED). You can try switching to
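The blocking Michael describes can be illustrated without Lucene. SerialMergeScheduler by design runs one merge at a time, so indexing threads queue up behind the running merge. The single-threaded executor below is a stand-in for that behaviour (an analogy only, not Lucene code; task bodies are placeholders for merges):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerialMergeDemo {
    // Submit nMerges "merges" to a single-threaded executor: they run
    // strictly one after another, in submission order, exactly like
    // callers blocking behind SerialMergeScheduler's single merge slot.
    public static List<Integer> runSerially(int nMerges) {
        ExecutorService serial = Executors.newSingleThreadExecutor();
        List<Integer> completionOrder = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < nMerges; i++) {
            final int id = i;
            serial.submit(() -> completionOrder.add(id)); // stands in for one merge
        }
        serial.shutdown();
        try {
            serial.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completionOrder;
    }
}
```

With a thread pool instead (the ConcurrentMergeScheduler idea Michael goes on to suggest), merges could overlap and indexing threads would not pile up in BLOCKED state behind a single long merge.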

Re: Large indexes

2011-07-08 Thread Erick Erickson
Simply breaking up your index into separate pieces on the same machine buys you nothing, in fact it costs you considerably. Have you put a profiler on the system to see what's happening? I expect you're swapping all over the place and are memory-constrained. Have you considered sharding your index

Re: Large indexes

2011-07-08 Thread Simon Willnauer
On Fri, Jul 8, 2011 at 4:50 PM, Ian Lea wrote:
> There are lots of general tips at
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
>
> What version of lucene? Recent releases should be faster. Have you
> tried with one big index? If everything is running on the same server
> that may…

Re: Large indexes

2011-07-08 Thread Ian Lea
There are lots of general tips at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed. What version of lucene? Recent releases should be faster. Have you tried with one big index? If everything is running on the same server that may well be faster. Even on single indexes, response of a few

Large indexes

2011-07-08 Thread Chris Bamford
Hi, I was wondering how to improve search performance over a set of indexes like this:
27G   K1-1/index
19G   K1-2/index
24G   K1-3/index
15G   K1-4/index
19G   K1-5/index
31G   K2-1/index
16G   K2-2/index
8.1G  K2-3/index
12G   K2-4/index
15G   K2-5/index
In total it is…
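For reference, the du-style listing in Chris's mail totals roughly 186 GB. A throwaway sketch of that arithmetic, parsing `<size>G <path>` entries (the entry strings are the ones quoted in the mail; "G" is assumed to mean gigabytes):

```java
public class ShardSizes {
    // Parse the leading "<number>G" token of each entry and sum the values.
    public static double totalGb(String[] entries) {
        double total = 0;
        for (String e : entries) {
            String size = e.split("\\s+")[0];            // e.g. "27G"
            total += Double.parseDouble(size.replace("G", ""));
        }
        return total;
    }

    public static void main(String[] args) {
        String[] shards = {
            "27G K1-1/index", "19G K1-2/index", "24G K1-3/index",
            "15G K1-4/index", "19G K1-5/index", "31G K2-1/index",
            "16G K2-2/index", "8.1G K2-3/index", "12G K2-4/index",
            "15G K2-5/index"
        };
        System.out.println(totalGb(shards)); // about 186.1 GB in total
    }
}
```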

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-16 Thread Michael McCandless
Super!
Mike
On Fri, Oct 16, 2009 at 4:06 AM, Shaun Senecal wrote:
> Thanks Mike. The queries are now running faster than they ever were before,
> and are returning the expected results!
>
> On Fri, Oct 16, 2009 at 7:39 AM, Shaun Senecal wrote:
>> Ah! I thought that the ConstantScoreQuery w…

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-16 Thread Shaun Senecal
Thanks Mike. The queries are now running faster than they ever were before, and are returning the expected results!
On Fri, Oct 16, 2009 at 7:39 AM, Shaun Senecal wrote:
> Ah! I thought that the ConstantScoreQuery would also be rewritten into a
> BooleanQuery, resulting in the same exception.…

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Ah! I thought that the ConstantScoreQuery would also be rewritten into a BooleanQuery, resulting in the same exception. If that's the case, then this should work. I'll give that a try when I get into the office this morning.
On Fri, Oct 16, 2009 at 6:46 AM, Michael McCandless <luc...@mikemcca…

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
Well, you could wrap the C | D filter as a Query (using ConstantScoreQuery), and then add that as a SHOULD clause on your top-level BooleanQuery? Mike
On Thu, Oct 15, 2009 at 5:42 PM, Shaun Senecal wrote:
> At first I thought so, yes, but then I realised that the query I wanted to
> execute was A…

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
At first I thought so, yes, but then I realised that the query I wanted to execute was A | B | C | D and in reality I was executing (A | B) & (C | D). I guess my unit tests were missing some cases and don't currently catch this.
On Thu, Oct 15, 2009 at 11:59 PM, Michael McCandless <luc...@mikem…

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
You should be able to do exactly what you were doing on 2.4, right? (By setting the rewrite method.) Mike
On Thu, Oct 15, 2009 at 8:30 AM, Shaun Senecal wrote:
> Thanks for the explanation Mike. It looks like I have no choice but to move
> any queries which throw TooManyClauses to be Filters. S…

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Thanks for the explanation Mike. It looks like I have no choice but to move any queries which throw TooManyClauses to be Filters. Sadly, this means a max query time of 6s under load unless I can find a way to rewrite the query to span a Query and a Filter. Thanks again
On Thu, Oct 15, 2009 at…

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal wrote:
> Up to Lucene 2.4, this has been working out for us. However, in
> Lucene 2.9 this breaks since rewrite() now returns a
> ConstantScoreQuery.
You can get back to the 2.4 behavior by calling prefixQuery.setRewriteMethod(prefixQuery.SCORING_B…

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Sorry for the double post, but I think I can clarify the problem a little more. We want to execute:
query: A | B | C | D
filter: null
However, C and D cause TooManyClauses, so instead we execute:
query: A | B
filter: C | D
My understanding is that Lucene will apply the Filter (C…
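The semantic gap Shaun describes, a filter being intersected with the query rather than unioned into it, can be checked with a plain truth table. This is an illustration only, no Lucene involved; A-D stand for "document matches clause X":

```java
public class FilterSemantics {
    // Intended query: A | B | C | D
    public static boolean union(boolean a, boolean b, boolean c, boolean d) {
        return a || b || c || d;
    }

    // What actually ran: query (A | B), filtered by (C | D).
    // A filter is ANDed with the query, not ORed into it.
    public static boolean queryPlusFilter(boolean a, boolean b, boolean c, boolean d) {
        return (a || b) && (c || d);
    }

    // Exhaustively check all 16 match combinations for a document
    // where the two formulations disagree.
    public static boolean divergenceExists() {
        for (int bits = 0; bits < 16; bits++) {
            boolean a = (bits & 1) != 0, b = (bits & 2) != 0,
                    c = (bits & 4) != 0, d = (bits & 8) != 0;
            if (union(a, b, c, d) != queryPlusFilter(a, b, c, d)) {
                return true;
            }
        }
        return false;
    }
}
```

A document matching only A is the simplest counterexample: the intended union admits it, but the query-plus-filter form drops it because the filter (C | D) rejects it.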

PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
I know this has been discussed to great length, but I still have not found a satisfactory solution and I am hoping someone on the list has some ideas... We have a large index (4M+ Documents) with a handful of Fields. We need to perform PrefixQueries on multiple fields. The problem is that when t
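For context on the failure mode in this thread: older Lucene rewrote a PrefixQuery into a BooleanQuery with one clause per matching term in the dictionary, and threw TooManyClauses once the clause limit (1024 by default) was exceeded. A self-contained simulation of that expansion, with hypothetical names and a plain string list standing in for the term dictionary:

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixExpansion {
    // Mirrors BooleanQuery's default maxClauseCount in Lucene.
    public static final int MAX_CLAUSES = 1024;

    // Expand a prefix into one "clause" per matching term, failing
    // once the clause budget is exhausted, like the scoring rewrite did.
    public static List<String> expand(List<String> termDictionary, String prefix) {
        List<String> clauses = new ArrayList<>();
        for (String term : termDictionary) {
            if (term.startsWith(prefix)) {
                clauses.add(term);
                if (clauses.size() > MAX_CLAUSES) {
                    throw new IllegalStateException("TooManyClauses: > " + MAX_CLAUSES);
                }
            }
        }
        return clauses;
    }
}
```

This is why the thread converges on constant-score approaches: matching documents directly (via a filter or ConstantScoreQuery) avoids materialising one scoring clause per term, so the clause limit never comes into play.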

Re: Efficient optimization of large indexes?

2009-08-11 Thread Nigel
Mike, thanks very much for your comments! I won't have time to try these ideas for a little while, but when I do I'll definitely post the results.
On Fri, Aug 7, 2009 at 12:15 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Thu, Aug 6, 2009 at 5:30 PM, Nigel wrote:
>> Actually I…

Re: Efficient optimization of large indexes?

2009-08-07 Thread Michael McCandless
On Thu, Aug 6, 2009 at 5:30 PM, Nigel wrote:
>> Actually IndexWriter must periodically flush, which will always
>> create new segments, which will then always require merging. I.e.
>> there's no way to just add everything to only one segment in one
>> shot.
>
> Hmm, that makes sense now that you…

Re: Efficient optimization of large indexes?

2009-08-06 Thread Nigel
On Wed, Aug 5, 2009 at 3:50 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Wed, Aug 5, 2009 at 12:08 PM, Nigel wrote:
>> We periodically optimize large indexes (100-200gb) by calling
>> IndexWriter.optimize(). It takes a heck of a long time, and I'm…

Re: Efficient optimization of large indexes?

2009-08-05 Thread Michael McCandless
On Wed, Aug 5, 2009 at 12:08 PM, Nigel wrote:
> We periodically optimize large indexes (100-200gb) by calling
> IndexWriter.optimize(). It takes a heck of a long time, and I'm wondering
> if a more efficient solution might be the following:
>
> - Create a new empty…

Efficient optimization of large indexes?

2009-08-05 Thread Nigel
We periodically optimize large indexes (100-200GB) by calling IndexWriter.optimize(). It takes a heck of a long time, and I'm wondering if a more efficient solution might be the following:
- Create a new empty index on a different filesystem
- Set a merge policy for the new index so it…

Re: [?? Probable Spam] Re: Backing up large indexes

2009-07-22 Thread Alexandre Leopoldo Gonçalves
Shai, Thanks for the tip. I'll start with it. Alex
Shai Erera wrote:
> Hi Alex, You can start with this article: http://www.manning.com/free/green_HotBackupsLucene.html (you'll need to register w/ your email). It describes how one can write Hot Backups w/ Lucene, and capture just the "delta" si…

Re: Backing up large indexes

2009-07-22 Thread Mindaugas Žakšauskas
This might be irrelevant, but have you considered using ZFS? This file system is designed to do what you need. Assuming you can trigger events at the time after you have updated the index, you would have to trigger new ZFS snapshot and place it elsewhere. This might have some side effects though (

Re: Backing up large indexes

2009-07-22 Thread Shai Erera
Hi Alex, You can start with this article: http://www.manning.com/free/green_HotBackupsLucene.html (you'll need to register w/ your email). It describes how one can write Hot Backups w/ Lucene, and capture just the "delta" since the last backup. I'm about to try it myself, so if you get to do it b

Backing up large indexes

2009-07-22 Thread Alexandre Leopoldo Gonçalves
Hi All, We have a system with a Lucene index of 100GB and growing fast. I wonder whether there is an efficient way to back it up taking into account only the changes between the old and new versions of the index, since after the optimization process the names of the main index files change. Regards, Ale…
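The delta idea in this thread rests on Lucene segment files being write-once: a file already present in the backup under the same name does not need to be copied again. A hypothetical sketch of that copy loop (file names like `_0.cfs` are illustrative; a real hot backup must also pin a commit point, e.g. with a SnapshotDeletionPolicy, so files are not deleted mid-copy):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class IncrementalBackup {
    // Copy only files not already present in backupDir; returns how many
    // were copied. Write-once segment files make name equality sufficient.
    public static int backup(Path indexDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path src : files) {
                Path dst = backupDir.resolve(src.getFileName());
                if (!Files.exists(dst)) {      // only the delta is transferred
                    Files.copy(src, dst);
                    copied++;
                }
            }
        }
        return copied;
    }

    // Self-contained demo on temp dirs: the first pass copies both files,
    // the second pass copies only the newly added segment.
    public static int[] demo() {
        try {
            Path idx = Files.createTempDirectory("idx");
            Path bak = Files.createTempDirectory("bak");
            Files.writeString(idx.resolve("_0.cfs"), "segment 0");
            Files.writeString(idx.resolve("segments_1"), "commit 1");
            int first = backup(idx, bak);
            Files.writeString(idx.resolve("_1.cfs"), "segment 1");
            int second = backup(idx, bak);
            return new int[] { first, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note this also shows why a full optimize defeats incremental backup, as the original mail observes: optimizing rewrites everything into new file names, so the next delta is effectively the whole index.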

Re: Improving Search Performance on Large Indexes

2007-05-24 Thread Sharad Agarwal
> To: java-user@lucene.apache.org
> Sent: Thursday, May 24, 2007 1:31:49 PM
> Subject: Improving Search Performance on Large Indexes
> Hello, Currently we are attempting to optimize the search time against an index that is 26 GB in size (~35 million docs) and I was wondering what experiences others have had in…

RE: Improving Search Performance on Large Indexes

2007-05-24 Thread Scott Sellman
> Sent: Thursday, May 24, 2007 1:09 PM
> To: java-user@lucene.apache.org
> Subject: Re: Improving Search Performance on Large Indexes
> Hi Scott, I met the same situation as you (index 100M documents). If the computer has only one CPU and one disk, ParallelMultiSearcher is slower than MultiSearcher. I wrote an email "Wh…

Re: Improving Search Performance on Large Indexes

2007-05-24 Thread Su.Cheng
> Simpy -- http://www.simpy.com/ - Tag - Search - Share
> - Original Message -
> From: Scott Sellman <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, May 24, 2007 1:31:49 PM
> Subject: Improving Search Performance on…

Re: Improving Search Performance on Large Indexes

2007-05-24 Thread Otis Gospodnetic
> To: java-user@lucene.apache.org
> Sent: Thursday, May 24, 2007 1:31:49 PM
> Subject: Improving Search Performance on Large Indexes
> Hello, Currently we are attempting to optimize the search time against an index that is 26 GB in size (~35 million docs) and I was wondering what experiences others have had in s…

Improving Search Performance on Large Indexes

2007-05-24 Thread Scott Sellman
Hello, Currently we are attempting to optimize the search time against an index that is 26 GB in size (~35 million docs) and I was wondering what experiences others have had in similar attempts. Simple searches against the index are still fast even at 26GB, but the problem is our application

Re: Large Indexes

2005-11-13 Thread Charles Lloyd
On Nov 13, 2005, at 8:19 PM, Friedland, Zachary (EDS - Strategy) wrote:
> What is the largest lucene index that has been built? We're looking to build a sort of data warehouse that will hold transaction log files as long as possible. This index would grow at the rate of 10 million documents per m…

Re: Large Indexes

2005-11-13 Thread Otis Gospodnetic
Largest index? Who knows! :) Lucene's internal limit is the size of the doc Id (max Integer). People typically roll their indices when they reach a certain size, but if you don't need your queries to be fast and always need all the data, then this may not make sense for you (well, it still may, a

Large Indexes

2005-11-13 Thread Friedland, Zachary (EDS - Strategy)
What is the largest lucene index that has been built? We're looking to build a sort of data warehouse that will hold transaction log files as long as possible. This index would grow at the rate of 10 million documents per month indefinitely. Is there a limit where lucene will fail? What should
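As Otis notes above, the per-index ceiling is the int doc id, i.e. Integer.MAX_VALUE (about 2.1 billion) documents. At the stated 10 million documents per month, a back-of-envelope check:

```java
public class DocIdCeiling {
    // Lucene doc ids are Java ints, so a single index tops out at
    // Integer.MAX_VALUE documents. Whole months until that ceiling:
    public static long monthsUntilFull(long docsPerMonth) {
        return Integer.MAX_VALUE / docsPerMonth;
    }

    public static void main(String[] args) {
        // 2,147,483,647 / 10,000,000 = 214 whole months (~17.9 years),
        // so rolling indices is driven by manageability and query speed
        // long before the id ceiling is reached.
        System.out.println(monthsUntilFull(10_000_000L));
    }
}
```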

Re: Speed of complex boolean searches on large indexes

2005-09-09 Thread Otis Gospodnetic
Well, by changing your query you are changing your criteria, so I assume you also got different (fewer) results. That's one reason why your query got faster. If index size is the issue, and Field1 consumes most of it, and you are not using it in search (I don't see it in your sample query), e…

Speed of complex boolean searches on large indexes

2005-09-09 Thread mopster
Hi, I am testing the speed of searching Lucene indexes. The index is on the larger side! It has about 500,000 documents and about 60 fields, with one field (Field1) containing the body of the document. Total index size is currently about 20GB. Testing the search I get this behaviour: (Field2:1) AND (F…

Re: large indexes

2005-03-09 Thread Doug Cutting
Scott Smith wrote:
> I have the need to create an index which will potentially have a million+
> documents. I know Lucene can accomplish this. However, the other
> requirement is that I need to be continually updating it during the day
> (adding 1-30 documents/minute).
Have a look at this thread: http:/…

large indexes

2005-03-08 Thread Scott Smith
I have the need to create an index which will potentially have a million+ documents. I know Lucene can accomplish this. However, the other requirement is that I need to be continually updating it during the day (adding 1-30 documents/minute). I guess I had thought that I might try to have an ac…