Re: Big problem with big indexes

Erick Erickson Wed, 11 Oct 2006 17:37:37 -0700

Something's extremely not right <G>....

First of all, I'm running a 1.4G index on a single machine and getting very
good results, under 10 seconds even for the most complex queries I'm firing.
This is with 870,000 documents, and includes sorting by criteria other than
relevance. And using span queries. And using wildcards that build their own
filters.


So, something must be very different about how you are using lucene to get
such poor search times.

So, please tell us significantly more about the structure of your index and
post the shortest example you can of your search code that demonstrates the
problem, and maybe some of the wiser heads than mine can help out too.

There should be no need to put the index in RAM, the index is just not big
enough.

So, some of the things I think would help analyze your problems....

1> hardware and op systems you're running on. Including how much memory
you're allowing your JVM to have.
2> network topology. If you're running the searchers locally and just
storing the indexes on remote machines, you're possibly having network
latency problems. Personally, I don't think your problem is properly
addressed by splitting your index. 600MB of index is just not big enough to
need this.
3> This *should* work on a local machine with just a single index. How much
trouble would it be to create it so? Can you try that and see what
difference that makes?
4> how did you build your index? Is it optimized? Can you give us an idea of
how many fields you are storing and some indication of the relative sizes of
each? Mostly, I'm asking whether you have a bunch of small fields and some
other very large ones.
5> Put one of the indexes on your local machine and get a copy of Luke
(google luke lucene) and fire off a few queries via Luke and tell us what
kind of results you get. Actually, this is probably the first thing you
should try. If you get radically different results with Luke than your code,
you can be pretty sure you're doing something out of the ordinary.
6> Timings of *only* the search code. By that I mean the time it takes for
searcher.search to complete. It's vaguely possible that the search is fine,
but something you're doing when processing the results is taking forever. I
have no evidence for this, of course, but it'd be a useful bit of
information.

I don't know if this helps much, but from your description, I think there's
a fundamental, correctable problem because nobody would use the product if
it gave such poor search times. And lots of people use it.

Best
Erick

On 10/11/06, Ariel Isaac Romero Cartaya <[EMAIL PROTECTED]> wrote:


Hi everybody:

     I have a big problem making prallel searches in big indexes.
     I have indexed with lucene over 60 000 articles, I have distributed
the
indexes in 10 computers nodes so each index not exceed the 60 MB of size.
I
makes parallel searches in those indexes but I get the search results
after
40 MINUTES !!! Then I put the indexes in memory to do the parallel
searches
But still I get the search results after 3 minutes !!! that`s to mucho
time
waiting !!!
  How Can I reduce the time of search ???
  Could you help me please ???
  I need help !!!!!

Greetings

Re: Big problem with big indexes

Reply via email to