On Fri, 2011-05-13 at 12:11 +0200, Samarendra Pratap wrote:
> Comparison between - single index Vs 21 indexes
> Total Size - 18 GB
> Queries run - 500
> % improvement - roughly 18%
I was expecting a lot more. Could you test whether this is an IO-issue
by selecting a slow query and performing the e
Hi Tom,
Thanks for pointing me to something important (phrase queries) which I
wasn't thinking of.
We are using synonyms which get expanded at run time. I'll have to give it
a thought.
We are not using synonyms at indexing time due to the lack of flexibility in
changing the list. We are not using
Hi Samar,
Have you looked at top, iostat, or other monitoring utilities to see whether
you are CPU bound or I/O bound?
With 225 term queries, it's possible that you are I/O bound.
I suspect you need to think about seek time and caching. For each unique
field:term combination Lucene has to look up
I'm sure that you should try building one large index and convert to
NumericField wherever you can. I'm convinced that will be faster -
but as ever, the proof will be in the numbers.
On repeated terms, I believe that Lucene will search multiple times.
If so, I'd guess it is just something that ha
ion to the file will not cause more IO as it has to skip
> those bytes and write it at the end of file.
>
> Regards
> Ganesh
>
> - Original Message -
> From: "Burton-West, Tom"
> To:
> Sent: Tuesday, May 10, 2011 9:46 PM
> Subject: RE: Sharding T
Hi Tom,
The more responses I get in this thread, the more I feel that our
application needs optimization.
350 GB and less than 2 seconds!!! That's much more than my expectation :-)
(in the current scenario).
*characteristics of slow queries:*
there are a few reasons for greater search time
in GB's. Small addition or
deletion to the file will not cause more IO as it has to skip those bytes and
write it at the end of file.
Regards
Ganesh
- Original Message -
From: "Burton-West, Tom"
To:
Sent: Tuesday, May 10, 2011 9:46 PM
Subject: RE: Sharding Techni
Hi Samar,
>>Normal queries go fine under 500 ms but when people start searching
>>"anything" some queries take more than 100 seconds. Don't you think
>>distributing smaller indexes on different machines would reduce the average
>>search time? (Although I have a feeling that search time for smaller
Hi Mike,
*"I think the usual approach is to create multiple mirrored copies (slaves)
rather than sharding"*
This is what caught my eye.
We do have mirrors, and in fact a good number of those. 6 servers are being
used for serving regular queries (2 are for specific queries that do take
time) and e
Down to basics, Lucene searches work by locating terms and resolving
documents from them. For standard term queries, a term is located by a
process akin to binary search. That means that it uses log(n) seeks to
get the term. Let's say you have 10M terms in your corpus. If you stored
that in a si
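(The message above is cut off in this listing. As a rough illustration of the
log(n) estimate it describes, and not a reconstruction of the rest of the
message, the arithmetic can be sketched as below. The 10M term count comes from
the message and the 63 sub-indexes from elsewhere in the thread. The function
names are hypothetical, and the even split of terms across sub-indexes is a
simplifying assumption: real sub-indexes share many terms, so each one would
hold more terms than this and the sharded figure is, if anything, optimistic.)

```python
import math


def seeks_per_lookup(num_terms: int) -> int:
    """Worst-case probes for a binary search over num_terms sorted terms."""
    return math.ceil(math.log2(num_terms))


def seeks_across_shards(num_terms: int, num_shards: int) -> int:
    """Total probes when one term has to be located in every shard.

    Assumption (for illustration only): terms are split evenly across shards.
    """
    terms_per_shard = math.ceil(num_terms / num_shards)
    return num_shards * seeks_per_lookup(terms_per_shard)


# One index holding all 10M terms vs. the same terms spread over 63 indexes.
single_index = seeks_per_lookup(10_000_000)
sharded = seeks_across_shards(10_000_000, 63)
print(single_index, sharded)
```

Each sub-index lookup is only slightly cheaper than a lookup in the combined
index, but it has to be repeated once per sub-index, so the total grows almost
linearly with the number of sub-indexes.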
Thanks
to Johannes - I am looking into Katta. It seems promising.
to Toke - Great explanation. That's what I was looking for.
I'll come back and share my experience.
Thank you very much.
On Tue, May 10, 2011 at 1:31 PM, Toke Eskildsen wrote:
> On Mon, 2011-05-09 at 13:56 +0200, Samarendra Prata
On Mon, 2011-05-09 at 13:56 +0200, Samarendra Pratap wrote:
> We have an index directory of 30 GB which is divided into 3 subdirectories
> (idx1, idx2, idx3) which are again divided into 21 sub-subdirectories
> (idx1-1, idx1-2, ..., idx2-1, ..., idx3-1, ..., idx3-21).
So each part is about ½ G
On May 10, 2011, at 9:42 AM, Samarendra Pratap wrote:
> Hi,
> Though we have 30 GB total index, size of the indexes that are used
> in 75%-80% searches is 5 GB, and we have average search time around 700 ms.
> (yes, we have optimized index).
>
> Could someone please throw some light on my origin
Hi,
Though we have 30 GB total index, size of the indexes that are used
in 75%-80% searches is 5 GB, and we have average search time around 700 ms.
(yes, we have optimized index).
Could someone please throw some light on my original doubt!!!
If I want to keep smaller indexes on different servers
We are using a technique similar to yours. We keep smaller indexes and use
ParallelMultiSearcher to search across them. Keeping smaller indexes is
good, as indexing and index optimization are faster. There will be a small
delay while searching across the indexes.
1. What is your search time?
> ...
> 1. I've not tested my application with a single index, as initially (a few
> years back) we thought the smaller the index size (7 indexes for default 80%
> searches) the faster the search time would be ...
Possibly. Maybe it will be acceptable to make some searches a bit
slower in order to make
Hi Ian,
Thanks for sharing your knowledge and to-the-point answers.
1. I've not tested my application with a single index, as initially (a few
years back) we thought the smaller the index size (7 indexes for default 80%
searches) the faster the search time would be. Anyway, I'll give it a try and
share t
30 GB isn't that big by Lucene standards. Have you considered or tried
just having one large index? If necessary you could restrict searches
to particular "indexes", or groups thereof, by a field in the combined
index, preferably used as a filter. If the slow searches have to
search across 63 sep