Hi,
Though our total index is 30 GB, the indexes used in 75%-80% of
searches total 5 GB, and our average search time is around 700 ms
(yes, the index is optimized).
Could someone please shed some light on my original question?
If I want to keep smaller indexes on different servers
On May 10, 2011, at 9:42 AM, Samarendra Pratap wrote:
> Hi,
> Though we have 30 GB total index, size of the indexes that are used
> in 75%-80% searches is 5 GB. and we have average search time around 700 ms.
> (yes, we have optimized index).
>
> Could someone please throw some light on my origin
On Mon, 2011-05-09 at 13:56 +0200, Samarendra Pratap wrote:
> We have an index directory of 30 GB which is divided into 3 subdirectories
> (idx1, idx2, idx3) which are again divided into 21 sub-subdirectories
> (idx1-1, idx1-2, ..., idx2-1, ..., idx3-1, ..., idx3-21).
So each part is about ½ G
Thanks
to Johannes - I am looking into katta. Seems promising.
to Toke - Great explanation. That's what I was looking for.
I'll come back and share my experience.
Thank you very much.
On Tue, May 10, 2011 at 1:31 PM, Toke Eskildsen wrote:
> On Mon, 2011-05-09 at 13:56 +0200, Samarendra Prata
Hi all,
In our Lucene 3.0.3-based web application, when a user clicks on a hit
link, the targeted PDF should open in the browser with the hits highlighted.
For this purpose using the Acrobat Highlight File (Parameter xml, see
http://www.pdfbox.org/userguide/highlighting.html and
http://partne
Three instances of my application share one Lucene index directory.
Lucene version: 3.1
Lock factory: NativeFSLockFactory
Instance1: 64-bit JDK, 64-bit OS
Instance2: 64-bit JDK, 64-bit OS
Instance3: 32-bit JDK, 32-bit OS
When I tried to search the data in the index directory from Instance1,
I got
A full stack trace dump is always helpful. Are the three instances on
one server with a local index directory, or on different servers
accessing a network drive (how?) or what? If the index is locked it
would be surprising that you could update it from 2 of the instances.
--
Ian.
On Tue, May
Anyone able to help me with the problem below?
Thanks
Greg
-----Original Message-----
From: Gregory Tarr [mailto:gregory.t...@detica.com]
Sent: 09 May 2011 12:33
To: java-user@lucene.apache.org
Subject: RE: SpanNearQuery - inOrder parameter
Attachment didn't work - test below:
import org.ap
Down to basics, Lucene searches work by locating terms and resolving
documents from them. For standard term queries, a term is located by a
process akin to binary search. That means that it uses log(n) seeks to
get the term. Let's say you have 10M terms in your corpus. If you stored
that in a si
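
To put a number on the log(n) claim above, here is a minimal sketch that counts worst-case binary-search probes for a sorted list of terms. The class and method names are hypothetical, the 1M and 100M term counts are illustrative values added for comparison (only the 10M figure comes from the message), and the model is a plain binary search over a sorted array, not a description of how Lucene actually lays out its term dictionary.

```java
public class TermSeekEstimate {
    // Worst-case number of probes for a binary search over n sorted terms:
    // floor(log2(n)) + 1, which equals the bit length of n for n >= 1.
    static int binarySearchProbes(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        return 64 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        // 10M is the corpus size mentioned in the message; the others are
        // assumed values chosen only to show how slowly the count grows.
        long[] termCounts = {1_000_000L, 10_000_000L, 100_000_000L};
        for (long n : termCounts) {
            System.out.println(n + " terms -> at most "
                    + binarySearchProbes(n) + " probes");
        }
    }
}
```

Under this simplified model, changing the number of terms by a factor of ten changes the probe count by only three or four, which is the sense in which the lookup cost is logarithmic.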
Hi Mike,
*"I think the usual approach is to create multiple mirrored copies (slaves)
rather than sharding"*
This is what caught my eye.
We do have mirrors, and in fact a good number of them. 6 servers are being
used for serving regular queries (2 are for specific queries that do take
time) and e
Hi Samar,
>>Normal queries run fine in under 500 ms, but when people start searching
>>"anything", some queries take more than 100 seconds. Don't you think
>>distributing smaller indexes on different machines would reduce the average
>>search time? (Although I have a feeling that search time for smaller
Hi,
In a Lucene 2.9.4 project, there is a requirement to boost some of the
keywords in a document using payloads.
While searching, is there a way I can boost the MoreLikeThis result
using the index-time payload values?
Or can I merge MoreLikeThis output and PayloadTermQuery output somehow
Since no one else is jumping in, I'll say that I suspect that the span
query code does not bother to check to see if two of the terms are the
same.
I think that would account for the behavior you are seeing, since the
second SpanTermQuery would match the same term the first one did.
Note that I'm
: I attach a junit test which shows strange behaviour of the inOrder
: parameter on the SpanNearQuery constructor, using Lucene 2.9.4.
:
: My understanding of this parameter is that true forces the order and
: false doesn't care about the order.
:
: Using true always works. However using false
Thanks for your suggestion!
I tried setting a document boost factor at index time. To bubble up
recent documents' scores, I set the boost of documents from the last
three months to 2 and the boost of all other documents to 0.5. Then I
search the index, sorting by two fields, the Lucene default score and
Hi,
Can I remove the filler token _ from the n-gram tokens that are generated by
a ShingleFilter?
I'm using a chain of filters: ClassicFilter, StopFilter, LowerCaseFilter,
and ShingleFilter to create phrase n-grams. The ShingleFilter inserts
FILLER_TOKENs in place of the stopwords, but I don't w
We use a similar technique: we break the index into smaller indexes and
search them using ParallelMultiSearcher. We have to do incremental indexing, and
records older than 6 months or 1 year (based on the age-out setting) must be
deleted. Having multiple small indexes is really fast in terms of in