Re: Sharding Techniques

2011-05-10 Thread Samarendra Pratap
Hi, Though we have a 30 GB total index, the indexes used in 75%-80% of searches total 5 GB, and our average search time is around 700 ms (yes, we have optimized the index). Could someone please throw some light on my original question? If I want to keep smaller indexes on different servers

Re: Sharding Techniques

2011-05-10 Thread Johannes Zillmann
On May 10, 2011, at 9:42 AM, Samarendra Pratap wrote: > Hi, > Though we have 30 GB total index, size of the indexes that are used > in 75%-80% searches is 5 GB. and we have average search time around 700 ms. > (yes, we have optimized index). > > Could someone please throw some light on my origin

Re: Sharding Techniques

2011-05-10 Thread Toke Eskildsen
On Mon, 2011-05-09 at 13:56 +0200, Samarendra Pratap wrote: > We have an index directory of 30 GB which is divided into 3 subdirectories > (idx1, idx2, idx3) which are again divided into 21 sub-subdirectories > (idx1-1, idx1-2, ..., idx2-1, ..., idx3-1, ..., idx3-21). So each part is about ½ G
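The directory names idx1-1 through idx3-21 suggest that each of the 3 subdirectories holds 21 parts. Assuming that layout, the "about ½ G" per-part figure checks out:

```latex
\frac{30\ \text{GB}}{3 \times 21} = \frac{30\ \text{GB}}{63} \approx 0.48\ \text{GB} \approx \tfrac{1}{2}\ \text{GB}
```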

Re: Sharding Techniques

2011-05-10 Thread Samarendra Pratap
Thanks to Johannes - I am looking into katta. Seems promising. to Toke - Great explanation. That's what I was looking for. I'll come back and share my experience. Thank you very much. On Tue, May 10, 2011 at 1:31 PM, Toke Eskildsen wrote: > On Mon, 2011-05-09 at 13:56 +0200, Samarendra Prata

PDF Highlighting using PDF Highlight File

2011-05-10 Thread Wulf Berschin
Hi all, in our Lucene 3.0.3-based web application when a user clicks on a hit link the targeted PDF should be opened in the browser with highlighted hits. For this purpose using the Acrobat Highlight File (Parameter xml, see http://www.pdfbox.org/userguide/highlighting.html and http://partne

An unexpected network error occurred

2011-05-10 Thread Yogesh Dabhi
Three instances of my application share one Lucene index directory. Lucene version: 3.1. Lock factory: NativeFSLockFactory. Instance1: 64-bit JDK, 64-bit OS; Instance2: 64-bit JDK, 64-bit OS; Instance3: 32-bit JDK, 32-bit OS. When I try to search the data from the index directory from Instance1 I got

Re: An unexpected network error occurred

2011-05-10 Thread Ian Lea
A full stack trace dump is always helpful. Are the three instances on one server with a local index directory, or on different servers accessing a network drive (how?) or what? If the index is locked it would be surprising that you could update it from 2 of the instances. -- Ian. On Tue, May

RE: SpanNearQuery - inOrder parameter

2011-05-10 Thread Gregory Tarr
Anyone able to help me with the problem below? Thanks Greg -----Original Message----- From: Gregory Tarr [mailto:gregory.t...@detica.com] Sent: 09 May 2011 12:33 To: java-user@lucene.apache.org Subject: RE: SpanNearQuery - inOrder parameter Attachment didn't work - test below: import org.ap

Re: Sharding Techniques

2011-05-10 Thread Mike Sokolov
Down to basics, Lucene searches work by locating terms and resolving documents from them. For standard term queries, a term is located by a process akin to binary search. That means that it uses log(n) seeks to get the term. Let's say you have 10M terms in your corpus. If you stored that in a si
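Mike's log(n) point can be made concrete with a small sketch. It assumes the simplest possible term dictionary: a plain sorted array searched by binary search, where each probe stands in for one seek. Lucene's real term dictionary is more involved, and the class and method names below are hypothetical. The sketch shows that halving the number of terms in an index saves only about one probe.

```java
public class TermLookupCost {
    // Binary search over a sorted term array, counting how many entries are probed.
    // In this simplified model each probe stands in for one seek.
    static int probesToFind(String[] sortedTerms, String target) {
        int lo = 0, hi = sortedTerms.length - 1, probes = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            probes++;
            int cmp = sortedTerms[mid].compareTo(target);
            if (cmp == 0) {
                return probes;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes; // term absent: probes spent before giving up
    }

    // Worst-case probes for n sorted entries: floor(log2(n)) + 1, i.e. the bit length of n.
    static int worstCaseProbes(long n) {
        return 64 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date", "fig", "grape", "kiwi"};
        System.out.println(probesToFind(terms, "date"));  // 1: it is the middle element
        System.out.println(worstCaseProbes(10_000_000L)); // 24: one index holding 10M terms
        System.out.println(worstCaseProbes(5_000_000L));  // 23: one of two 5M-term shards
    }
}
```

With 10M terms the worst case is 24 probes. Splitting the same terms into two 5M-term shards still needs up to 23 probes in each shard.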

Re: Sharding Techniques

2011-05-10 Thread Samarendra Pratap
Hi Mike, *"I think the usual approach is to create multiple mirrored copies (slaves) rather than sharding"* This is where my eyes stopped. We do have mirrors, and in fact a good number of them. 6 servers are being used for serving regular queries (2 are for specific queries that do take time) and e

RE: Sharding Techniques

2011-05-10 Thread Burton-West, Tom
Hi Samar, >>Normal queries go fine under 500 ms but when people start searching >>"anything" some queries take up to > 100 seconds. Don't you think >>distributing smaller indexes on different machines would reduce the average >>search time. (Although I have a feeling that search time for smaller

Query on using Payload with MoreLikeThis class

2011-05-10 Thread Saurabh Gokhale
Hi, In the Lucene 2.9.4 project, there is a requirement to boost some of the keywords in the document using payload. Now while searching, is there a way I can boost the MoreLikeThis result using the index time payload values? Or can I merge MoreLikeThis output and PayloadTermQuery output somehow

Re: SpanNearQuery - inOrder parameter

2011-05-10 Thread Tom Hill
Since no one else is jumping in, I'll say that I suspect the span query code does not bother to check whether two of the terms are the same. I think that would account for the behavior you are seeing, since the second SpanTermQuery would match the same term the first one did. Note that I'm
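Tom's hypothesis can be illustrated with a toy position matcher. This is a simplified model built on his assumption, not Lucene's actual span implementation. The class name and the gap rule (number of positions strictly between the two matches) are illustrative choices. In the model, an ordered match needs the second position to be strictly after the first. An unordered match never requires the two positions to differ, so a single occurrence can satisfy two identical clauses.

```java
import java.util.List;

public class SpanNearToy {
    // Toy model under Tom's hypothesis, not Lucene's span code. `first` and `second`
    // are the positions matched by two clauses; the gap is the number of positions
    // strictly between the two matches, and a match needs gap <= slop.
    static boolean near(List<Integer> first, List<Integer> second, int slop, boolean inOrder) {
        for (int p : first) {
            for (int q : second) {
                if (inOrder) {
                    // Ordered: the second clause must start strictly after the first.
                    if (q > p && q - p - 1 <= slop) {
                        return true;
                    }
                } else if (Math.abs(q - p) - 1 <= slop) {
                    // Unordered: nothing forces p != q, so one occurrence of a term
                    // can satisfy two identical clauses at once.
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A document in which the term "a" occurs once, at position 0.
        List<Integer> positionsOfA = List.of(0);
        // Query: two identical clauses for "a" with slop 0.
        System.out.println(near(positionsOfA, positionsOfA, 0, true));  // false
        System.out.println(near(positionsOfA, positionsOfA, 0, false)); // true
    }
}
```

Under this model, `inOrder=true` behaves as expected. `inOrder=false` reports a match for a document containing the repeated term only once, which is the kind of discrepancy described in the thread.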

RE: SpanNearQuery - inOrder parameter

2011-05-10 Thread Chris Hostetter
: I attach a junit test which shows strange behaviour of the inOrder : parameter on the SpanNearQuery constructor, using Lucene 2.9.4. : : My understanding of this parameter is that true forces the order and : false doesn't care about the order. : : Using true always works. However using false

Re: How do I sort lucene search results by relevance and time?

2011-05-10 Thread Johnbin Wang
Thanks for your suggestion! I tried setting a document boost factor at index time. In order to bubble up recent documents' scores, I set the boost of the last three months' documents to 2, and the boost of all other documents to 0.5. Then I search the index sorting by two fields, lucene default score and
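The two-field sort Johnbin describes can be sketched in plain Java. Several pieces are assumptions made for illustration, not Lucene's Sort/SortField API: the `Hit` class, the 90-day approximation of three months, and modelling the index-time boost as a straight multiplication of the score. The sketch highlights one property worth keeping in mind: a secondary sort key is consulted only when primary scores are exactly equal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ScoreThenDateSort {
    // Hypothetical hit: an id, a (boosted) relevance score, and a timestamp in epoch millis.
    static final class Hit {
        final String id;
        final float score;
        final long timeMillis;

        Hit(String id, float score, long timeMillis) {
            this.id = id;
            this.score = score;
            this.timeMillis = timeMillis;
        }
    }

    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    // Assumed boost rule from the message: 2.0 for the last three months
    // (approximated here as 90 days), 0.5 for everything older.
    static float recencyBoost(long timeMillis, long nowMillis) {
        return nowMillis - timeMillis <= 90 * DAY_MILLIS ? 2.0f : 0.5f;
    }

    // Assumption: the boost simply multiplies the raw relevance score.
    static Hit boosted(String id, float rawScore, long timeMillis, long nowMillis) {
        return new Hit(id, rawScore * recencyBoost(timeMillis, nowMillis), timeMillis);
    }

    // Primary key: score, highest first. Secondary key: time, newest first.
    // The time key is consulted only when two scores are exactly equal.
    static final Comparator<Hit> SCORE_THEN_TIME =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingLong((Hit h) -> h.timeMillis).reversed());

    static List<String> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(SCORE_THEN_TIME);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        long now = 1_000_000_000_000L;
        List<Hit> hits = List.of(
                boosted("old-strong", 1.0f, now - 400 * DAY_MILLIS, now), // 1.0 * 0.5 = 0.5
                boosted("new-weak", 0.3f, now - 10 * DAY_MILLIS, now),    // 0.3 * 2.0 = 0.6
                boosted("new-tie", 0.25f, now - 5 * DAY_MILLIS, now));    // 0.25 * 2.0 = 0.5
        System.out.println(order(hits)); // [new-weak, new-tie, old-strong]
    }
}
```

Floating-point scores rarely tie exactly, so the time key seldom changes the order on its own. In this sketch the recency boost does most of the work, and the date only separates "new-tie" from "old-strong" because their boosted scores happen to be identical.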

Can I omit ShingleFilter's filler tokens

2011-05-10 Thread William Koscho
Hi, Can I remove the filler token _ from the n-gram-tokens that are generated by a ShingleFilter? I'm using a chain of filters: ClassicFilter, StopFilter, LowerCaseFilter, and ShingleFilter to create phrase n-grams. The ShingleFilter inserts FILLER_TOKENs in place of the stopwords, but I don't w
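One way to get filler-free phrase n-grams is to post-process the shingles. The sketch below is a hypothetical plain-Java helper, not a ShingleFilter setting. It assumes the stopword gaps have already been turned into `_` placeholders, as the message describes, and it simply skips any n-gram containing one.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleFillerDemo {
    // Hypothetical helper, not a ShingleFilter option: builds word n-grams of `size`
    // from `tokens`, where removed stopwords appear as the `filler` placeholder,
    // and skips every n-gram that contains the placeholder.
    static List<String> shinglesWithoutFiller(List<String> tokens, int size, String filler) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            List<String> window = tokens.subList(i, i + size);
            if (window.contains(filler)) {
                continue; // this n-gram spans a removed stopword
            }
            out.add(String.join(" ", window));
        }
        return out;
    }

    public static void main(String[] args) {
        // Models "the quick fox jumps over the dog" with "the" and "over" removed as stopwords.
        List<String> tokens = List.of("_", "quick", "fox", "jumps", "_", "_", "dog");
        System.out.println(shinglesWithoutFiller(tokens, 2, "_")); // [quick fox, fox jumps]
    }
}
```

Dropping these shingles avoids indexing phrases that never occurred in the text. Stripping the `_` instead would turn "quick _" into a lone "quick", which duplicates the unigram.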

Re: Sharding Techniques

2011-05-10 Thread Ganesh
We also use a similar technique, breaking the index into smaller indexes and searching them using ParallelMultiSearcher. We have to do incremental indexing, and records older than 6 months or 1 year (based on the ageout setting) should be deleted. Having multiple small indexes is really fast in terms of in
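Ganesh does not say how his indexes are split. Still, small indexes combine naturally with an age-out setting when each one covers a fixed time range. The sketch assumes, hypothetically, one index per calendar month and a naming scheme like `idx-2011-05`. Under those assumptions, expired data can be removed by dropping whole indexes rather than deleting documents one by one.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

public class MonthlyIndexAgeOut {
    // Hypothetical naming scheme: one small index per calendar month, e.g. "idx-2011-05".
    static String indexNameFor(YearMonth month) {
        return "idx-" + month; // YearMonth.toString() yields e.g. "2011-05"
    }

    // Returns the monthly indexes that fall outside the retention window and could be
    // dropped whole. `retainMonths` counts the current month, so 6 keeps the current
    // month plus the 5 before it.
    static List<String> indexesToDrop(List<YearMonth> existing, YearMonth current, int retainMonths) {
        if (retainMonths < 1) {
            throw new IllegalArgumentException("retainMonths must be at least 1");
        }
        YearMonth oldestKept = current.minusMonths(retainMonths - 1);
        List<String> drop = new ArrayList<>();
        for (YearMonth m : existing) {
            if (m.isBefore(oldestKept)) {
                drop.add(indexNameFor(m));
            }
        }
        return drop;
    }

    public static void main(String[] args) {
        List<YearMonth> existing = List.of(
                YearMonth.of(2010, 10), YearMonth.of(2010, 11),
                YearMonth.of(2010, 12), YearMonth.of(2011, 5));
        System.out.println(indexesToDrop(existing, YearMonth.of(2011, 5), 6)); // [idx-2010-10, idx-2010-11]
    }
}
```

With a 6-month retention in May 2011, indexes from December 2010 onward are kept and anything older is returned for removal. The remaining indexes would still be searched together, for example with ParallelMultiSearcher as in the message.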