Lucene 4 single segment performance improvement tips?

2014-03-05 Thread Arvind Kalyan
Hi folks, We are currently using Lucene 4.5, we are hitting some bottlenecks, and we would appreciate some input from the community. This particular index (about 10GB on disk) is guaranteed not to have any updates, so we made it a single-segment index by doing a forceMerge(1). The ind

Sorted NumericDocValues

2014-03-05 Thread Yonghui Zhao
Hi, Is there any data type in Lucene that supports functions like SortedDocValues for any numeric (int, long, float, double) type? SortedDocValues only supports bytes; I want a data type that can return the numeric value and an ord (-1 if the doc doesn't have the field) for each doc. NumericDocValues only supports

Re: Sorted NumericDocValues

2014-03-05 Thread Michael McCandless
Just use AtomicReader.getDocsWithField to know whether the doc had that field? Mike McCandless http://blog.mikemccandless.com On Wed, Mar 5, 2014 at 7:00 AM, Yonghui Zhao wrote: > Hi, > > Is there any data type in lucene can support functions like SortedDocValues > for any numeric(int, long, f

Re: Lucene 4 single segment performance improvement tips?

2014-03-05 Thread Michael McCandless
What sorts of queries are you running? It seems like they must be very terms-dict intensive, e.g. primary key lookups or multi-term queries, and maybe not matching too many documents? It's strange you can't get CPU usage up, as you add threads. Maybe simplify the test to remove Jetty? Ie, a sta

Re: Sorted NumericDocValues

2014-03-05 Thread Yonghui Zhao
Yes, it works. I can use AtomicReader.getDocsWithField and NumericDocValues to implement my requirement for long by doing a sort. But how do I deal with the other numeric types (int, float, double)? 2014-03-05 20:19 GMT+08:00 Michael McCandless: > Just use AtomicReader.getDocsWithField to know whethe
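
A minimal sketch of the "doing a sort" approach described above, not the poster's actual code. The plain arrays are stand-ins (an assumption for illustration): values[doc] plays the role of the per-document value read from NumericDocValues, and hasValue[doc] plays the role of the bits returned by AtomicReader.getDocsWithField. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class NumericOrds {
    /**
     * Returns one ordinal per document: the rank of the document's value
     * among the distinct values present, or -1 if the document has no value.
     * values[doc] and hasValue[doc] are stand-ins for the per-document
     * numeric value and the "docs with field" bits.
     */
    static int[] computeOrds(long[] values, boolean[] hasValue) {
        // Collect only the values of documents that actually have the field.
        long[] sorted = new long[values.length];
        int count = 0;
        for (int doc = 0; doc < values.length; doc++) {
            if (hasValue[doc]) {
                sorted[count++] = values[doc];
            }
        }
        Arrays.sort(sorted, 0, count);

        // Remove duplicates in place so each distinct value gets one ordinal.
        int unique = 0;
        for (int i = 0; i < count; i++) {
            if (i == 0 || sorted[i] != sorted[i - 1]) {
                sorted[unique++] = sorted[i];
            }
        }

        // Look up each document's value in the sorted distinct values.
        int[] ords = new int[values.length];
        for (int doc = 0; doc < values.length; doc++) {
            ords[doc] = hasValue[doc]
                    ? Arrays.binarySearch(sorted, 0, unique, values[doc])
                    : -1;
        }
        return ords;
    }
}
```

The missing-value check has to come from the separate hasValue bits because a stored value of 0 is otherwise indistinguishable from "no value".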

Re: Sorted NumericDocValues

2014-03-05 Thread Michael McCandless
Just index ints as longs; the codec under the hood should be efficient about storing the bytes (ie, not use more than 4 bytes per doc). For float/double, use Float.floatToRawIntBits / Double.doubleToRawLongBits. Mike McCandless http://blog.mikemccandless.com On Wed, Mar 5, 2014 at 7:47 AM, Yon
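
A rough sketch of the conversions suggested above, using only JDK calls. The class and method names are hypothetical, and the "sortable" variants are an addition that is not in the original message: raw IEEE 754 bits compared as signed integers order negative values backwards, so if the resulting longs are going to be sorted (as in the ord use case earlier in this thread), one common trick is to flip the non-sign bits of negative values.

```java
public class NumericBits {
    // An int widens to a long without loss.
    static long encodeInt(int v) {
        return v;
    }

    // Raw bit encodings, as suggested in the message above.
    static long encodeFloatRaw(float f) {
        return Float.floatToRawIntBits(f);
    }

    static long encodeDoubleRaw(double d) {
        return Double.doubleToRawLongBits(d);
    }

    // Inverse conversions for reading the values back.
    static float decodeFloatRaw(long bits) {
        return Float.intBitsToFloat((int) bits);
    }

    static double decodeDoubleRaw(long bits) {
        return Double.longBitsToDouble(bits);
    }

    // Assumption (not from the thread): an order-preserving encoding.
    // For negative floats, flipping the 31 non-sign bits reverses their
    // order so that signed integer comparison matches float comparison.
    static long encodeFloatSortable(float f) {
        int bits = Float.floatToIntBits(f);
        return bits ^ ((bits >> 31) & 0x7fffffff);
    }

    static long encodeDoubleSortable(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }
}
```

If only round-tripping the value matters, the raw encodings are enough; the sortable variants matter only when the stored longs themselves are compared.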

Re: Phrase search with ComplexPhraseQueryParser/SpanQueryParser.

2014-03-05 Thread Ahmet Arslan
Hi Modassar, Can you post your request (with an example if possible) to the LUCENE-5205 Jira ticket too? If you don't have a Jira account, anyone can create one. Thanks, Ahmet On Wednesday, March 5, 2014 9:40 AM, Modassar Ather wrote: Hi, Phrases with stop words in them are not getting searc

Re: Lucene 4 single segment performance improvement tips?

2014-03-05 Thread Chris Hostetter
: Our runtime/search use-case is very simple: run filters to select all docs : that match some conditions specified in a filter query (we do not use : Lucene scoring) and return the first 100 docs that match (this is an : over-simplification) "first" as defined how? in order collected by a custom

Re: Lucene 4 single segment performance improvement tips?

2014-03-05 Thread Arvind Kalyan
Thanks Mike. Good idea.. we have a pretty thick stack and I got it down to the jetty+lucene thinking it is barebones enough.. but good call on running it purely on lucene. I'll see if it moves any needle (hopefully it does). On Wed, Mar 5, 2014 at 4:25 AM, Michael McCandless < luc...@mikemccandle

Re: Lucene 4 single segment performance improvement tips?

2014-03-05 Thread Arvind Kalyan
On Wed, Mar 5, 2014 at 8:14 AM, Chris Hostetter wrote: > : Our runtime/search use-case is very simple: run filters to select all > docs > : that match some conditions specified in a filter query (we do not use > : Lucene scoring) and return the first 100 docs that match (this is an > : over-simpli