RE: Increase search performance

2018-02-02 Thread Atul Bisaria
Thanks for the feedback! -Original Message- From: Adrien Grand [mailto:jpou...@gmail.com] Sent: Friday, February 02, 2018 1:42 PM To: java-user@lucene.apache.org Subject: Re: Increase search performance If needsScores returns false on the collector, then scores won't be computed.

Re: Increase search performance

2018-02-02 Thread Adrien Grand
.docBase = context.docBase; > } > > public ScoreDoc[] getHits() > { > return matches; > } > } > > Best Regards, > Atul Bisaria > > -Original Message- > From: Adrien Grand [mailto:jpou...@gmail.com] > Se

RE: Increase search performance

2018-02-01 Thread Atul Bisaria
iginal Message- From: Adrien Grand [mailto:jpou...@gmail.com] Sent: Thursday, February 01, 2018 6:11 PM To: java-user@lucene.apache.org Subject: Re: Increase search performance Yes, this collector won't perform well if you have many matches since memory usage is linear with the number of

Re: Increase search performance

2018-02-01 Thread Adrien Grand
ffle(matches); > maxHitsRequired = Math.min(matches.size(), > maxHitsRequired); > > return matches.subList(0, maxHitsRequired); > } > } > > Best Regards, > Atul Bisaria > > -Original Message- > From: Adrien Grand [ma

RE: Increase search performance

2018-02-01 Thread Atul Bisaria
); } } Best Regards, Atul Bisaria -Original Message- From: Adrien Grand [mailto:jpou...@gmail.com] Sent: Wednesday, January 31, 2018 6:33 PM To: java-user@lucene.apache.org Subject: Re: Increase search performance Hi Atul, On Tue, Jan 30, 2018 at 16:24, Atul Bisaria wrote: >

Re: Increase search performance

2018-01-31 Thread Adrien Grand
//issues.apache.org/jira/browse/LUCENE-6784), but I am not > able to see any significant change in search performance. > > Here is the code I am testing with: > > > > DirectoryReader reader = DirectoryReader.open(directory); //using > MMapDirectory > > IndexS

Increase search performance

2018-01-30 Thread Atul Bisaria
In the search use case in my application, I don't need to score query results since all results are equal. Query patterns are also more or less fixed. Given these conditions, I am trying to increase search performance by 1. Using ConstantScoreQuery so that scoring overhe

Lucene Grouping Search - Performance

2017-05-11 Thread aravinth thangasami
Hi all, On experimenting with Lucene Group Search in Lucene 4.10, Once Field Cache is formed, We recorded better performance with Field cache compared to doc values. So I decided to avoid doc values on that field. Our Index involves 80% of updates. How much will this affect field cache? Is it

Re: Search Performance with NRT

2015-05-27 Thread kiwi clive
Hi Mike, Thanks for the very prompt and clear response. We look forward to using the new (new for us) Lucene goodies :-) Clive From: Michael McCandless To: Lucene Users ; kiwi clive Sent: Thursday, May 28, 2015 2:34 AM Subject: Re: Search Performance with NRT As long as you

Re: Search Performance with NRT

2015-05-27 Thread Michael McCandless
As long as you call SM.maybeRefresh from a dedicated refresh thread (not from a query's thread) it will work well. You may want to use a warmer so that the new searcher is warmed before becoming visible to incoming queries ... this ensures any lazy data structures are initialized by the time a que
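The pattern described above (refresh on a dedicated thread, warm the new view, and only then make it visible to queries) can be sketched without any Lucene dependency. This is a minimal stdlib-only model, not Lucene's API: `RefreshingHolder`, `opener` and `warmer` are hypothetical names, and in Lucene itself the corresponding roles would be played by `SearcherManager.maybeRefresh` and a warming `SearcherFactory`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Stdlib-only model of "refresh off the query path, warm, then publish". */
final class RefreshingHolder<T> implements AutoCloseable {
    private final Supplier<T> opener;   // hypothetical: builds a fresh view of the index
    private final Consumer<T> warmer;   // hypothetical: initializes lazy structures up front
    private final AtomicReference<T> current;
    private final ScheduledExecutorService refresher =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "refresh-thread");
                t.setDaemon(true);
                return t;
            });

    RefreshingHolder(Supplier<T> opener, Consumer<T> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        this.current = new AtomicReference<>(openWarm());
    }

    /** Opens a new view and warms it before anyone else can see it. */
    private T openWarm() {
        T view = opener.get();
        warmer.accept(view);
        return view;
    }

    /** Swaps in a warmed view; meant to run on the refresh thread only. */
    void refreshNow() {
        current.set(openWarm());
    }

    /** Schedules periodic refreshes on the dedicated thread. */
    void start(long period, TimeUnit unit) {
        refresher.scheduleWithFixedDelay(this::refreshNow, period, period, unit);
    }

    /** Query threads only read the latest published view; they never refresh. */
    T acquire() {
        return current.get();
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

Query threads call `acquire()` and never pay the cost of opening or warming, which is the point of keeping the refresh off the query path.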

Search Performance with NRT

2015-05-27 Thread kiwi clive
Hi Guys We are considering changing our Lucene indexer / search architecture from 2 separate JVMs to a single one to benefit from the very latest index views NRT readers provide. In the past we cached our IndexSearchers to avoid cold searches every time and reopened them periodically.  In the

Re: search performance

2014-06-20 Thread Vitaly Funstein
or may not suffer, as a tradeoff. On Fri, Jun 20, 2014 at 1:19 AM, Uwe Schindler wrote: > Hi, > > > Am I correct that using SearchManager can't be used with a MultiReader > and > > NRT? I would appreciate all suggestions on how to optimize our search > > perf

RE: search performance

2014-06-20 Thread Uwe Schindler
Hi, > Am I correct that using SearchManager can't be used with a MultiReader and > NRT? I would appreciate all suggestions on how to optimize our search > performance further. Search time has become a usability issue. Just have a SearcherManager for every index. MultiReader constru

Re: search performance

2014-06-20 Thread Jamie
r.openIfChanged, avoiding wrapping lucene scoreDoc results, option to disable sorting, etc.). While, in some environments, search performance has improved significantly, in other larger ones we are unfortunately, still seeing 1 minute - 5 minute search times. For instance, in one site, the total inde

Re: search performance

2014-06-20 Thread Jamie
After, DirectoryReader.openIfChanged, avoiding wrapping lucene scoreDoc results, option to disable sorting, etc.). While, in some environments, search performance has improved significantly, in other larger ones we are unfortunately, still seeing 1 minute - 5 minute search times. For instanc

Re: search performance

2014-06-06 Thread Jamie
Jon I ended up adapting your approach. The solution involves keeping a LRU cache of page boundary scoredocs and their respective positions. New positions are added to the cache as new pages are discovered. To cut down on searches, when scrolling backwards and forwards, the search begins from
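A rough sketch of the LRU cache of page boundaries described above, using only the Java standard library. The names `PageBoundaryCache`, `remember` and `nearestAtOrBefore` are hypothetical, and the boundary type `B` stands in for whatever is stored per boundary (a `ScoreDoc` in the message). The assumption here is that a search for an arbitrary position restarts from the closest cached boundary at or before it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU cache mapping a result position to the boundary object found there. */
final class PageBoundaryCache<B> {
    private final LinkedHashMap<Integer, B> lru;

    PageBoundaryCache(final int capacity) {
        // accessOrder=true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<Integer, B>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, B> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Records the boundary discovered at the given result position. */
    void remember(int position, B boundary) {
        lru.put(position, boundary);
    }

    /** Returns the closest cached position at or before target, or -1 if none. */
    int nearestAtOrBefore(int target) {
        int best = -1;
        for (int position : lru.keySet()) {
            if (position <= target && position > best) {
                best = position;
            }
        }
        if (best >= 0) {
            lru.get(best); // mark as recently used, after iteration has finished
        }
        return best;
    }

    /** Returns the cached boundary at an exact position, or null. */
    B boundaryAt(int position) {
        return lru.get(position);
    }

    int size() {
        return lru.size();
    }
}
```

The linear scan in `nearestAtOrBefore` is acceptable only because the cache is small; a sorted structure would be the obvious replacement if it grew.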

RE: search performance

2014-06-03 Thread Toke Eskildsen
Jamie [ja...@mailarchiva.com] wrote: > It would be nice if, in future, the Lucene API could provide a > searchAfter that takes a position (int). It would not really help with large result sets. At least not with the current underlying implementations. This is tied into your current performance pr

Re: search performance

2014-06-03 Thread Jamie
Thanks Jon I'll investigate your idea further. It would be nice if, in future, the Lucene API could provide a searchAfter that takes a position (int). Regards Jamie On 2014/06/03, 3:24 PM, Jon Stewart wrote: With regards to pagination, is there a way for you to cache the IndexSearcher, Que

Re: search performance

2014-06-03 Thread Jon Stewart
With regards to pagination, is there a way for you to cache the IndexSearcher, Query, and TopDocs between user pagination requests (a lot of webapp frameworks have object caching mechanisms)? If so, you may have luck with code like this: void ensureTopDocs(final int rank) throws IOException {

Re: search performance

2014-06-03 Thread Jamie
Robert. Thanks, I've already done a similar thing. Results on my test platform are encouraging.. On 2014/06/03, 2:41 PM, Robert Muir wrote: Reopening for every search is not a good idea. this will have an extremely high cost (not as high as what you are doing with "paging" but still not good).

Re: search performance

2014-06-03 Thread Robert Muir
Reopening for every search is not a good idea. this will have an extremely high cost (not as high as what you are doing with "paging" but still not good). Instead consider making it near-realtime, by doing this every second or so instead. Look at SearcherManager for code that helps you do this. O

Re: search performance

2014-06-03 Thread Jamie
Robert FYI: I've modified the code to utilize the experimental function.. DirectoryReader dirReader = DirectoryReader.openIfChanged(cachedDirectoryReader,writer, true); In this case, the IndexReader won't be opened on each search, unless absolutely necessary. Regards Jamie On 2014/06

Re: search performance

2014-06-03 Thread Jamie
Robert Hmmm. why did Mike go to all the trouble of implementing NRT search, if we are not supposed to be using it? The user simply wants the latest result set. To me, this doesn't appear out of scope for the Lucene project. Jamie On 2014/06/03, 1:17 PM, Robert Muir wrote: No, you are

Re: search performance

2014-06-03 Thread Robert Muir
No, you are incorrect. The point of a search engine is to return top-N most relevant. If you insist you need to open an indexreader on every single search, and then return huge amounts of docs, maybe you should use a database instead. On Tue, Jun 3, 2014 at 6:42 AM, Jamie wrote: > Vitality / Rob

Re: search performance

2014-06-03 Thread Jamie
Vitality / Robert I wouldn't go so far as to call our pagination naive!? Sub-optimal, yes. Unless I am mistaken, the Lucene library's pagination mechanism, makes the assumption that you will cache the scoredocs for the entire result set. This is not practical when you have a result set that e

Re: search performance

2014-06-03 Thread Vitaly Funstein
Jamie, What if you were to forget for a moment the whole pagination idea, and always capped your search at 1000 results for testing purposes only? This is just to try and pinpoint the bottleneck here; if, regardless of the query parameters, the search latency stays roughly the same and well below

Re: search performance

2014-06-03 Thread Robert Muir
rchingSpeed) , in some of our > installations, search performance has reached the point where is it > unacceptably slow. For instance, in one environment, the total index size is > 200GB, with 150 million documents indexed. With NRT enabled, search speed is > roughly 5 minutes on average.

Re: search performance

2014-06-03 Thread Jamie
Vitaly See below: On 2014/06/03, 12:09 PM, Vitaly Funstein wrote: A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is

Re: search performance

2014-06-03 Thread Vitaly Funstein
A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is at best futile, and at worst quite detrimental to responsiveness and o

Re: search performance

2014-06-03 Thread Jamie
FYI: We are also using a multireader to search over multiple index readers. Search under a million documents yields good response times. When you get into the 60M territory, search slows to a crawl. On 2014/06/03, 11:47 AM, Jamie wrote: Sure... see below: --

Re: search performance

2014-06-03 Thread Jamie
Sure... see below: protected void search(Query query, Filter queryFilter, Sort sort) throws BlobSearchException { try { logger.debug("start search {searchquery='" + getSearchQuery() + "',query='"+query.toString()+"',filterQuery='"+queryFilter+"',sort='"+sort

Re: search performance

2014-06-03 Thread Rob Audenaerde
Hi Jamie, What is included in the 5 minutes? Just the call to the searcher? searcher.search(...) ? Can you show a bit more of the code you use? On Tue, Jun 3, 2014 at 11:32 AM, Jamie wrote: > Vitaly > > Thanks for the contribution. Unfortunately, we cannot use Lucene's > pagination function

Re: search performance

2014-06-03 Thread Jamie
Vitaly Thanks for the contribution. Unfortunately, we cannot use Lucene's pagination function, because in reality the user can skip pages to start the search at any point, not just from the end of the previous search. Even the first search (without any pagination), with a max of 1000 hits, tak

Re: search performance

2014-06-03 Thread Vitaly Funstein
Something doesn't quite add up. TopFieldCollector fieldCollector = TopFieldCollector.create(sort, max,true, > false, false, true); > > We use pagination, so only returning 1000 documents or so at a time. > > You say you are using pagination, yet the API you are using to create your collector isn't

Re: search performance

2014-06-03 Thread Jamie
Toke Thanks for the contact. See below: On 2014/06/03, 9:17 AM, Toke Eskildsen wrote: On Tue, 2014-06-03 at 08:17 +0200, Jamie wrote: Unfortunately, in this instance, it is a live production system, so we cannot conduct experiments. The number is definitely accurate. We have many different sy

Re: search performance

2014-06-03 Thread Toke Eskildsen
On Tue, 2014-06-03 at 08:17 +0200, Jamie wrote: > Unfortunately, in this instance, it is a live production system, so we > cannot conduct experiments. The number is definitely accurate. > > We have many different systems with a similar load that observe the same > performance issue. To my knowle

Re: search performance

2014-06-02 Thread Christoph Kaser
Can you take thread stacktraces (repeatedly) during those 5 minute searches? That might give you (or someone on the mailing list) a clue where all that time is spent. You could try using jstack for that: http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html Regards Christoph

Re: search performance

2014-06-02 Thread Jamie
Toke Thanks for the comment. Unfortunately, in this instance, it is a live production system, so we cannot conduct experiments. The number is definitely accurate. We have many different systems with a similar load that observe the same performance issue. To my knowledge, the Lucene integrati

Re: search performance

2014-06-02 Thread Toke Eskildsen
On Mon, 2014-06-02 at 08:51 +0200, Jamie wrote: [200GB, 150M documents] > With NRT enabled, search speed is roughly 5 minutes on average. > The server resources are: > 2x6 Core Intel CPU, 128GB, 2 SSD for index and RAID 0, with Linux. 5 minutes is extremely long. Is that really the right number

Re: search performance

2014-06-02 Thread Tri Cao
aling with here. Hope this helps, Tri On Jun 01, 2014, at 11:50 PM, Jamie wrote: Greetings Despite following all the recommended optimizations (as described at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in some of our installations, search performance has reached the po

Re: search performance

2014-06-02 Thread Jamie
I assume you meant 1000 documents. Yes, the page size is in fact configurable. However, it only obtains the page size * 3. It preloads the following and previous page too. The point is, it only obtains the documents that are needed. On 2014/06/02, 3:03 PM, Tincu Gabriel wrote: My bad, It's u

Re: search performance

2014-06-02 Thread Tincu Gabriel
My bad, It's using the RamDirectory as a cache and a delegate directory that you pass in the constructor to do the disk operations, limiting the use of the RamDirectory to files that fit a certain size. So i guess the underlying Directory implementation will be whatever you choose it to be. I'd sti

Re: search performance

2014-06-02 Thread Jamie
I was under the impression that NRTCachingDirectory will instantiate an MMapDirectory if a 64 bit platform is detected? Is this not the case? On 2014/06/02, 2:09 PM, Tincu Gabriel wrote: MMapDirectory will do the job for you. RamDirectory has a big warning in the class description stating that

Re: search performance

2014-06-02 Thread Tincu Gabriel
MMapDirectory will do the job for you. RamDirectory has a big warning in the class description stating that the performance will get killed by an index larger than a few hundred MB, and NRTCachingDirectory is a wrapper for RamDirectory and suitable for low update rates. MMap will use the system RAM

Re: search performance

2014-06-02 Thread Jamie
Jack First off, thanks for applying your mind to our performance problem. On 2014/06/02, 1:34 PM, Jack Krupansky wrote: Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are t

Re: search performance

2014-06-02 Thread Jack Krupansky
256GB machine? How frequent are your commits for updates while doing queries? -- Jack Krupansky -Original Message- From: Jamie Sent: Monday, June 2, 2014 2:51 AM To: java-user@lucene.apache.org Subject: search performance Greetings Despite following all the recommended optimizations (as

Re: search performance

2014-06-02 Thread Jamie
Tom Thanks for the offer of assistance. On 2014/06/02, 12:02 PM, Tincu Gabriel wrote: What kind of queries are you pushing into the index. We are indexing regular emails + attachments. Typical query is something like: filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08 deliver

Re: search performance

2014-06-02 Thread Tincu Gabriel
: > Greetings > > Despite following all the recommended optimizations (as described at > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in some of > our installations, search performance has reached the point where is it > unacceptably slow. For instance, in one environmen

search performance

2014-06-01 Thread Jamie
Greetings Despite following all the recommended optimizations (as described at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in some of our installations, search performance has reached the point where is it unacceptably slow. For instance, in one environment, the total index

Re: Improving search performance for forum search

2012-11-24 Thread Arjen van der Meijden
t: Tuesday, November 13, 2012 8:36 AM To: java-user@lucene.apache.org Subject: Improving search performance for forum search Hi List, I'm working on a search engine for our forum using Lucene 4. Since its a brand new search engine, I can change it as I see fit. We have about 1.5M topics

Re: Improving search performance for forum search

2012-11-13 Thread Arjen van der Meijden
remen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Arjen van der Meijden [mailto:acmmail...@tweakers.net] Sent: Tuesday, November 13, 2012 8:36 AM To: java-user@lucene.apache.org Subject: Improving search performance for forum search Hi List, I'm working on a search

RE: Improving search performance for forum search

2012-11-13 Thread Uwe Schindler
://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Arjen van der Meijden [mailto:acmmail...@tweakers.net] > Sent: Tuesday, November 13, 2012 8:36 AM > To: java-user@lucene.apache.org > Subject: Improving search performance for forum search > > Hi L

Improving search performance for forum search

2012-11-12 Thread Arjen van der Meijden
Hi List, I'm working on a search engine for our forum using Lucene 4. Since its a brand new search engine, I can change it as I see fit. We have about 1.5M topics in the various subforums and on average 20 replies to each topic (i.e. about 33M in total). For now, I've opted to index all repli

Re: lucene (search) performance tuning

2012-05-28 Thread Lance Norskog
And, no RamDirectory does not help. On Mon, May 28, 2012 at 5:54 PM, Lance Norskog wrote: > Can you use filter queries? Filters short-circuit a lot of search > processing. "City:San Francisco" is a classic filter - it is a small > part of the documents and it is reused a lot. > > On Sat, May 26,

Re: lucene (search) performance tuning

2012-05-28 Thread Lance Norskog
Can you use filter queries? Filters short-circuit a lot of search processing. "City:San Francisco" is a classic filter - it is a small part of the documents and it is reused a lot. On Sat, May 26, 2012 at 7:32 AM, Yang wrote: > I'm using disjunction (OR) query. unfortunately all of the clauses ar

Re: lucene (search) performance tuning

2012-05-26 Thread Yang
I'm using disjunction (OR) query. unfortunately all of the clauses are optional On Sat, May 26, 2012 at 4:38 AM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > On Sat, May 26, 2012 at 2:59 AM, Yang wrote: > > I tested with more threads / processes. indeed this is completely > > cpu-b

Re: lucene (search) performance tuning

2012-05-26 Thread Li Li
if you don't score but sort by id, it may be a little bit faster. but for 3.x, you can hardly speed up by simpler scoring function. for your situation, the bottleneck is cpu. you can speed up by paralleling. so the best one is to split index and searching concurrently. so the cpus can be fully used
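The suggestion to split the index and search the parts concurrently can be illustrated with a small stdlib-only model. Everything here is an assumption made for illustration: each "shard" is just an array of document ids, `matches` stands in for the query, and `ShardedSearch` is a hypothetical name rather than a Lucene or Solr class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

final class ShardedSearch {
    /** Searches every shard on its own task and merges the hits in shard order. */
    static List<Integer> search(List<int[]> shards, IntPredicate matches) {
        int workers = Math.max(1,
                Math.min(shards.size(), Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<Integer>>> pending = new ArrayList<>();
            for (int[] shard : shards) {
                pending.add(pool.submit(() -> {
                    List<Integer> hits = new ArrayList<>();
                    for (int doc : shard) {
                        if (matches.test(doc)) {
                            hits.add(doc);
                        }
                    }
                    return hits;
                }));
            }
            List<Integer> merged = new ArrayList<>();
            for (Future<List<Integer>> result : pending) {
                merged.addAll(result.get()); // waits for that shard to finish
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a CPU-bound query, the latency of one search then approaches the time of the slowest shard rather than the sum over all shards, provided idle cores are available.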

Re: lucene (search) performance tuning

2012-05-26 Thread Simon Willnauer
On Sat, May 26, 2012 at 2:59 AM, Yang wrote: > I tested with more threads / processes. indeed this is completely > cpu-bound, since running 1 thread gives the same latency as 4 threads (my > box has 4 cores) > > > given this, is there any way to simplify the scoring computation (i'm only > using l

Re: lucene (search) performance tuning

2012-05-25 Thread Yang
I tested with more threads / processes. indeed this is completely cpu-bound, since running 1 thread gives the same latency as 4 threads (my box has 4 cores) given this, is there any way to simplify the scoring computation (i'm only using lucene as a first level "rough" search, so the search quali

Re: lucene (search) performance tuning

2012-05-25 Thread Yang
thanks a lot guys On Tue, May 22, 2012 at 1:34 AM, Ian Lea wrote: > Lots of good tips in > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from > the FAQ. > > > -- > Ian. > > > On Tue, May 22, 2012 at 2:08 AM, Li Li wrote: > > something wrong when writing in my android client.

Re: lucene (search) performance tuning

2012-05-22 Thread Ian Lea
Lots of good tips in http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from the FAQ. -- Ian. On Tue, May 22, 2012 at 2:08 AM, Li Li wrote: > something wrong when writing in my android client. > if RAMDirectory do not help, i think the bottleneck is cpu. you may try to > tune jvm

Re: lucene (search) performance tuning

2012-05-21 Thread Li Li
something wrong when writing in my android client. if RAMDirectory do not help, i think the bottleneck is cpu. you may try to tune jvm but i do not expect much improvement. the best one is splitting your index into 2 or more smaller ones. you can then use solr s distributed searching. if the cpu is

Re: lucene (search) performance tuning

2012-05-21 Thread Li Li
On 2012-5-22 at 4:59 AM, "Yang" wrote: > > I'm trying to make my search faster. right now a query like > > name:Joe Moe Pizza address:77 main street city:San Francisco >is this a conjunction query or a disjunction query? > in a index with 20mil such short business descriptions (total size about 3GB) take

Re: Improving Lucene Search Performance

2011-12-09 Thread Chris Hostetter
: Subject: Improving Lucene Search Performance : In-Reply-To: : : References: : <161fd7d0-e01f-42f2-a02a-a4e4b182c...@ebi.ac.uk><347A161B-6C7B-4DC3-ACD0-9A804E2 : dd...@ebi.ac.uk><007613f0-8529-47a3-95c4-7839e1d3e...@ebi.ac.uk> : https://people.apache.org/~hos

Re: Improving Lucene Search Performance

2011-12-08 Thread Ian Lea
See http://wiki.apache.org/lucene-java/ImproveSearchingSpeed. Some of the tips relate to indexing but most to search time stuff. -- Ian. On Thu, Dec 8, 2011 at 10:45 AM, Dilshad K. P. wrote: > Hi, > Is there any thing to take care while creating index for improving lucene > text search speed

Improving Lucene Search Performance

2011-12-08 Thread Dilshad K. P.
Hi, Is there any thing to take care while creating index for improving lucene text search speed. Thanks And Regards Dilshad K.P * Confidentiality Statement/Disclaimer * This message and any attachments is intended for the sole use of the intended recipient. It may contain confidential i

Re: Regarding Search Performance

2011-02-03 Thread Simon Willnauer
You should really provide us with more info. Maybe some code too. Valuable infos are for example: - how big is your index? - how does the query look like? - are you searching from a local file system or ram dir or from remote FS? - how fast is the second search? - which version of lucene are you u

Regarding Search Performance

2011-02-03 Thread madhuri_1820
Hi, I have searching fields from multiple indexes. I am using Boolean Query. Index Search is taking nearly 20 sec for one query. I have read that Query Filter have a feature of caching the inner Query search results. I am not sure which Query is useful whether Query Filter or boolean query ?

Re: term vector - WITH_POSITIONS_OFFSETS vs YES in terms of search performance

2010-11-30 Thread Michael McCandless
rs set to > WITH_POSITIONS_OFFSETS vs YES in terms of search performance?  I did some > testing and the results were inconclusive.  In one case, > WITH_POSITIONS_OFFSETS was searched faster than YES, in all others, it was > the reverse.  Is the performance effect only at indexing time? > > Tha

term vector - WITH_POSITIONS_OFFSETS vs YES in terms of search performance

2010-11-30 Thread Maricris Villareal
Hi, Could someone tell me the effect (if any) of having term vectors set to WITH_POSITIONS_OFFSETS vs YES in terms of search performance? I did some testing and the results were inconclusive. In one case, WITH_POSITIONS_OFFSETS was searched faster than YES, in all others, it was the reverse

Re: Bettering search performance

2010-08-27 Thread Erick Erickson
on Systems, Infosys > Email: shelly_si...@infosys.com > Phone: (M) 91 992 369 7200, (VoIP)2022978622 > > > -Original Message- > From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] > Sent: Friday, August 27, 2010 2:27 PM > To: java-user@lucene.apache.org > Sub

Re: Using multiple drives and non-CFS format to improve search performance

2010-08-27 Thread Stefan Nikolic
. I assume it's going to change with > configurations, index sizes and use cases: not an easy task. > > Sanne > > 2010/8/26 Stefan Nikolic : > > Hi everyone, > > > > I'm trying to figure out the effects on search performance of using the > > n

RE: Bettering search performance

2010-08-27 Thread Shelly_Singh
Systems, Infosys Email: shelly_si...@infosys.com Phone: (M) 91 992 369 7200, (VoIP)2022978622 -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Friday, August 27, 2010 2:27 PM To: java-user@lucene.apache.org Subject: Re: Bettering search performance On Fri, 2010

Re: Bettering search performance

2010-08-27 Thread Toke Eskildsen
On Fri, 2010-08-27 at 05:34 +0200, Shelly_Singh wrote: > I have a lucene index of 100 million documents. [...] total index size is > 7GB. [...] > I get a response time of over 2 seconds. How many documents match such a query and how many of those documents do you process (i.e. extract a term f

Re: Using multiple drives and non-CFS format to improve search performance

2010-08-26 Thread Sanne Grinovero
index sizes and use cases: not an easy task. Sanne 2010/8/26 Stefan Nikolic : > Hi everyone, > > I'm trying to figure out the effects on search performance of using the > non-CFS format and spreading the various underlying files to different > disks/media types. For example,

Bettering search performance

2010-08-26 Thread Shelly_Singh
Hi, I have a lucene index of 100 million documents. But the document size is very small - 5 fields with 1 or 2 terms each. Only 1 field is analyzed and others are just simply indexed. The index is optimized to 2 segments and the total index size is 7GB. I open a searcher with a termsInfoDiviso

Using multiple drives and non-CFS format to improve search performance

2010-08-26 Thread Stefan Nikolic
Hi everyone, I'm trying to figure out the effects on search performance of using the non-CFS format and spreading the various underlying files to different disks/media types. For example, I'm considering moving a segment's various .t* term-related files onto a solid-state drive

is there any resource for improve lucene index/search performance

2010-07-20 Thread Li Li
Or where to find any improvement proposal for lucene? e.g. I want to change the float point multiplication to integer multiplication or using bitmap for high frequent terms or something else like this. Is there any place where I can find any resources or guys? thanks.

Re: [Fwd: Re: Lucene 3.0 Search Performance Stats]

2010-03-22 Thread Jamie
Hi Suman Here are some of the things we did: - cache searcher/s - cache indexreader/s - all users use the same searchers - perform a background search when apps starts to warm up search engine - use numerics where necessary - use shorter dates (i.e. do you really need a granularity of up to the

[Fwd: Re: Lucene 3.0 Search Performance Stats]

2010-03-22 Thread suman . holani
t; consumption >>> has dropped considerably! >>> >>> Some stats: >>> >>> Indexed Docs: 7.2M emails >>> Index Size: 24 GB (non optimized) >>> Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date) >>> >>>

Re: Lucene 3.0 Search Performance Stats

2010-03-22 Thread Michael McCandless
Looks like the bulk of your RAM usage is from the 370K index terms in your terms dict... The flex branch (once it lands) should substantially reduce that... Mike On Mon, Mar 22, 2010 at 8:35 AM, Jamie wrote: > Hi Everyone > > The stats I sent through earlier were erroneous due to fact the date

Re: Lucene 3.0 Search Performance Stats

2010-03-22 Thread Jamie
Hi Everyone The stats I sent through earlier were erroneous due to fact the date range query selected fewer records than stated. The correct stats are: Lucene 3.0 Stats: Search conducted using Lucene's Realtime search feature (writer.getReader() for each search) Analyzer: Russian Analyzer

RE: Lucene 3.0 Search Performance Stats

2010-03-20 Thread Uwe Schindler
ly! > >> > >> Some stats: > >> > >> Indexed Docs: 7.2M emails > >> Index Size: 24 GB (non optimized) > >> Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date) > >> > >> Index stored on 4 SAS HDD hitachi RAID 10

Re: Lucene 3.0 Search Performance Stats

2010-03-20 Thread Jamie
derably! Some stats: Indexed Docs: 7.2M emails Index Size: 24 GB (non optimized) Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date) Index stored on 4 SAS HDD hitachi RAID 10 16G RAM 2x Xeon 4 core 2.4Gz OS FreeBSD 7.2 Filesystem UFS2 gjournal I believe we are using all search performance

Re: Lucene 3.0 Search Performance Stats

2010-03-19 Thread Monique Monteiro
FreeBSD 7.2 > Filesystem UFS2 gjournal > > I believe we are using all search performance recommendations now. > > Good job! > > Jamie > > > > - > To unsubscribe, e-mail: java-user-un

Re: Lucene 3.0 Search Performance Stats

2010-03-19 Thread Michael McCandless
ped considerably! >> >> Some stats: >> >> Indexed Docs: 7.2M emails >> Index Size: 24 GB (non optimized) >> Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date) >> >> Index stored on 4 SAS HDD hitachi RAID 10 >> 16G RAM >>

Re: Lucene 3.0 Search Performance Stats

2010-03-19 Thread Jamie
SAS HDD hitachi RAID 10 16G RAM 2x Xeon 4 core 2.4Gz OS FreeBSD 7.2 Filesystem UFS2 gjournal I believe we are using all search performance recommendations now. Good job! Jamie - To unsubscribe, e-mail: java-user-unsubscr

Lucene 3.0 Search Performance Stats

2010-03-19 Thread Jamie
Index Size: 24 GB (non optimized) Search Speed: 0.06 - 0.09 seconds (with sort YYMMHHSS date) Index stored on 4 SAS HDD hitachi RAID 10 16G RAM 2x Xeon 4 core 2.4Gz OS FreeBSD 7.2 Filesystem UFS2 gjournal I believe we are using all search performance recommendations now. Good job! Jamie

Re: faceted search performance

2009-10-27 Thread Toke Eskildsen
On Mon, 2009-10-12 at 20:02 +0200, Jake Mannix wrote: > This killer is the "TermQuery for each term" part - this is huge. You need > to invert this process, and use your query as is, but while walking in the > HitCollector, on each doc which matches your query, increment counters for > each of the
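The inversion quoted above (run the query once and, for each matching document, increment a counter per term, instead of running one TermQuery per term) can be sketched as follows. This is a stdlib-only illustration: `FacetCounter` is a hypothetical name, and `termsByDoc` is an assumed in-memory lookup from document id to its facet terms, standing in for whatever per-document structure the real collector would read.

```java
import java.util.HashMap;
import java.util.Map;

final class FacetCounter {
    /**
     * Walks the matching documents once and counts every term they contain.
     * termsByDoc[doc] holds the facet terms of document doc.
     */
    static Map<String, Integer> count(int[] matchingDocs, String[][] termsByDoc) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc : matchingDocs) {
            for (String term : termsByDoc[doc]) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The work is proportional to the number of hits times the terms per hit, not to the number of distinct terms in the index, which is why it helps when many terms occur in few documents.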

Re: help needed improving lucene concurret search performance

2009-10-24 Thread Wilson Wu
-- Forwarded message -- From: Wilson Wu Date: 2009/10/24 Subject: Re: help needed improving lucene concurret search performance To: java-user@lucene.apache.org Hi,      Thanks a lot for your reply. There are 4 processors in my system.      I am not sure that 100 threads is going

Re: help needed improving lucene concurret search performance

2009-10-24 Thread Wilson Wu
Hi, Thanks a lot for your reply. There are 4 processors in my system. I am not sure that 100 threads is going to be 10 times slower than 10 threads .Because all the threads don't run serial but parallel. I think when there are 100 customers accessing my system,100 http connections will

Re: help needed improving lucene concurret search performance

2009-10-23 Thread Yonik Seeley
How many processors do you have on this system? If you are CPU bound, 100 threads is going to be 10 times slower (at a minimum) than 10 threads (unless you have more than 10 CPUs). -Yonik http://www.lucidimagination.com On Fri, Oct 23, 2009 at 2:18 AM, Wilson Wu wrote: > Dear Friend, >     I hav
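The arithmetic behind this claim can be made explicit with an idealized model. The assumptions are loud ones: searches are purely CPU-bound, every thread is always busy, and the cores are shared evenly with no scheduling overhead. `CpuBoundLatency` and `slowdown` are hypothetical names used only for this sketch.

```java
final class CpuBoundLatency {
    /**
     * Factor by which one search slows down, relative to running alone,
     * when `threads` CPU-bound searches share `cores` cores evenly.
     */
    static double slowdown(int threads, int cores) {
        return Math.max(1.0, (double) threads / cores);
    }
}
```

On a 4-core machine this gives a factor of 2.5 for 10 threads and 25 for 100 threads, a ratio of 10 between the two. With 16 cores the factors are 1 and 6.25, so the ratio falls below 10, consistent with the "unless you have more than 10 CPUs" caveat.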

help needed improving lucene concurret search performance

2009-10-22 Thread Wilson Wu
Dear Friend, I have recently encountered some performance problems with search in Lucene 2.9. I use a single IndexSearcher for the whole system. It works well when fewer than 10 threads are searching concurrently. But if more than 100 threads search concurrently, the average resp

Re: faceted search performance

2009-10-13 Thread Christoph Boosz
Ok, I will have a shot at the ascending docId order. Chris 2009/10/13 Paul Elschot > On Monday 12 October 2009 23:29:07 Christoph Boosz wrote: > > Hi Paul, > > > > Thanks for your suggestion. I will test it within the next few days. > > However, due to memory limitations, it will only work if t

Re: faceted search performance

2009-10-13 Thread Paul Elschot
On Monday 12 October 2009 23:29:07 Christoph Boosz wrote: > Hi Paul, > > Thanks for your suggestion. I will test it within the next few days. > However, due to memory limitations, it will only work if the number of hits > is small enough, am I right? One can load a single term vector at a time, s

Re: faceted search performance

2009-10-12 Thread Christoph Boosz
Hi Paul, Thanks for your suggestion. I will test it within the next few days. However, due to memory limitations, it will only work if the number of hits is small enough, am I right? Chris 2009/10/12 Paul Elschot > Chris, > > You could also store term vectors for all docs at indexing > time, a

Re: faceted search performance

2009-10-12 Thread Paul Elschot
Chris, You could also store term vectors for all docs at indexing time, and add the termvectors for the matching docs into a (large) map of terms in RAM. Regards, Paul Elschot On Monday 12 October 2009 21:30:48 Christoph Boosz wrote: > Hi Jake, > > Thanks for your helpful explanation. > In fac

Re: faceted search performance

2009-10-12 Thread Christoph Boosz
Hi Jake, Thanks for your helpful explanation. In fact, my initial solution was to traverse each document in the result once and count the contained terms. As you mentioned, this process took a lot of memory. Trying to confine the memory usage with the facet approach, I was surprised by the decline

Re: faceted search performance

2009-10-12 Thread Jake Mannix
Hey Chris, On Mon, Oct 12, 2009 at 10:30 AM, Christoph Boosz < christoph.bo...@googlemail.com> wrote: > Thanks for your reply. > Yes, it's likely that many terms occur in few documents. > > If I understand you right, I should do the following: > -Write a HitCollector that simply increments a coun

Re: faceted search performance

2009-10-12 Thread Christoph Boosz
Thanks for your reply. Yes, it's likely that many terms occur in few documents. If I understand you right, I should do the following: -Write a HitCollector that simply increments a counter -Get the filter for the user query once: new CachingWrapperFilter(new QueryWrapperFilter(userQuery)); -Create
