Lucene Performance Tuning

2018-07-18 Thread Hicks, Matt
I am seeing serious performance differences with three slightly varied queries: https://gist.github.com/darkfrog26/de19959db854aaf30957d64d1730d07f Can anyone explain why this might be happening and offer any tips to optimize it? Most queries are lightning fast, but ones like "Smith Mark D" are taking

Re: Lucene performance benchmark | search throughput

2017-01-17 Thread Michael McCandless
ents to the other clauses, allowing those other clauses to iterate much faster than they would otherwise require if they were not AND'd. Mike McCandless http://blog.mi
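
A simplified way to picture McCandless's point: when clauses are AND'd, each iterator can jump straight to the other's current document instead of visiting every match. The sketch below is a hypothetical, stdlib-only model (`ConjunctionSketch` and its methods are illustrative names, not Lucene's API) that intersects two sorted, duplicate-free doc-ID arrays, with `Arrays.binarySearch` standing in for Lucene's skip data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConjunctionSketch {
    /** Index of the first element >= target in postings[from..], or postings.length if none. */
    static int advance(int[] postings, int from, int target) {
        // Binary search stands in for skip data: we jump instead of scanning one doc at a time.
        int idx = Arrays.binarySearch(postings, from, postings.length, target);
        return idx >= 0 ? idx : -idx - 1; // a negative result encodes the insertion point
    }

    /** Leapfrog intersection of two sorted, duplicate-free doc-ID lists. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]); // both clauses match this doc
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]); // skip a forward to b's current doc
            } else {
                j = advance(b, j, a[i]); // skip b forward to a's current doc
            }
        }
        return out;
    }
}
```

With one rare and one common clause, the loop mostly jumps through the common list rather than scanning it, which is in line with the replies in this thread that adding a selective AND condition can reduce the work per query.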

Re: Lucene performance benchmark | search throughput

2017-01-17 Thread Rajnish kamboj
As per my understanding, Lucene will first evaluate all conditions separately and then merge the Documents as per the AND/OR clauses. Finally, it will return 10 records to me. So, if I add one more condi

Re: Lucene performance benchmark | search throughput

2017-01-06 Thread Michael McCandless
dd to search time and merge time and hence increase latency, which results in decreased throughput. Also, what is the search performance benchmark against Lucene version?

Re: Lucene performance benchmark | search throughput

2017-01-05 Thread Rajnish kamboj
merge time and hence increase latency, which results in decreased throughput. Also, what is the search performance benchmark against Lucene version? Regards, Rajnish

Re: Lucene performance benchmark | search throughput

2017-01-03 Thread Michael McCandless
guess: more conditions = fewer documents to score and sort to return. On Mon, Jan 2, 2017 at 7:23 PM, Rajnish kamboj wrote: Hi, Is there any Lucene performance benchmark against a certain set of data? [i.e. Is

Re: Lucene performance benchmark | search throughput

2017-01-03 Thread Rajnish kamboj
My guess: more conditions = fewer documents to score and sort to return. On Mon, Jan 2, 2017 at 7:23 PM, Rajnish kamboj wrote: Hi, Is there any Lucene performance benchmark against a certain set of data? [i.e. Are there any stats for search throughp

Re: Lucene performance benchmark | search throughput

2017-01-03 Thread Michael Wilkowski
My guess: more conditions = fewer documents to score and sort to return. On Mon, Jan 2, 2017 at 7:23 PM, Rajnish kamboj wrote: Hi, Is there any Lucene performance benchmark against a certain set of data? [i.e. Are there any stats for search throughput which Lucene can provide for

Lucene performance benchmark | search throughput

2017-01-02 Thread Rajnish kamboj
Hi, Is there any Lucene performance benchmark against a certain set of data? [i.e. Are there any stats for search throughput which Lucene can provide for a certain data set?] Search throughput example: Max. 200 TPS for 50K data on Lucene 5.3.1 on RHEL version x (with SSD); Max. 150 TPS for 100K data on

Log indexing with lucene performance issues

2016-04-21 Thread Hamed Ghavamnia
Hello, We've created a log management system using Lucene 4.3. Each log has about 10 fields and all of them are stored. We store each hour of the logs in a separate folder, so when someone runs a query, only the folders within the specified time frame are searched. The indexes are loaded using the mmap

Re: Lucene performance

2014-01-27 Thread Hamed Ghavamnia
Thanks. I've put some timing checks on the different parts of my search, and it seems the directory-opening part is taking most of the response time. I'm using MMapDirectory, but it doesn't seem to speed up opening the directories. I've split my indexes during creation into different folders, a
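
If opening directories dominates response time, the usual remedy is to open each index once and share the searcher across queries instead of reopening it per search (the IndexSearcher javadoc recommends sharing a single instance, and Lucene also ships a `SearcherManager` class for this). The sketch below is only a generic, stdlib-only illustration of the open-once idea: `OpenOnceCache` and the opener function are hypothetical, and a real implementation would also need to reopen for newly indexed data and close readers it no longer uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Opens each key's resource at most once and returns the cached instance afterwards. */
final class OpenOnceCache<K, V> {
    private final Map<K, V> opened = new ConcurrentHashMap<>();
    private final Function<K, V> opener; // hypothetical: e.g. opens a searcher for one hourly folder

    OpenOnceCache(Function<K, V> opener) {
        this.opener = opener;
    }

    V get(K key) {
        // computeIfAbsent runs the expensive opener only on the first request for this key
        return opened.computeIfAbsent(key, opener);
    }

    int size() {
        return opened.size();
    }
}
```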

Re: Lucene performance

2014-01-25 Thread Erick Erickson
You'll have to do some tuning with that kind of ingestion rate, and you're talking about a significant-size cluster here. At 172M documents/day or so, you're not going to store very many days per node. Storing doesn't make much of any difference as far as search speed is concerned; the raw data is
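
The 172M documents/day figure follows from the 2000 events/second rate given in the original question. If the same rate applied to the one-folder-per-hour layout described in the 2016 log-indexing thread (an assumption), each hourly index would receive about 7.2M documents. A small check of that arithmetic (`IngestMath` is just an illustrative name):

```java
final class IngestMath {
    static final long SECONDS_PER_HOUR = 60L * 60;               // 3,600
    static final long SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR;   // 86,400

    /** Documents per hourly index at a constant event rate (one document per event). */
    static long docsPerHour(long eventsPerSecond) {
        return eventsPerSecond * SECONDS_PER_HOUR;
    }

    /** Documents per day at a constant event rate (one document per event). */
    static long docsPerDay(long eventsPerSecond) {
        return eventsPerSecond * SECONDS_PER_DAY;
    }
}
```

`docsPerDay(2000)` gives 172,800,000 and `docsPerHour(2000)` gives 7,200,000.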

Lucene performance

2014-01-24 Thread Hamed Ghavamnia
Hello, I searched a lot about Lucene's limits and performance, but I still don't know how much I can count on it. I'm storing logs and indexing them with Lucene. The event rate is 2000 events per second. The format of each log is generally 'fieldname' : 'fieldvalue'. What search performance should I expect

Re: Lucene performance in 64 Bit

2012-03-01 Thread Ganesh
Thanks, Li Li. Please share your experience with 64-bit. How big are your indexes? Regards, Ganesh

Re: Lucene performance in 64 Bit

2012-03-01 Thread Li Li
I think many Lucene users use large memory because a 32-bit system's memory is too limited (Windows 1.5GB, Linux 2-3GB). The only noticeable thing is *Compressed oops*. Some say it's useful, some don't; you should give it a try. On Thu, Mar 1, 2012 at 4:59 PM, Ganesh wrote: Hello all, Is

Re: Lucene performance: is search time linear to the index size?

2009-06-19 Thread Joel Halbert
Opening a searcher and doing the first query incurs a significan

RE: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Teruhiko Kurosaka
Jay Booth wrote: Are you fetching all of the results for your search? No, I'm not doing anything with the search results. This is essentially what I do: searcher = new IndexSearcher(IndexReader.open(indexFileDir)); query = new TermQuery(new Term(fieldNam

Re: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Yonik Seeley
ous clauses of the query. -Yonik http://www.lucidimagination.com

RE: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Jay Booth
disk, not the search time. Teruhiko Kurosaka wrote: Erik, The way I test this program is by is

RE: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Teruhiko Kurosaka
ne Document that matches a query, the search time remains constant no matter how large the index is. -kuro

Re: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Erick Erickson
Opening a searcher and doing the first query incurs a significant amount of overhead, cache loading, etc. Inferring search times relative to index size with a program like you describe is unreliable. Try firing a few queries at the index without measuring, *then* measure the time it takes for subs
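
Erick's advice (warm up first, then time the subsequent queries) can be sketched as a tiny harness. This is a hypothetical helper, not part of Lucene: `query` stands for whatever search call is being timed, and the warm-up and measured run counts are arbitrary choices. It reports the (upper) median rather than the mean so that a few slow outliers, such as GC pauses, don't dominate.

```java
import java.util.Arrays;

final class WarmBenchmark {
    /**
     * Runs `query` warmupRuns times without timing (so caches and the JIT can warm up),
     * then times measuredRuns executions and returns the median latency in nanoseconds.
     */
    static long medianNanos(Runnable query, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be > 0");
        }
        for (int i = 0; i < warmupRuns; i++) {
            query.run(); // untimed: the first queries pay cache-loading costs
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[measuredRuns / 2]; // upper median for even counts
    }
}
```

Timing a `Runnable` that wraps only the search call, against indexes of different sizes, keeps the measurement separate from result assembly, which Erick raises elsewhere in this thread.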

RE: Lucene performance: is search time linear to the index size?

2009-06-17 Thread Teruhiko Kurosaka
I've written a test program that uses the simplest form of search, TermQuery, and measures the time it takes to search for a term in a field on indices of various sizes. The result is a very linear growth of search time vs. index size in terms of the number of Documents, not the number of unique terms in that field.

Re: Lucene performance: is search time linear to the index size?

2009-06-17 Thread Peter Keegan
Are you measuring search time *only* or are you measuring

RE: Lucene performance: is search time linear to the index size?

2009-06-17 Thread Teruhiko Kurosaka
Are you measuring search time *only* or are you measuring total response time including assembling whatever you assemble? If y

Re: Lucene performance: is search time linear to the index size?

2009-06-17 Thread Erick Erickson
Are you measuring search time *only* or are you measuring total response time including assembling whatever you assemble? If you're measuring total response time, everything from network latency to what you're doing with each hit may affect response time. This is especially true if you're iteratin

Re: Lucene performance: is search time linear to the index size?

2009-06-17 Thread Ian Lea
It depends on lots of things, but the time to execute a search would not typically grow linearly with the number of documents. But the time to retrieve data from all the hits might, if the number of hits is growing in line with the number of documents. Are you doing that by any chance, as opposed
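
Ian's distinction between executing a search and retrieving data for every hit can be shown with a bounded top-N collector: the loop still visits each candidate, but only N hits are kept, and only those would need their stored data loaded. This is a stdlib-only illustration over a plain score array with a hypothetical `TopN` class; Lucene's own collectors use their own priority queue and differ in detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopN {
    /** Doc IDs of the n highest scores, best first; ties go to the lower doc ID. */
    static List<Integer> collect(float[] scores, int n) {
        // Min-heap ordered "worst first", so the weakest retained hit is evicted first.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
                (x, y) -> scores[x] != scores[y]
                        ? Float.compare(scores[x], scores[y])
                        : Integer.compare(y, x)); // on equal scores, the higher doc ID is worse
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(doc);
            if (heap.size() > n) {
                heap.poll(); // memory stays bounded by n, however many docs match
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll()); // worst first
        }
        Collections.reverse(result); // best first
        return result;
    }
}
```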

Lucene performance: is search time linear to the index size?

2009-06-17 Thread Teruhiko Kurosaka
I am seeing my Lucene application's search time grow pretty much linearly with the number of Documents. Is this how Lucene is supposed to work, or does it depend on the nature of the query? By the way, I am not using FuzzyQuery, which was the subject of the recent discussion. -Kuro

Re: Lucene Performance issue

2009-01-21 Thread Anshul jain
@Erick: Yes, I changed the default field; it is "bagofwords" now. @Ian: Yes, both indexes were optimized, and I didn't do any deletions. Version 2.4.0. I'll repeat the experiment, just to be sure. Meanwhile, do you have any documentation on Lucene fields? What I need to know is how Lucene is storing field

Re: Lucene Performance issue

2009-01-21 Thread Ian Lea
> ... > I can say for sure that multiple copies are not indexed. But the number of fields into which the text is divided is large. Can that be a reason? Not for that amount of difference. You may be sure that you are not indexing multiple copies, but I'm not. Convince me - create 2 new indexes via the

Re: Lucene Performance issue

2009-01-21 Thread Erick Erickson
Note that your two queries are different unless you've changed the default operator. Also, your bagOfWords query is searching across your default field for the second two terms. Your bagOfWords query is really something like bagOfWords:Alexander OR <defaultField>:history OR <defaultField>:Macedon. Best, Erick
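
Erick's point is that a field prefix binds only to the term right after it. The Lucene query syntax documentation gives the example that `title:The Right Way` looks for "The" in `title` and for "Right" and "Way" in the default field. The hypothetical `FieldScope.scope` method below models just that rule for whitespace-separated terms (no parentheses, phrases, or operators). It also accepts a space after the colon, as in the queries quoted in this thread. It is a toy, not QueryParser.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldScope {
    /**
     * Toy model of classic query-syntax field scoping: a "field:" prefix applies only to
     * the single term right after it; every other bare term goes to the default field.
     */
    static List<String> scope(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        String pendingField = null; // set when a token is just "field:" (space after the colon)
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty query
            }
            if (token.length() > 1 && token.endsWith(":")) {
                pendingField = token.substring(0, token.length() - 1);
            } else if (pendingField != null) {
                clauses.add(pendingField + ":" + token); // prefix binds to this term only
                pendingField = null;
            } else if (token.indexOf(':') > 0) {
                clauses.add(token); // already field-qualified, e.g. "title:The"
            } else {
                clauses.add(defaultField + ":" + token);
            }
        }
        return clauses;
    }
}
```

To restrict all three terms to one field, the classic syntax supports field grouping with parentheses, as in `bagOfWords:(Alexander history Macedon)`.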

Re: Lucene Performance issue

2009-01-21 Thread Erick Erickson
I agree with Ian that these times sound way too high. I'd also ask whether you fire a few warmup searches at your server before measuring the increased time; you might just be seeing the cache being populated. Best, Erick

Re: Lucene Performance issue

2009-01-21 Thread Anshul jain
Hi, thanks for the reply. For the document, see my last mail. Multi-field query: name: Alexander AND domain: history AND first_sentence: Macedon. Single-field query: bagOfWords: Alexander history Macedon. I can say for sure that multiple copies are not indexed. But the number of fields in which text

Re: Lucene Performance issue

2009-01-21 Thread Ian Lea
Hi, Space: 700MB vs 4.5GB sounds like way too big a difference. Are you sure you aren't loading multiple copies of the data or something like that? Queries: a 20x slowdown for a multi-field query also sounds way too big. What do the simple and multi-field queries look like? -- Ian.

Lucene Performance issue

2009-01-21 Thread Anshul jain
Hi, I've indexed around half a million XML documents. Here is a sample document:
cogito:Name Alexander the Great
cogito:domain ancient history
cogito:first_sentence Alexander the Great (Greek: or Megas Alexandros; July 20 356 BC - June 10 323 BC), also known as Alexander III

Re: Lucene performance issues..

2008-07-28 Thread Michael McCandless
On Sunday, 27 July 2008, Mazhar Lateef wrote: We have also tried upgrading the Lucene version to 2.3 in the hope of improving performance, but the results were quite the opposite. But from my research on the internet the Lucene

Re: Lucene performance issues..

2008-07-28 Thread Toke Eskildsen
On Sun, 2008-07-27 at 21:38 +0100, Mazhar Lateef wrote: Email searching: We are creating very large indexes for emails we are processing; the size is up to 150GB+ for indexes only (not including data content). This, we thought, would improve search

Re: Lucene performance issues..

2008-07-28 Thread ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
o change for a while.

Re: Lucene performance issues..

2008-07-27 Thread Stu Hood
On Sunday, 27 July 2008, Mazhar Lateef wrote: We have also tried upgrading the Lucene version to 2.3 in the hope of improving performance, but the results were quite the opposite. But fro

Re: Lucene performance issues..

2008-07-27 Thread Daniel Naber
On Sunday, 27 July 2008, Mazhar Lateef wrote: We have also tried upgrading the Lucene version to 2.3 in the hope of improving performance, but the results were quite the opposite. But from my research on the internet, Lucene version 2.3 is much faster and better, so why are we seeing such inc

Lucene performance issues..

2008-07-27 Thread Mazhar Lateef
Hi, we have a system to archive mails and are facing some issues with search and indexing performance. The following are the challenges we are currently facing. We are currently using Lucene version 2.2, the platform is SLES10.1, and the application is written in Java.

Re: Lucene performance: benchmarktemplate.xml

2008-04-18 Thread Glen Newton
s, :-) -glen

Re: Lucene performance: benchmarktemplate.xml

2008-04-17 Thread Anshum
glen Mike Glen Newton wrote: Cass, Thanks for converting it. I've posted it to my blog: http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html

Re: Lucene performance: benchmarktemplate.xml

2008-04-16 Thread Glen Newton
toCommit=true). I will re-run and post the results! Thanks, :-) -glen

Re: Lucene performance: benchmarktemplate.xml

2008-04-16 Thread Michael McCandless
Thanks for converting it. I've posted it to my blog: http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html Sorry for the XML tags: I guess I followed the instructions on the Lucene performance benchmarks page too literally ("Post these figures to the lucene-user mai

Re: Lucene performance: benchmarktemplate.xml

2008-04-16 Thread Glen Newton
Cass, Thanks for converting it. I've posted it to my blog: http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html Sorry for the XML tags: I guess I followed the instructions on the Lucene performance benchmarks page too literally ("Post these figures to the l

Re: Lucene performance: benchmarktemplate.xml

2008-04-15 Thread Cass Costello
I just did that so I could read it. :) I'll leave it up until Glen resends or posts it somewhere... http://www.casscostello.com/?page_id=28

Re: Lucene performance: benchmarktemplate.xml

2008-04-15 Thread Ian Holsman
Hi Glen. Can you resend this in plain text, or put the HTML up on a server somewhere and point to it with a brief summary in the post? I'd love to read it, but all those tags are making me go blind.

Lucene performance: benchmarktemplate.xml

2008-04-15 Thread Glen Newton
Hardware Environment
Dedicated machine for indexing: yes
CPU: Dual processor dual core Xeon CPU 3.00GHz; hyperthreading ON for 8 virtual cores
RAM: 8GB
Drive configuration: Dell EMC AX150 storage array fibre channel
Software environment
Lucene Version: 2.3.1
Java Version: Java(TM)

Re: Lucene Performance

2008-01-28 Thread Thibaut Britz
The query rewrite could in principle do this, but it might affect the score values. Regards, Paul Elschot

Re: Lucene Performance

2008-01-19 Thread Paul Elschot
On Friday 18 January 2008 17:52:27 Thibaut Britz wrote: Hi, ... Another thing I noticed is that we append a lot of queries, so we have a lot of duplicate phrases like (A and B or C) and ... and (A and B or C) (more nested than that). Is Lucene doing any internal query optimization
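
The reply in this thread notes that a rewrite could, in principle, drop repeated clauses but might change scores. A toy model shows why: in a pure conjunction, (X AND X) matches the same documents as X, but an additive score counts X twice. `ClauseDedupe` and its scoring function are hypothetical; real Lucene scoring is more involved than a plain sum.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

final class ClauseDedupe {
    /** Drops repeated required clauses, keeping first-occurrence order. */
    static List<String> dedupe(List<String> requiredClauses) {
        return new ArrayList<>(new LinkedHashSet<>(requiredClauses));
    }

    /** Toy additive score: each clause occurrence adds its per-clause score. */
    static double toyScore(List<String> clauses, Map<String, Double> clauseScores) {
        double total = 0.0;
        for (String clause : clauses) {
            total += clauseScores.getOrDefault(clause, 0.0);
        }
        return total;
    }
}
```

Deduplicating leaves the matching set unchanged for a pure AND, but the toy score drops from 2.5 to 1.5 in the example below, which is the kind of score change the reply warns about.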

Lucene Performance

2008-01-18 Thread Thibaut Britz

Re: lucene performance issues

2008-01-05 Thread Andrew Huntwork
Your grinder output seems to indicate clearly that your bottleneck is in your database code, not in Lucene. It seems that the threads are all blocked trying to get a connection from a connection pool. Maybe you're leaking connections, or maybe you need to increase the size of the pool.

Re: lucene performance issues

2008-01-04 Thread Otis Gospodnetic
Folks, We're running into some performance bottleneck issues while running Lucene search against our indices (approx. 1.5 GB in size after optimization), and the search query seems to block

lucene performance issues

2008-01-03 Thread Oscar Usifer
Folks, We're running into some performance bottleneck issues while running Lucene search against our indices (approx. 1.5 GB in size after optimization), and the search query seems to block on a synchronized read as follows. Obviously we can upgrade to the latest version as a first step. When Tomcat r

Re: Lucene performance using a solid state disk (SSD)

2007-07-28 Thread Otis Gospodnetic
Has anyone done any benchmarking of Lucene running with the index stored on an SSD? Given the performance characteristics quoted for, say, the SANDISK devices (e.g. http://www.sandisk.c

Lucene performance using a solid state disk (SSD)

2007-07-27 Thread Kent Fitch
Has anyone done any benchmarking of Lucene running with the index stored on an SSD? Given the performance characteristics quoted for, say, the SANDISK devices (e.g. http://www.sandisk.com/OEM/ProductCatalog(1321)-SanDisk_SSD_SATA_5000_25.aspx: 7000 IO/sec for 512-byte requests, 67MB/sec sustained re

Re: Does lucene performance suffer with a lot of empty fields ?

2006-08-01 Thread Chris Hostetter
From what I gather, I can go ahead and create an Index and, for each Document, only add the relevant fields. Is this correct? I should still be able to search with queries like "mel Movies:braveheart", right? Would this impact the search performance? Any other words of caution for me?

Re: Does lucene performance suffer with a lot of empty fields ?

2006-08-01 Thread Erick Erickson
I can't speak to performance, but there's no problem having different fields for different documents. Stated differently, you don't need to have all fields in all documents. It took me a while to get my head out of database tables and accept this. I doubt there's a problem with speed, but as
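
The point that documents need not share fields can be seen in a toy inverted index keyed by field and then term: a document only adds postings for the fields it actually has, so a query on `Movies` never touches politician documents. `SparseFieldIndex` is a hypothetical, stdlib-only model with naive lowercase whitespace tokenization; it ignores any per-field storage overhead Lucene itself may have.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SparseFieldIndex {
    // field -> term -> ascending doc IDs; a document contributes only to the fields it has
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();
    private int nextDoc = 0;

    int add(Map<String, String> fields) {
        int doc = nextDoc++;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            for (String tok : e.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (tok.isEmpty()) {
                    continue;
                }
                List<Integer> docs = postings
                        .computeIfAbsent(e.getKey(), f -> new HashMap<>())
                        .computeIfAbsent(tok, t -> new ArrayList<>());
                if (docs.isEmpty() || docs.get(docs.size() - 1) != doc) {
                    docs.add(doc); // record each doc once per term
                }
            }
        }
        return doc;
    }

    /** Docs containing `term` in `field`; an absent field simply has no postings. */
    List<Integer> search(String field, String term) {
        List<Integer> docs = postings.getOrDefault(field, Collections.emptyMap())
                .getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
        return Collections.unmodifiableList(docs);
    }
}
```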

Does lucene performance suffer with a lot of empty fields ?

2006-08-01 Thread Mek
I have one generic index, but am indexing a lot of different things, like actors, politicians, scientists, and sportsmen. As you can see, though there are some common fields, like name and DOB, there are also fields for each of these types of people that are different, e.g. Actors will have "Movi

Re: Theoretical Lucene Performance

2006-05-17 Thread Mike Richmond
Hello Andreas, This may also be a good reference for you: http://lucene.apache.org/java/docs/fileformats.html --Mike

Re: Theoretical Lucene Performance

2006-05-16 Thread gekkokid
http://lucenebook.com http://www.amazon.com/exec/obidos/asin/1932394281 :)

Theoretical Lucene Performance

2006-05-16 Thread Andreas Harth
Hello, I'd like to learn a bit more about the index organization of Lucene (ideally without sifting through source code). Are there any publications that explain the Lucene indexing structure in detail? Or is it possible to say in a few sentences how Lucene works and I can look up the details in

RE: Commercial vendors monitoring this ML? was: Lucene Performance Issues

2006-03-28 Thread Runde, Kevin
ical RAM on the box. -Kevin > Weird, I was just about to comment on the

Commercial vendors monitoring this ML? was: Lucene Performance Issues

2006-03-28 Thread jwang
ue" in the business referring them to me. Jeff Wang diCarta, Inc. > Hi Thomas, Sound like FUD to me. No concre

Re: Lucene Performance Issues

2006-03-28 Thread thomasg
I'd rather be developing with open source tools, but the project manager thought it good to ask around. Thanks again; I will get some benchmarks that will mean more than guesswork.

Re: Lucene Performance Issues

2006-03-28 Thread Otis Gospodnetic
suggest you try both and see which one suits your needs. Otis

Re: Lucene Performance Issues

2006-03-28 Thread Doug Cutting
thomasg wrote: Hi, we are currently intending to implement a document storage / search tool using Jackrabbit and Lucene. We have been approached by a commercial search and indexing organisation called ISYS who are suggesting the following problems with using Lucene. We do have a requirement to st

Re: Lucene Performance Issues

2006-03-28 Thread Eric Jain
thomasg wrote: 1) By default, Lucene only indexes the first 10,000 words from each document. When increasing this default, out-of-memory errors can occur. This implies that documents, or large sections thereof, are loaded into memory. ISYS has a very small memory footprint which is not affected by

Lucene Performance Issues

2006-03-28 Thread thomasg

Re: Lucene performance question

2006-03-09 Thread DanielFeinstein
I'm using the following Java options: JAVA_OPTS='-Xmx1524m -Xms1524m -Djava.awt.headless=true'

Re: Lucene performance question

2006-03-09 Thread Grant Ingersoll
What is your Java max heap size set to? This is the -Xmx Java option.

Lucene performance question

2006-03-09 Thread Daniel Feinstein
Hi, My Lucene index is not big (about 150MB). My computer has 2GB RAM, but for some reason when I try to store my index using org.apache.lucene.store.RAMDirectory it fails with a Java out-of-memory exception. Also, sometimes the time spent on the same search query can rise by 10-20 tim

Re: Lucene performance bottlenecks

2005-12-12 Thread Chris Hostetter
Oh, BTW: I just found the DisjunctionMaxQuery class, recently added it seems. Do you think this query structure could benefit from using it instead of the BooleanQuery? DisjunctionMaxQuery kicks ass (in my opinion), and it certainly seems like (from your query structure) it's something you
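
DisjunctionMaxQuery is documented as scoring a document by its highest-scoring matching subquery, plus a tie-breaker multiplier times the scores of the other matching subqueries, whereas a BooleanQuery of optional clauses roughly sums them. The sketch below computes only that combination step from a hypothetical array of non-negative per-field scores; `DisMax` is an illustrative name, not Lucene's scorer.

```java
final class DisMax {
    /**
     * max(scores) + tieBreaker * (sum of the other scores).
     * Only fields that matched appear in the array; scores are assumed non-negative.
     */
    static float combine(float[] matchingScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : matchingScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }
}
```

With a tie-breaker of 0, only the best field counts, so a term matching weakly in many fields cannot outrank a strong match in one field; with 1.0 the result equals the plain sum. That is presumably why it fits the multi-field url/anchor/content/title/host structure discussed in this thread.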

Re: Lucene performance bottlenecks

2005-12-12 Thread Andrzej Bialecki
Paul Elschot wrote: There is one indexing parameter that might help performance for BooleanScorer2: the skip interval in Lucene's TermInfosWriter. The current value is 16, and there was a question about it on 16 Oct 2005 on java-dev with the title "skipInterval". I don't know how the value of
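
To illustrate what a skip interval trades off, here is a single-level, in-memory model: every `interval`-th posting acts as a skip entry, and `advance` first hops over whole blocks, then scans inside one block. A smaller interval means more skip entries to store but shorter scans. `SkipBlocks` is hypothetical; Lucene stores its skip data on disk in its own format, and the 16 in the example merely mirrors the value Paul quotes.

```java
final class SkipBlocks {
    private final int[] postings; // sorted doc IDs
    private final int interval;   // postings per block, e.g. 16

    SkipBlocks(int[] postings, int interval) {
        this.postings = postings;
        this.interval = interval;
    }

    /** Index of the first posting >= target (or postings.length); steps[0] counts entries touched. */
    int advance(int target, int[] steps) {
        int block = 0;
        // Skip level: hop over a whole block while its last doc is still below the target.
        while ((block + 1) * interval - 1 < postings.length
                && postings[(block + 1) * interval - 1] < target) {
            block++;
            steps[0]++;
        }
        // Posting level: scan linearly from the start of the remaining block.
        for (int i = block * interval; i < postings.length; i++) {
            steps[0]++;
            if (postings[i] >= target) {
                return i;
            }
        }
        return postings.length;
    }
}
```

For 100 postings 0, 2, ..., 198 with an interval of 16, advancing to 150 touches 16 entries (4 skip hops plus 12 postings) instead of the 76 a linear scan would.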

Re: Lucene performance bottlenecks

2005-12-11 Thread Paul Elschot
On Wednesday 07 December 2005 10:51, Andrzej Bialecki wrote: Paul Elschot wrote: On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote: Paul Elschot wrote: ... This is one of the cases in which BooleanScorer2 can be faster than the 1.4 BooleanScorer because the 1.4 Boo

RE: Lucene performance bottlenecks

2005-12-08 Thread Dalton, Jeffery
Andrzej, I think you did a great job elucidating my thoughts as well. I heartily concur with everything you said. Andrzej Bialecki wrote: Hmm... Please define what "adequate" means. :-) IMHO, "adequate" is when for any query the response time is well below 1 second. Otherwise the serv

Re: Lucene performance bottlenecks

2005-12-08 Thread Andrzej Bialecki
(Moving the discussion to nutch-dev, please drop the cc: when responding)

Re: Lucene performance bottlenecks

2005-12-07 Thread Doug Cutting
Andrzej Bialecki wrote: It's nice to have these couple of percent... however, it doesn't solve the main problem; I need a 50% or greater increase... :-) and I suspect this can be achieved only by some radical changes in the way Nutch uses Lucene. It seems the default query structure is too compl

Re: Lucene performance bottlenecks

2005-12-07 Thread Andrzej Bialecki
Yonik Seeley wrote: if (b>0) return b; Doing an 'and' of two bytes and checking if the result is 0 probably requires masking operations on >8 bit processors... Sometimes you can get a peek into how a JVM would optimize things by looking at the asm output of the code from a C compiler. Bot

Re: Lucene performance bottlenecks

2005-12-07 Thread Doug Cutting
Paul Elschot wrote: Querying the host field like this in a web page index can be dangerous business. For example, when term1 is "wikipedia" and term2 is "org", the query will match at least all pages from wikipedia.org. Note that if you search for wikipedia.org in Nutch this is interpreted as a

Re: Lucene performance bottlenecks

2005-12-07 Thread Yonik Seeley
if (b>0) return b; Doing an 'and' of two bytes and checking if the result is 0 probably requires masking operations on >8 bit processors... Sometimes you can get a peek into how a JVM would optimize things by looking at the asm output of the code from a C compiler. Both (b>=0) and ((b&0x80)!

Re: Lucene performance bottlenecks

2005-12-07 Thread Yonik Seeley
On 12/7/05, Vanlerberghe, Luc wrote: Since 'byte' is signed in Java, can't the first test simply be written as if (b>0) return b; Doing an 'and' of two bytes and checking if the result is 0 probably requires masking operations on >8 bit processors... Yep, that was my

RE: Lucene performance bottlenecks

2005-12-07 Thread Vanlerberghe, Luc
all operators use ints... Luc

Re: Lucene performance bottlenecks

2005-12-07 Thread Yonik Seeley
I checked out readVInt() to see if I could optimize it any... For a random distribution of integers <200 I was able to speed it up a little bit, but nothing to write home about:
                old     new     percent
Java14-client   13547   12468   8%
Java14-server   6047    5266    14%
Java1
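
For reference, the VInt format discussed in this thread stores 7 data bits per byte, low-order groups first, with the high bit set on every byte except the last. The sketch below is a standalone encoder/decoder (`VIntCodec` is a hypothetical name, not Lucene's `IndexInput`). Because Java's `byte` is signed, `b >= 0` is exactly the "high bit clear" test, which gives the one-byte fast path debated in this thread. This version uses `>= 0` rather than `> 0` because its slow path always expects a continuation byte, so the value 0 must also return immediately.

```java
import java.io.ByteArrayOutputStream;

final class VIntCodec {
    /** Encodes an int with 7 data bits per byte; the high bit means "more bytes follow". */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write(value); // final byte: high bit clear
        return out.toByteArray();
    }

    /** Decodes one VInt starting at pos[0] and advances pos[0] past it. */
    static int decode(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        if (b >= 0) {
            return b; // fast path: single-byte value 0..127
        }
        int value = b & 0x7F;
        for (int shift = 7; ; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
            if (b >= 0) {
                return value; // high bit clear: this was the last byte
            }
        }
    }
}
```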

Re: Lucene performance bottlenecks

2005-12-07 Thread Andrzej Bialecki
Paul Elschot wrote: On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote: Paul Elschot wrote: In somewhat more readable layout: +(url:term1^4.0 anchor:term1^2.0 content:term1 title:term1^1.5 host:term1^2.0) +(url:term2^4.0 anchor:term2^2.0 content:term2 title:term2^1.5 host:

Re: Lucene performance bottlenecks

2005-12-03 Thread Paul Elschot
On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote: Paul Elschot wrote: In somewhat more readable layout: +(url:term1^4.0 anchor:term1^2.0 content:term1 title:term1^1.5 host:term1^2.0) +(url:term2^4.0 anchor:term2^2.0 content:term2 title:term2^1.5 host:term2^

Re: Lucene performance bottlenecks

2005-12-03 Thread Andrzej Bialecki
Paul Elschot wrote: In somewhat more readable layout: +(url:term1^4.0 anchor:term1^2.0 content:term1 title:term1^1.5 host:term1^2.0) +(url:term2^4.0 anchor:term2^2.0 content:term2 title:term2^1.5 host:term2^2.0) url:"term1 term2"~2147483647^4.0 anchor:"term1 term2"~4^2.0 content:"term1 t

Re: Lucene performance bottlenecks

2005-12-03 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: For a simple TermQuery, if the DF(term) is above 10%, the response time from IndexSearcher.search() is around 400ms (repeatable, after warm-up). For such complex phrase queries the response time is around 1 sec or more (again, after warm-up). Ar

Re: Lucene performance bottlenecks

2005-12-02 Thread Doug Cutting
Andrzej Bialecki wrote: For a simple TermQuery, if the DF(term) is above 10%, the response time from IndexSearcher.search() is around 400ms (repeatable, after warm-up). For such complex phrase queries the response time is around 1 sec or more (again, after warm-up). Are you specifying -server

Re: Lucene performance bottlenecks

2005-12-02 Thread Paul Elschot
Andrzej, On Friday 02 December 2005 12:55, Andrzej Bialecki wrote: Hi, I'm doing some performance profiling of a Nutch installation, working with relatively large individual indexes (10 mln docs), and I'm puzzled with the results. Here's the listing of the index: -rw-r--r-- 1

Lucene performance bottlenecks

2005-12-02 Thread Andrzej Bialecki
Hi, I'm doing some performance profiling of a Nutch installation, working with relatively large individual indexes (10 million docs), and I'm puzzled by the results. Here's the listing of the index:
-rw-r--r-- 1 andrzej andrzej 9803100 Dec 2 05:24 _0.f0
-rw-r--r-- 1 andrzej andrzej 9