Re: Throughput doesn't increase when using more concurrent threads

2006-04-05 Thread Peter Keegan
> Out of interest, does indexing time speed up much on 64-bit hardware? I was able to speed up indexing on 64-bit platform by taking advantage of the larger address space to parallelize the indexing process. One thread creates index segments with a set of RAMDirectories and another thread merges t

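The two-thread arrangement Peter describes (one thread building segments in memory, another merging them) can be sketched with a bounded queue. This is only an illustration using plain collections: `SegmentPipeline`, `run`, and the lower-casing "indexing" step are hypothetical stand-ins, not Lucene's RAMDirectory or IndexWriter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class SegmentPipeline {
    // Sentinel telling the merger that no more segments are coming (compared by identity).
    private static final List<String> DONE = new ArrayList<>();

    /** Builds fixed-size in-memory "segments" on one thread and merges them on another. */
    static List<String> run(List<String> docs, int segmentSize) {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4); // bounds memory in flight
        List<String> merged = new ArrayList<>();

        Thread builder = new Thread(() -> {
            try {
                List<String> segment = new ArrayList<>();
                for (String doc : docs) {
                    segment.add(doc.toLowerCase()); // hypothetical stand-in for indexing work
                    if (segment.size() == segmentSize) {
                        queue.put(segment);
                        segment = new ArrayList<>();
                    }
                }
                if (!segment.isEmpty()) queue.put(segment);
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread merger = new Thread(() -> {
            try {
                for (List<String> seg = queue.take(); seg != DONE; seg = queue.take()) {
                    merged.addAll(seg); // stand-in for merging a segment into the main index
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        builder.start();
        merger.start();
        try {
            builder.join();
            merger.join(); // join() makes the merger's writes visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return merged;
    }
}
```

The bounded queue is the design choice that matters here: it lets the builder run ahead of the merger, but only by a fixed number of segments, so memory use stays capped.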
Re: Throughput doesn't increase when using more concurrent threads

2006-03-17 Thread Peter Keegan
Good question. 'Top' reports the jvm at 99.9% CPU, but the individual CPUs (top/1) don't seem to add up to 99.9. This server is actually two 8-CPU servers whose backplanes are cabled together, so there may be some issue here. The network load is heavy, but doesn't seem to be the bottleneck (on the

Re: Throughput doesn't increase when using more concurrent threads

2006-03-17 Thread Doug Cutting
Peter Keegan wrote: I did some additional testing with Chris's patch and mine (based on Doug's note) vs. no patch and found that all 3 produced the same throughput - about 330 qps - over a longer period. Was CPU utilization 100%? If not, where do you think the bottleneck now is? Network? Or

Re: Throughput doesn't increase when using more concurrent threads

2006-03-17 Thread Peter Keegan
I did some additional testing with Chris's patch and mine (based on Doug's note) vs. no patch and found that all 3 produced the same throughput - about 330 qps - over a longer period. So, there seems to be a point of diminishing returns to adding more cpus. The dual core Opterons (8 cpu) still win

Re: Throughput doesn't increase when using more concurrent threads

2006-03-13 Thread Peter Keegan
Chris, My apologies - this error was apparently caused by a file format mismatch (probably line endings). Thanks, Peter On 3/13/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > Chris, > > Should this patch work against the current code base? I'm getting this > error: > > D:\lucene-1.9>patch -b -p0

Re: Throughput doesn't increase when using more concurrent threads

2006-03-13 Thread Peter Keegan
Chris, Should this patch work against the current code base? I'm getting this error: D:\lucene-1.9>patch -b -p0 -i nio-lucene-1.9.patch patching file src/java/org/apache/lucene/index/CompoundFileReader.java patching file src/java/org/apache/lucene/index/FieldsReader.java missing header for unifie

Re: Throughput doesn't increase when using more concurrent threads

2006-03-10 Thread Chris Lamprecht
Peter, I think this is similar to the patch in this bugzilla task: http://issues.apache.org/bugzilla/show_bug.cgi?id=35838 the patch itself is http://issues.apache.org/bugzilla/attachment.cgi?id=15757 (BTW does JIRA have a way to display the patch diffs?) The above patch also has a change to Se

Re: Throughput doesn't increase when using more concurrent threads

2006-03-10 Thread Peter Keegan
> 3. Use the ThreadLocal's FieldReader in the document() method. As I understand it, this means that the document method no longer needs to be synchronized, right? I've made these changes and it does appear to improve performance. Random snapshots of the stack traces show only an occasional lock

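The idea in the message above (give each thread its own reader through a ThreadLocal so that document() no longer needs to be synchronized) can be sketched as follows. This is a minimal illustration, not the actual patch: `CursorReader` and `PerThreadStore` are hypothetical stand-ins for Lucene's FieldsReader and the reader that owns it.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stateful reader: it keeps a cursor, so one instance is not thread-safe. */
final class CursorReader {
    private final byte[] data; // shared, read-only bytes
    private int position;      // mutable per-instance cursor

    CursorReader(byte[] data) { this.data = data; }

    int readRecord(int index) {
        position = index * 4; // "seek" to the record
        int v = 0;
        for (int i = 0; i < 4; i++) v = (v << 8) | (data[position++] & 0xFF);
        return v;
    }
}

final class PerThreadStore {
    private final ThreadLocal<CursorReader> local;

    PerThreadStore(int records) {
        ByteBuffer buf = ByteBuffer.allocate(records * 4);
        for (int i = 0; i < records; i++) buf.putInt(i * 31); // big-endian by default
        final byte[] bytes = buf.array();
        // Each thread lazily gets its own reader (own cursor) over the same bytes.
        local = ThreadLocal.withInitial(() -> new CursorReader(bytes));
    }

    /** Not synchronized: the only mutable state touched belongs to the calling thread. */
    int document(int index) { return local.get().readRecord(index); }

    /** Reads every record from several threads at once and checks the values. */
    static boolean checkConcurrently(int threads, int records) {
        PerThreadStore store = new PerThreadStore(records);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    for (int i = 0; i < records; i++) {
                        if (store.document(i) != i * 31) return false;
                    }
                    return true;
                }));
            }
            for (Future<Boolean> f : results) {
                if (!f.get()) return false;
            }
            return true;
        } catch (InterruptedException | ExecutionException e) {
            return false;
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is one reader (and its buffers or file handles) per thread in exchange for removing the shared lock from the read path.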
Re: Throughput doesn't increase when using more concurrent threads

2006-03-07 Thread Doug Cutting
Peter Keegan wrote: I ran a query performance tester against 8-cpu and 16-cpu Xeon servers (16/32 cpu hyperthreaded) on Linux. Here are the results: 8-cpu: 275 qps 16-cpu: 305 qps (the dual-core Opteron servers are still faster) Here is the stack trace of 8 of the 16 query threads during the

Re: Throughput doesn't increase when using more concurrent threads

2006-03-07 Thread Peter Keegan
I ran a query performance tester against 8-cpu and 16-cpu Xeon servers (16/32 cpu hyperthreaded) on Linux. Here are the results: 8-cpu: 275 qps 16-cpu: 305 qps (the dual-core Opteron servers are still faster) Here is the stack trace of 8 of the 16 query threads during the test: at org.

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Peter Keegan
Yonik, We're investigating both approaches. Yes, the resources (and permutations) are dizzying! Peter On 2/23/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Wow, some resources! > Would it be cheaper / more scalable to copy the index to multiple > boxes and loadbalance requests across them? > >

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Yonik Seeley
Wow, some resources! Would it be cheaper / more scalable to copy the index to multiple boxes and loadbalance requests across them? -Yonik On 2/23/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next > (32 with hyperthreading), o

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Peter Keegan
Chris, I tried JRockit a while back on 8-cpu/windows and it was slower than Sun's. Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next (32 with hyperthreading), on LinTel. I may give JRockit another go around then. Thanks, Peter On 2/23/06, Chris Lamprecht <[EMAIL PROTECT

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Chris Lamprecht
Peter, Have you given JRockit JVM a try? I've seen it help throughput compared to Sun's JVM on a dual xeon/linux machine, especially with concurrency (up to 6 concurrent searches happening). I'm curious to see if it makes a difference for you. -chris On 2/23/06, Peter Keegan <[EMAIL PROTECTED]>

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Peter Keegan
We discovered that the kernel was only using 8 CPUs. After recompiling for 16 (8+hyperthreads), it looks like the query rate will settle in around 280-300 qps. Much better, although still quite a bit slower than the opteron. Peter On 2/22/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Hmmm, n

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Raghavendra Prabhu
<[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Thursday, February 23, 2006 11:10:11 AM > Subject: Re: Throughput doesn't increase when using more concurrent > threads > > Can nutch be made to use lucene query parser? > > Rgds > Prabhu > > > On

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Otis Gospodnetic
ED]> To: java-user@lucene.apache.org Sent: Thursday, February 23, 2006 11:10:11 AM Subject: Re: Throughput doesn't increase when using more concurrent threads Can nutch be made to use lucene query parser? Rgds Prabhu On 2/23/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > Hi Otis, >

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Dan Armbrust
I would give the IBM or Blackdown JVM a try on Linux - I've seen pretty wide variance in their speed on different operations. Sometimes better than Sun, sometimes worse - it depended on the task (I did some ad hoc tests at one point that showed Sun was faster for indexing, but IBM was faster fo

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Raghavendra Prabhu
Can nutch be made to use lucene query parser? Rgds Prabhu On 2/23/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > Hi Otis, > > The Lucene server is actually CPU and network bound, as the index gets > memory mapped pretty quickly. There is little disk activity observed. > > I was also able to run

Re: Throughput doesn't increase when using more concurrent threads

2006-02-23 Thread Peter Keegan
Hi Otis, The Lucene server is actually CPU and network bound, as the index gets memory mapped pretty quickly. There is little disk activity observed. I was also able to run the server on a Sun box last night with 4 dual core opterons (same Linux and JVM) and I'm observing query rates of 400 qps!

Re: Throughput doesn't increase when using more concurrent threads

2006-02-22 Thread Otis Gospodnetic
Hi, Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) --- I would also play with disk IO schedulers, if you can. CentOS is based on RedHat, I believe, and RedHat (ext3, really) now has about 4 different IO schedulers that, according to ar

Re: Throughput doesn't increase when using more concurrent threads

2006-02-22 Thread Yonik Seeley
Hmmm, not sure what that could be. You could try using the default FSDir instead of MMapDir to see if the differences are there. Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) - synchronization workings - page replacement policy... how to

Re: Throughput doesn't increase when using more concurrent threads

2006-02-22 Thread Peter Keegan
I am doing a performance comparison of Lucene on Linux vs Windows. I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon processors, 64GB RAM). One is running CentOS 4 Linux, the other is running Windows server 2003 Enterprise Edition x64. Both have 64-bit JVMs from Sun. The Lucene se

Re: Throughput doesn't increase when using more concurrent threads

2006-01-30 Thread Peter Keegan
I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet. Peter On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote

Re: Throughput doesn't increase when using more concurrent threads

2006-01-29 Thread Daniel Noll
Yonik Seeley wrote: On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote: Peter Keegan wrote: I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. Is this true about the 64-bit JVM not

Re: Throughput doesn't increase when using more concurrent threads

2006-01-29 Thread Yonik Seeley
On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote: > Peter Keegan wrote: > > I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on > > Intel. If you know of any, please let me know. Linux may be an option, too. > > > Is this true about the 64-bit JVM not working on Intel? Go ba

Re: Throughput doesn't increase when using more concurrent threads

2006-01-29 Thread Daniel Noll
Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does in

Re: Throughput doesn't increase when using more concurrent threads

2006-01-29 Thread Daniel Noll
Peter Keegan wrote: I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. Is this true about the 64-bit JVM not working on Intel? I was under the impression that it supported the AMD64 instruction

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Ray, The 135 qps rate was using the standard FSDirectory in 1.9. Peter On 1/26/06, Ray Tsang <[EMAIL PROTECTED]> wrote: > > Paul, > > Thanks for the advice! But for the 100+ queries/sec on a 32-bit > platform, did you end up applying other patches? Or use different > FSDirectory implementations?

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Paul, Thanks for the advice! But for the 100+ queries/sec on a 32-bit platform, did you end up applying other patches? Or use different FSDirectory implementations? Thanks! ray, On 1/27/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Ray, > > The short answer is that you can make Lucene blazingly

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Ray, The short answer is that you can make Lucene blazingly fast by using advice and design principles mentioned in this forum and of course reading 'Lucene in Action'. For example, use a 'content' field for searching all fields (vs multi-field search), put all your stored data in one field, under

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Peter, Wow, the speedup is impressive! But may I ask what you did to achieve 135 queries/sec prior to the JVM switch? ray, On 1/27/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Correction: make that 285 qps :) > > On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > > > I tried the AMD64-b

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Yonik Seeley
There is no difference in bytecode... the whole difference is just in the underlying JVM. -Yonik On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Dumb question: does the 64-bit compiler (javac) generate different code than > the 32-bit version, or is it just the jvm that matters? My reported

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Dumb question: does the 64-bit compiler (javac) generate different code than the 32-bit version, or is it just the jvm that matters? My reported speedups were solely from using the 64-bit jvm with jar files from the 32-bit compiler. Peter On 1/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Ni

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Yonik Seeley
Nice speedup! The extra registers in 64-bit mode may have helped a little too. -Yonik On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Correction: make that 285 qps :)

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Correction: make that 285 qps :) On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now > getting 250 queries/sec and excellent cpu utilization (equal concurrency on > all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Thanks all very much. Peter On 1/26/06, Doug Cutting <[EMAIL PROTEC
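For readers unfamiliar with what memory mapping buys here, below is a minimal sketch of reading a file through `FileChannel.map` with standard NIO only. It is not MMapDirectory's implementation; `MappedRead` and its methods are hypothetical names. One relevant fact from the JDK: a single mapping cannot exceed `Integer.MAX_VALUE` bytes, and on a 32-bit JVM the whole mapping must also fit in the process address space, which is presumably why a multi-gigabyte index needs a 64-bit JVM.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedRead {
    /** Maps the whole file read-only and sums its bytes (as unsigned values). */
    static int sumMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // duplicate() shares the mapped bytes but has its own position,
            // so each thread can take a private view without any locking.
            ByteBuffer view = map.duplicate();
            int sum = 0;
            while (view.hasRemaining()) sum += view.get() & 0xFF;
            return sum;
        }
    }

    /** Writes bytes 1..100 to a temp file and sums them back through the mapping. */
    static int demo() {
        try {
            Path file = Files.createTempFile("mapped", ".bin");
            file.toFile().deleteOnExit(); // a mapped file may not be deletable immediately on Windows
            byte[] content = new byte[100];
            for (int i = 0; i < content.length; i++) content[i] = (byte) (i + 1);
            Files.write(file, content);
            return sumMapped(file);
        } catch (IOException e) {
            return -1;
        }
    }
}
```

Once the pages are resident, reads are plain memory accesses with no system call and no shared file position to guard, which fits the even CPU utilization reported above.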

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Doug Cutting
Doug Cutting wrote: A 64-bit JVM with NioDirectory would really be optimal for this. Oops. I meant MMapDirectory, not NioDirectory. Doug

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Doug Cutting
Peter Keegan wrote: The throughput is worse with NioFSDirectory than with the FSDirectory (patched and unpatched). The bottleneck still seems to be synchronization, this time in NioFile.getChannel (7 of the 8 threads were blocked there during one snapshot). I tried this with 4 and 8 channels.

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Yonik Seeley
BEA JRockit supports both AMD64 and Intel's EM64T (basically renamed AMD64) http://www.bea.com/framework.jsp?CNT=index.htm&FP=/content/products/jrockit/ and Sun's Java 1.5 for "Windows AMD64 Platform" They advertise AMD64, presumably because that's what their servers use, but it should work on Int

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. btw, I'm getting a sustained rate of 135 queries/sec with 4 threads, which is pretty impressive. Another way around the concurrency limit is to run

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Yonik Seeley
Hmmm, can you run the 64 bit version of Windows (and hence a 64 bit JVM?) We're running with heap sizes up to 8GB (RH Linux 64 bit, Opterons, Sun Java 1.5) -Yonik On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Paul, > > I tried this but it ran out of memory trying to read the 500Mb .fdt fi

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Ray, The throughput is worse with NioFSDirectory than with the FSDirectory (patched and unpatched). The bottleneck still seems to be synchronization, this time in NioFile.getChannel (7 of the 8 threads were blocked there during one snapshot). I tried this with 4 and 8 channels. The throughput wi

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Paul, I tried this but it ran out of memory trying to read the 500Mb .fdt file. I tried various values for MAX_BBUF, but it still ran out of memory (I'm using -Xmx1600M, which is the jvm's maximum value (v1.5)) I'll give NioFSDirectory a try. Thanks, Peter On 1/26/06, Paul Elschot <[EMAIL PROT

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Speaking of NioFSDirectory, I thought there was one posted a while ago, is this something that can be used? http://issues.apache.org/jira/browse/LUCENE-414 ray, On 11/22/05, Doug Cutting <[EMAIL PROTECTED]> wrote: > Jay Booth wrote: > > I had a similar problem with threading, the problem turned o

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Paul Elschot
On Wednesday 25 January 2006 20:51, Peter Keegan wrote: > The index is non-compound format and optimized. Yes, I did try > MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors) > > Peter > You could also give this a try: http://issues.apache.org/jira/browse/LUCENE-283 Regards

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Peter Keegan
Yes, it's hyperthreaded (16 cpus show up in task manager - the box is running 2003). I plan to turn off hyperthreading to see if it has any effect. Peter On 1/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > It's a 3GHz Intel box with Xeo

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Yonik Seeley
On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > It's a 3GHz Intel box with Xeon processors, 64GB ram :) Nice! Xeon processors are normally hyperthreaded. On a linux box, if you cat /proc/cpuinfo, you will see 8 processors for a 4 physical CPU system. Are you positive you have 8 physical X

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Peter Keegan
The index is non-compound format and optimized. Yes, I did try MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors) Peter On 1/25/06, Doug Cutting <[EMAIL PROTECTED]> wrote: > > Peter Keegan wrote: > > This is just fyi - in my stress tests on a 8-cpu box (that's 8 real > cpus

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Peter Keegan
It's a 3GHz Intel box with Xeon processors, 64GB ram :) Peter On 1/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Thanks Peter, that's useful info. > > Just out of curiosity, what kind of box is this? what CPUs? > > -Yonik > > On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > This is

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Doug Cutting
Peter Keegan wrote: This is just fyi - in my stress tests on a 8-cpu box (that's 8 real cpus), the maximum throughput occurred with just 4 query threads. The query throughput decreased with fewer than 4 or greater than 4 query threads. The entire index was most likely in the file system cache, t

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Yonik Seeley
Thanks Peter, that's useful info. Just out of curiosity, what kind of box is this? what CPUs? -Yonik On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > This is just fyi - in my stress tests on a 8-cpu box (that's 8 real cpus), > the maximum throughput occurred with just 4 query threads. The

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Peter Keegan
This is just fyi - in my stress tests on a 8-cpu box (that's 8 real cpus), the maximum throughput occurred with just 4 query threads. The query throughput decreased with fewer than 4 or greater than 4 query threads. The entire index was most likely in the file system cache, too. Periodic snapshots
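A thread-count sweep like the one Peter describes can be sketched with a small harness. This is an assumption-laden illustration: `fakeQuery` is a hypothetical CPU-bound stand-in for a real search call, and the numbers it produces say nothing about Lucene itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

final class ThroughputSweep {
    /** Hypothetical CPU-bound stand-in for executing one query. */
    static long fakeQuery(long seed) {
        long x = seed;
        for (int i = 0; i < 10_000; i++) x = x * 6364136223846793005L + 1442695040888963407L;
        return x;
    }

    /** Runs queriesPerThread queries on each of `threads` threads; returns queries per second. */
    static double measureQps(int threads, int queriesPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong sink = new AtomicLong(); // consumes results so the work is not optimized away
        long start = System.nanoTime();     // note: thread start-up time is included
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                futures.add(pool.submit(() -> {
                    long acc = 0;
                    for (int q = 0; q < queriesPerThread; q++) acc += fakeQuery(q + id);
                    sink.addAndGet(acc);
                }));
            }
            for (Future<?> f : futures) f.get(); // wait for every worker to finish
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (double) threads * queriesPerThread / seconds;
    }

    /** Prints throughput for 1, 2, 4, ... threads, up to twice the logical CPU count. */
    static void sweep() {
        int cpus = Runtime.getRuntime().availableProcessors(); // logical CPUs, hyperthreads included
        for (int n = 1; n <= cpus * 2; n *= 2) {
            System.out.printf("%d threads: %.0f qps%n", n, measureQps(n, 2000));
        }
    }
}
```

If throughput peaks well below the CPU count, as in the 4-thread result reported here, thread dumps taken during the run are the usual next step for finding the lock the workers are queuing on.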

Re: Throughput doesn't increase when using more concurrent threads

2005-11-22 Thread Oren Shir
Hi, There are two synchronization points: on the stream and on the reader. Using different FSDirectories and IndexReaders should solve this. I'll let you know once I code it. Right now I'm checking if making my Documents store less data will move the bottleneck to some other place. Thanks again, O

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Doug Cutting
Jay Booth wrote: I had a similar problem with threading, the problem turned out to be that in the back end of the FSDirectory class I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused t
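The contention Jay describes comes from seek-then-read on a single shared file handle, which has to be done under a lock. One alternative in standard NIO is the positional `FileChannel.read(ByteBuffer, long)`, which does not modify the channel's own position. The sketch below illustrates that call only; `PositionalRead` is a hypothetical name, this is not the code of any patch discussed in this thread, and whether positional reads actually run in parallel depends on the platform and JVM.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class PositionalRead {
    /** Reads len bytes at the given offset without touching the channel's shared position. */
    static byte[] readAt(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, offset + buf.position()); // positional read, may be partial
            if (n < 0) throw new EOFException("read past end of file");
        }
        return buf.array();
    }

    /** Four threads read disjoint 64-byte slices of one channel with no explicit locking. */
    static boolean demo() {
        Path file = null;
        try {
            file = Files.createTempFile("positional", ".bin");
            byte[] content = new byte[256];
            for (int i = 0; i < content.length; i++) content[i] = (byte) i;
            Files.write(file, content);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                Thread[] workers = new Thread[4];
                boolean[] ok = new boolean[workers.length];
                for (int t = 0; t < workers.length; t++) {
                    final int id = t;
                    workers[t] = new Thread(() -> {
                        try {
                            byte[] slice = readAt(ch, id * 64L, 64);
                            boolean good = true;
                            for (int i = 0; i < slice.length; i++) {
                                good &= slice[i] == (byte) (id * 64 + i);
                            }
                            ok[id] = good;
                        } catch (IOException e) {
                            ok[id] = false;
                        }
                    });
                    workers[t].start();
                }
                for (Thread w : workers) w.join(); // join() makes each ok[id] write visible
                for (boolean b : ok) if (!b) return false;
                return true;
            }
        } catch (IOException | InterruptedException e) {
            return false;
        } finally {
            if (file != null) {
                try { Files.deleteIfExists(file); } catch (IOException ignored) { }
            }
        }
    }
}
```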

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Oren Shir
:[EMAIL PROTECTED] > Sent: Monday, November 21, 2005 11:08 AM > To: java-user@lucene.apache.org; [EMAIL PROTECTED] > Subject: Re: Throughput doesn't increase when using more concurrent > threads > > > On 11/21/05, Oren Shir <[EMAIL PROTECTED]> wrote: > > It is rather sa

RE: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Jay Booth
ovember 21, 2005 11:08 AM To: java-user@lucene.apache.org; [EMAIL PROTECTED] Subject: Re: Throughput doesn't increase when using more concurrent threads On 11/21/05, Oren Shir <[EMAIL PROTECTED]> wrote: > It is rather sad if 10 threads reach the CPU limit. I'll check it and g

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Yonik Seeley
On 11/21/05, Oren Shir <[EMAIL PROTECTED]> wrote: > It is rather sad if 10 threads reach the CPU limit. I'll check it and get > back to you. It's about performance and throughput though, not about number of threads it takes to reach saturation. In a 2 CPU box, I would say that the ideal situation

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Oren Shir
gekkokid, does 1.4.3 benefit from multi-threading? > Sorry for not being clear. My tests show that neither version benefits from multi-threading, but it is possible that I'm CPU bound, as Yonik kindly reminded me. is 1.9 the version in the source repository? 1.9 is the version in source re

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Yonik Seeley
This is expected behavior: you are probably quickly becoming CPU bound (which isn't a bad thing). More threads only help when some threads are waiting on IO, or if you actually have a lot of CPUs in the box. -Yonik Now hiring -- http://forms.cnet.com/slink?231706 On 11/21/05, Oren Shir <[EMAIL

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread gekkokid
Oren Shir wrote: I tested this in version 1.4.3 and 1.9rc1, and they are both the same in this aspect. 1.9rc1 is faster, but does not benefit from multi threading. some newbie questions i have, does 1.4.3 benefit from multi-threading? is 1.9 the version in the source repository? _gk ---