> Out of interest, does indexing time speed up much on 64-bit hardware?
I was able to speed up indexing on a 64-bit platform by taking advantage of
the larger address space to parallelize the indexing process. One thread
creates index segments with a set of RAMDirectories and another thread
merges t
Good question. 'Top' reports the jvm at 99.9% CPU, but the individual CPUs
(top/1) don't seem to add up to 99.9. This server is actually two 8-CPU
servers whose backplanes are cabled together, so there may be some issue
here. The network load is heavy, but doesn't seem to be the bottleneck (on
the
Peter Keegan wrote:
I did some additional testing with Chris's patch and mine (based on Doug's
note) vs. no patch and found that all 3 produced the same throughput - about
330 qps - over a longer period.
Was CPU utilization 100%? If not, where do you think the bottleneck now
is? Network? Or
I did some additional testing with Chris's patch and mine (based on Doug's
note) vs. no patch and found that all 3 produced the same throughput - about
330 qps - over a longer period. So, there seems to be a point of diminishing
returns to adding more cpus. The dual core Opterons (8 cpu) still win
Chris,
My apologies - this error was apparently caused by a file format mismatch
(probably line endings).
Thanks,
Peter
On 3/13/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
>
> Chris,
>
> Should this patch work against the current code base? I'm getting this
> error:
>
> D:\lucene-1.9>patch -b -p0
Chris,
Should this patch work against the current code base? I'm getting this
error:
D:\lucene-1.9>patch -b -p0 -i nio-lucene-1.9.patch
patching file src/java/org/apache/lucene/index/CompoundFileReader.java
patching file src/java/org/apache/lucene/index/FieldsReader.java
missing header for unifie
Peter,
I think this is similar to the patch in this bugzilla task:
http://issues.apache.org/bugzilla/show_bug.cgi?id=35838
the patch itself is http://issues.apache.org/bugzilla/attachment.cgi?id=15757
(BTW does JIRA have a way to display the patch diffs?)
The above patch also has a change to Se
> 3. Use the ThreadLocal's FieldReader in the document() method.
As I understand it, this means that the document method no longer needs to
be synchronized, right?
I've made these changes and it does appear to improve performance. Random
snapshots of the stack traces show only an occasional lock
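
The per-thread reader idea can be illustrated with a minimal Java sketch. It is not Lucene's code: `FieldsCursor`, `document`, `runWorkers`, and `cursorsCreated` are hypothetical names, and the cursor only pretends to seek. It shows why a `ThreadLocal` instance removes the need to synchronize the read path: no two threads ever touch the same cursor.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadReaderSketch {

    // Hypothetical stand-in for a stored-fields reader that keeps a file position.
    static final class FieldsCursor {
        private long position;

        String read(int docId) {
            position = docId * 8L; // mutable state, but private to one thread
            return "doc-" + docId + "@" + position;
        }
    }

    // Counts how many cursors were created, one per thread that reads.
    static final AtomicInteger cursorsCreated = new AtomicInteger();

    private static final ThreadLocal<FieldsCursor> CURSOR =
        ThreadLocal.withInitial(() -> {
            cursorsCreated.incrementAndGet();
            return new FieldsCursor();
        });

    // Not synchronized: each calling thread works on its own cursor.
    static String document(int docId) {
        return CURSOR.get().read(docId);
    }

    // Starts n new threads that each read one document, then waits for them.
    static void runWorkers(int n) {
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int docId = i;
            workers[i] = new Thread(() -> document(docId));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        runWorkers(4);
        System.out.println("cursors created: " + cursorsCreated.get());
    }
}
```

The trade-off is one cursor per thread. In a real reader that would also mean one open file handle (or clone) per thread, which may matter with large thread pools.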
Peter Keegan wrote:
I ran a query performance tester against 8-cpu and 16-cpu Xeon servers
(16/32 cpu hyperthreaded) on Linux. Here are the results:
8-cpu: 275 qps
16-cpu: 305 qps
(the dual-core Opteron servers are still faster)
Here is the stack trace of 8 of the 16 query threads during the
I ran a query performance tester against 8-cpu and 16-cpu Xeon servers
(16/32 cpu hyperthreaded) on Linux. Here are the results:
8-cpu: 275 qps
16-cpu: 305 qps
(the dual-core Opteron servers are still faster)
Here is the stack trace of 8 of the 16 query threads during the test:
at org.
Yonik,
We're investigating both approaches.
Yes, the resources (and permutations) are dizzying!
Peter
On 2/23/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> Wow, some resources!
> Would it be cheaper / more scalable to copy the index to multiple
> boxes and loadbalance requests across them?
>
>
Wow, some resources!
Would it be cheaper / more scalable to copy the index to multiple
boxes and loadbalance requests across them?
-Yonik
On 2/23/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next
> (32 with hyperthreading), o
Chris,
I tried JRockit a while back on 8-cpu/windows and it was slower than Sun's.
Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next
(32 with hyperthreading), on LinTel. I may give JRockit another go around
then.
Thanks,
Peter
On 2/23/06, Chris Lamprecht <[EMAIL PROTECT
Peter,
Have you given JRockit JVM a try? I've seen it help throughput
compared to Sun's JVM on a dual xeon/linux machine, especially with
concurrency (up to 6 concurrent searches happening). I'm curious to
see if it makes a difference for you.
-chris
On 2/23/06, Peter Keegan <[EMAIL PROTECTED]>
We discovered that the kernel was only using 8 CPUs. After recompiling for
16 (8+hyperthreads), it looks like the query rate will settle in around
280-300 qps. Much better, although still quite a bit slower than the
opteron.
Peter
On 2/22/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> Hmmm, n
<[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, February 23, 2006 11:10:11 AM
> Subject: Re: Throughput doesn't increase when using more concurrent
> threads
>
> Can nutch be made to use lucene query parser?
>
> Rgds
> Prabhu
>
>
> On
ED]>
To: java-user@lucene.apache.org
Sent: Thursday, February 23, 2006 11:10:11 AM
Subject: Re: Throughput doesn't increase when using more concurrent threads
Can nutch be made to use lucene query parser?
Rgds
Prabhu
On 2/23/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
>
> Hi Otis,
>
I would give the IBM or blackdown JVM a try on linux - I've seen pretty
wide variance in their speed on different operations.
Sometimes better than Sun, sometimes worse - it depended on the task (I
did some adhoc tests at one point that showed sun was faster for
indexing, but IBM was faster fo
Can nutch be made to use lucene query parser?
Rgds
Prabhu
On 2/23/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
>
> Hi Otis,
>
> The Lucene server is actually CPU and network bound, as the index gets
> memory mapped pretty quickly. There is little disk activity observed.
>
> I was also able to run
Hi Otis,
The Lucene server is actually CPU and network bound, as the index gets
memory mapped pretty quickly. There is little disk activity observed.
I was also able to run the server on a Sun box last night with 4 dual core
opterons (same Linux and JVM) and I'm observing query rates of 400 qps!
Hi,
Some things that could be different:
- thread scheduling (shouldn't make too much of a difference though)
--- I would also play with disk IO schedulers, if you can. CentOS is based on
RedHat, I believe, and RedHat (ext3, really) now has about 4 different IO
schedulers that, according to ar
Hmmm, not sure what that could be.
You could try using the default FSDir instead of MMapDir to see if the
differences are there.
Some things that could be different:
- thread scheduling (shouldn't make too much of a difference though)
- synchronization workings
- page replacement policy... how to
I am doing a performance comparison of Lucene on Linux vs Windows.
I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon
processors, 64GB RAM). One is running CentOS 4 Linux, the other is running
Windows server 2003 Enterprise Edition x64. Both have 64-bit JVMs from Sun.
The Lucene se
I cranked up the dial on my query tester and was able to get the rate up to
325 qps. Unfortunately, the machine died shortly thereafter (memory errors
:-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit
indexing speed, yet.
Peter
On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote
Yonik Seeley wrote:
On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote:
Peter Keegan wrote:
I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on
Intel. If you know of any, please let me know. Linux may be an option, too.
Is this true about the 64-bit JVM not
On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote:
> Peter Keegan wrote:
> > I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on
> > Intel. If you know of any, please let me know. Linux may be an option, too.
> >
> Is this true about the 64-bit JVM not working on Intel?
Go ba
Peter Keegan wrote:
I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now
getting 250 queries/sec and excellent cpu utilization (equal concurrency on
all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware
of it.
Wow. That's fast.
Out of interest, does in
Peter Keegan wrote:
I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on
Intel. If you know of any, please let me know. Linux may be an option, too.
Is this true about the 64-bit JVM not working on Intel? I was under the
impression that it supported the AMD64 instruction
Ray,
The 135 qps rate was using the standard FSDirectory in 1.9.
Peter
On 1/26/06, Ray Tsang <[EMAIL PROTECTED]> wrote:
>
> Paul,
>
> Thanks for the advice! But for the 100+ queries/sec on a 32-bit
> platform, did you end up applying other patches? or use different
> FSDirectory implementations?
Paul,
Thanks for the advice! But for the 100+ queries/sec on a 32-bit
platform, did you end up applying other patches? or use different
FSDirectory implementations?
Thanks!
ray,
On 1/27/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> Ray,
>
> The short answer is that you can make Lucene blazingly
Ray,
The short answer is that you can make Lucene blazingly fast by using advice
and design principles mentioned in this forum and of course reading 'Lucene
in Action'. For example, use a 'content' field for searching all fields (vs
multi-field search), put all your stored data in one field, under
Peter,
Wow, the speedup is impressive! But may I ask what you did to
achieve 135 queries/sec prior to the JVM switch?
ray,
On 1/27/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> Correction: make that 285 qps :)
>
> On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> >
> > I tried the AMD64-b
There is no difference in bytecode... the whole difference is just in
the underlying JVM.
-Yonik
On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> Dumb question: does the 64-bit compiler (javac) generate different code than
> the 32-bit version, or is it just the jvm that matters? My reported
Dumb question: does the 64-bit compiler (javac) generate different code than
the 32-bit version, or is it just the jvm that matters? My reported speedups
were solely from using the 64-bit jvm with jar files from the 32-bit
compiler.
Peter
On 1/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> Ni
Nice speedup! The extra registers in 64 bit mode may have helped a little too.
-Yonik
On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> Correction: make that 285 qps :)
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additi
Correction: make that 285 qps :)
On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
>
> I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now
> getting 250 queries/sec and excellent cpu utilization (equal concurrency on
> all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm
I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now
getting 250 queries/sec and excellent cpu utilization (equal concurrency on
all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware
of it.
Thanks all very much.
Peter
On 1/26/06, Doug Cutting <[EMAIL PROTEC
Doug Cutting wrote:
A 64-bit JVM with NioDirectory would really be optimal for this.
Oops. I meant MMapDirectory, not NioDirectory.
Doug
Peter Keegan wrote:
The throughput is worse with NioFSDirectory than with the FSDirectory
(patched and unpatched). The bottleneck still seems to be synchronization,
this time in NioFile.getChannel (7 of the 8 threads were blocked there
during one snapshot). I tried this with 4 and 8 channels.
BEA JRockit supports both AMD64 and Intel's EM64T (basically renamed AMD64)
http://www.bea.com/framework.jsp?CNT=index.htm&FP=/content/products/jrockit/
and Sun's Java 1.5 for "Windows AMD64 Platform"
They advertise AMD64, presumably because that's what their servers
use, but it should work on Int
I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on
Intel. If you know of any, please let me know. Linux may be an option, too.
btw, I'm getting a sustained rate of 135 queries/sec with 4 threads, which
is pretty impressive. Another way around the concurrency limit is to run
Hmmm, can you run the 64 bit version of Windows (and hence a 64 bit JVM?)
We're running with heap sizes up to 8GB (RH Linux 64 bit, Opterons,
Sun Java 1.5)
-Yonik
On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> Paul,
>
> I tried this but it ran out of memory trying to read the 500Mb .fdt fi
Ray,
The throughput is worse with NioFSDirectory than with the FSDirectory
(patched and unpatched). The bottleneck still seems to be synchronization,
this time in NioFile.getChannel (7 of the 8 threads were blocked there
during one snapshot). I tried this with 4 and 8 channels.
The throughput wi
Paul,
I tried this but it ran out of memory trying to read the 500Mb .fdt file. I
tried various values for MAX_BBUF, but it still ran out of memory (I'm using
-Xmx1600M, which is the jvm's maximum value (v1.5)). I'll give
NioFSDirectory a try.
Thanks,
Peter
On 1/26/06, Paul Elschot <[EMAIL PROT
Speaking of NioFSDirectory, I thought there was one posted a while
ago, is this something that can be used?
http://issues.apache.org/jira/browse/LUCENE-414
ray,
On 11/22/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Jay Booth wrote:
> > I had a similar problem with threading, the problem turned o
On Wednesday 25 January 2006 20:51, Peter Keegan wrote:
> The index is non-compound format and optimized. Yes, I did try
> MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors)
>
> Peter
>
You could also give this a try:
http://issues.apache.org/jira/browse/LUCENE-283
Regards
Yes, it's hyperthreaded (16 cpus show up in task manager - the box is
running 2003). I plan to turn off hyperthreading to see if it has any
effect.
Peter
On 1/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> > It's a 3GHz Intel box with Xeo
On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> It's a 3GHz Intel box with Xeon processors, 64GB ram :)
Nice!
Xeon processors are normally hyperthreaded. On a linux box, if you
cat /proc/cpuinfo, you will see 8 processors for a 4 physical CPU
system. Are you positive you have 8 physical X
The index is non-compound format and optimized. Yes, I did try
MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors)
Peter
On 1/25/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Peter Keegan wrote:
> > This is just fyi - in my stress tests on a 8-cpu box (that's 8 real
> cpus
It's a 3GHz Intel box with Xeon processors, 64GB ram :)
Peter
On 1/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> Thanks Peter, that's useful info.
>
> Just out of curiosity, what kind of box is this? what CPUs?
>
> -Yonik
>
> On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> > This is
Peter Keegan wrote:
This is just fyi - in my stress tests on a 8-cpu box (that's 8 real cpus),
the maximum throughput occurred with just 4 query threads. The query
throughput decreased with fewer than 4 or greater than 4 query threads. The
entire index was most likely in the file system cache, t
Thanks Peter, that's useful info.
Just out of curiosity, what kind of box is this? what CPUs?
-Yonik
On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> This is just fyi - in my stress tests on a 8-cpu box (that's 8 real cpus),
> the maximum throughput occurred with just 4 query threads. The
This is just fyi - in my stress tests on a 8-cpu box (that's 8 real cpus),
the maximum throughput occurred with just 4 query threads. The query
throughput decreased with fewer than 4 or greater than 4 query threads. The
entire index was most likely in the file system cache, too. Periodic
snapshots
Hi,
There are two synchronization points: on the stream and on the reader. Using
different FSDirectory and IndexReader instances should solve this. I'll let you know
once I code it. Right now I'm checking if making my Documents store less
data will move the bottleneck to some other place.
Thanks again,
O
Jay Booth wrote:
I had a similar problem with threading, the problem turned out to be that in
the back end of the FSDirectory class I believe it was, there was a
synchronized block on the actual RandomAccessFile resource when reading a
block of data from it... high-concurrency situations caused t
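
For illustration, one way to avoid a lock around a shared file pointer is a positional read, which takes the offset as an argument instead of seeking first. The sketch below uses the JDK's `FileChannel.read(ByteBuffer, long)`; `PositionalReadSketch`, `readAt`, and `demo` are hypothetical names, and this is not taken from Lucene or from any of the patches discussed in this thread.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class PositionalReadSketch {

    // Reads `length` bytes starting at `offset`. The channel's own position is
    // never read or modified, so callers do not need a seek-then-read lock.
    static byte[] readAt(FileChannel channel, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("reached end of file at position " + pos);
            }
            pos += n;
        }
        return buf.array();
    }

    // Writes a small temp file and reads four bytes from the middle of it.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("positional-read", ".bin");
            try {
                Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    return new String(readAt(ch, 10, 4), StandardCharsets.US_ASCII);
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whether this actually removes contention depends on the platform: some JDK and OS combinations may still serialize positional reads internally, so it is worth measuring before relying on it.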
:[EMAIL PROTECTED]
> Sent: Monday, November 21, 2005 11:08 AM
> To: java-user@lucene.apache.org; [EMAIL PROTECTED]
> Subject: Re: Throughput doesn't increase when using more concurrent
> threads
>
>
> On 11/21/05, Oren Shir <[EMAIL PROTECTED]> wrote:
> > It is rather sa
ovember 21, 2005 11:08 AM
To: java-user@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re: Throughput doesn't increase when using more concurrent
threads
On 11/21/05, Oren Shir <[EMAIL PROTECTED]> wrote:
> It is rather sad if 10 threads reach the CPU limit. I'll check it and g
On 11/21/05, Oren Shir <[EMAIL PROTECTED]> wrote:
> It is rather sad if 10 threads reach the CPU limit. I'll check it and get
> back to you.
It's about performance and throughput though, not about number of
threads it takes to reach saturation.
In a 2 CPU box, I would say that the ideal situation
gekkokid,
does 1.4.3 benefit from multi-threading?
>
Sorry for not being clear. My tests show that neither version benefits
from multi threading, but it is possible that I'm CPU bound, as Yonik kindly
reminded me.
is 1.9 the version in the source repository?
1.9 is the version in source re
This is expected behavior: you are probably quickly becoming CPU bound
(which isn't a bad thing). More threads only help when some threads
are waiting on IO, or if you actually have a lot of CPUs in the box.
-Yonik
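
A rough way to check whether a workload is CPU bound is to hold the amount of work constant and vary only the thread count. The sketch below does that with a fixed thread pool; `ThroughputProbe`, `run`, and `cpuBoundQuery` are hypothetical names, and `cpuBoundQuery` is an arbitrary arithmetic loop standing in for a search, not a Lucene call. If the printed rate stops rising once the thread count reaches the number of CPUs, the work is likely CPU bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThroughputProbe {

    // Hypothetical stand-in for one query: pure CPU work, no I/O, no locks.
    static long cpuBoundQuery(int seed) {
        long h = seed;
        for (int i = 0; i < 200_000; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    // Runs `queries` tasks on `threads` workers and returns how many completed.
    static int run(int threads, int queries) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (int q = 0; q < queries; q++) {
                final int seed = q;
                Callable<Long> task = () -> cpuBoundQuery(seed);
                pending.add(pool.submit(task));
            }
            int done = 0;
            for (Future<Long> f : pending) {
                f.get(); // wait for this task to finish
                done++;
            }
            return done;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Double the thread count each round, up to twice the CPU count.
        for (int threads = 1; threads <= 2 * cpus; threads *= 2) {
            long start = System.nanoTime();
            int done = run(threads, 2_000);
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d threads: %.0f queries/sec%n", threads, done / secs);
        }
    }
}
```

The numbers are only indicative: JIT warm-up and hyperthreading both skew short runs, so repeated and longer measurements give a fairer picture.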
On 11/21/05, Oren Shir <[EMAIL
Oren Shir wrote:
I tested this in version 1.4.3 and 1.9rc1, and they are both the same in
this aspect. 1.9rc1 is faster, but does not benefit from multi threading.
Some newbie questions I have:
does 1.4.3 benefit from multi-threading?
is 1.9 the version in the source repository?
_gk