I don't believe our large users have enough memory for Lucene
indexes to fit in RAM. (Especially given we use quite a bit of RAM
for other stuff.) I think we also close readers pretty frequently
(whenever any user updates a JIRA issue, which I am assuming
happens nearly constantly).
Otis Gospodnetic wrote:
> Our current default behaviour is a merge factor of 4. We perform
> an optimization on the index every 4000 additions. We also
> perform an optimize at midnight. Our…

I wouldn't optimize every 4000 additions - you are killing I/O by
rewriting the whole index, while trying…
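For concreteness, here is a minimal sketch of the knobs being
discussed, against the Lucene 2.x API that was current for this
thread (the index path is hypothetical). It sets a merge factor of
4 and defers optimize() to a quiet window rather than running it
every N additions:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class MergeFactorSketch {
    public static void main(String[] args) throws Exception {
        // "/tmp/jira-index" is a hypothetical location
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/tmp/jira-index"),
                new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);

        writer.setMergeFactor(4); // Lucene's default is 10; JIRA runs 4

        // ... addDocument() calls happen here during normal operation ...

        // Per the advice above: optimize() rewrites the whole index,
        // so reserve it for a quiet window (e.g. the midnight job),
        // not every 4000 additions.
        writer.optimize();
        writer.close();
    }
}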
Justus Pendleton wrote:
On 05/11/2008, at 4:36 AM, Michael McCandless wrote:
> If possible, you should try to use a larger corpus (eg Wikipedia)
> rather than multiplying Reuters by N, which creates an unnatural
> term frequency distribution.

I'll replicate the tests with the wikipedia corpus over the next
few days and regen…
Tomer Gabel wrote:
> Since you're using an 8-core Mac Pro I also assume you have some
> sort of RAID setup, which means your storage subsystem can
> physically handle more than one concurrent request, which can
> only come into play with multiple segments.

This is an important point: a multi-seg…
On Wed, Nov 5, 2008 at 9:47 AM, Tomer Gabel <[EMAIL PROTECTED]> wrote:
> 1. Higher merge factor => more segments.
Right, and it's also important to note that it's only "on average"
more segments. The number of segments goes up and down with
merging, so at particular points in time, an index with a higher
merge factor…
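A toy simulation (plain Java, not Lucene code) makes that
fluctuation visible. It models a log-style merge policy in the
spirit of Lucene's LogMergePolicy, though the merge rule here is
simplified: each flush adds one level-0 segment, and whenever
mergeFactor segments share a level they collapse into one segment
a level up:

import java.util.ArrayList;
import java.util.List;

public class SegmentSawtooth {
    public static void main(String[] args) {
        int mergeFactor = 4;
        // each entry is one segment's "level"
        List<Integer> segmentLevels = new ArrayList<Integer>();
        for (int flush = 1; flush <= 30; flush++) {
            segmentLevels.add(0); // a flush creates a level-0 segment
            boolean merged = true;
            while (merged) {
                merged = false;
                for (int level = 0; level <= 10 && !merged; level++) {
                    int count = 0;
                    for (int l : segmentLevels) if (l == level) count++;
                    if (count >= mergeFactor) {
                        // collapse mergeFactor segments into one,
                        // one level up
                        for (int i = 0; i < mergeFactor; i++)
                            segmentLevels.remove(Integer.valueOf(level));
                        segmentLevels.add(level + 1);
                        merged = true;
                    }
                }
            }
            System.out.println("flush " + flush + ": "
                    + segmentLevels.size() + " segments");
        }
    }
}

Running it shows the segment count repeatedly climbing and then
collapsing after a merge cascade, which is why a snapshot taken at
any particular moment can contradict the "on average" ordering.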
If possible, you should try to use a larger corpus (eg Wikipedia)
rather than multiplying Reuters by N, which creates an unnatural
term frequency distribution.
The graphs are hard to read because of the spline interpolation.
Maybe you could overlay X's where there is a real datapoint?
After…
On Mon, 2008-11-03 at 23:37 +0100, Justus Pendleton wrote:
> What constitutes a "proper warm up before measuring"?
The simplest way is to do a number of searches before you start
measuring. The first searches are always very slow, compared to later
searches.
If you look at http://wiki.statsbiblio
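Outside the benchmark framework, the warm-up amounts to something
like the following sketch (2.x API; the index path and the query
strings are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;

public class WarmupSketch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher(
                FSDirectory.getDirectory("/tmp/jira-index")); // hypothetical
        QueryParser parser = new QueryParser("body", new StandardAnalyzer());

        // Warm-up: run representative queries first so the OS cache and
        // Lucene's internal structures are populated. Results are
        // thrown away.
        String[] warmup = {"lucene", "merge factor", "optimize"}; // placeholders
        for (String q : warmup) {
            searcher.search(parser.parse(q), null, 10);
        }

        // Only now start timing the searches you actually measure.
        long start = System.currentTimeMillis();
        Query query = parser.parse("benchmark"); // placeholder query
        searcher.search(query, null, 10);
        System.out.println("measured: "
                + (System.currentTimeMillis() - start) + " ms");
    }
}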
Been a while since I've been in the benchmark stuff, so I am going
to take some time to look at this when I get a chance, but off the
cuff I think you are opening and closing the reader for each
search. Try using the openreader task before the 100 searches and
then the closereader task. That will…
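The difference Mark describes is easy to see in plain Java: the
first loop below is the anti-pattern (paying the full reader-open
cost on every search), the second is what wrapping the searches in
OpenReader/CloseReader achieves. A hedged sketch against the 2.x
API, with a hypothetical index path:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ReaderReuseSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.getDirectory("/tmp/jira-index"); // hypothetical

        // Anti-pattern: a new reader per search pays the full open cost
        // (norms, term index, etc.) 100 times over.
        for (int i = 0; i < 100; i++) {
            IndexSearcher s = new IndexSearcher(dir);
            s.search(new MatchAllDocsQuery(), null, 10);
            s.close();
        }

        // Fix: pay the open cost once, share the reader across all
        // searches -- the effect of OpenReader/CloseReader around the
        // search loop.
        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        for (int i = 0; i < 100; i++) {
            searcher.search(new MatchAllDocsQuery(), null, 10);
        }
        searcher.close();
        reader.close();
    }
}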
On 03/11/2008, at 11:07 PM, Mark Miller wrote:
> Am I missing your benchmark algorithm somewhere? We need it.
> Something doesn't make sense.
I thought I had included it at [1] before but apparently not; my
apologies for that. I have updated that wiki page. I'll also
reproduce it here:
{ "Ro…
On Mon, 2008-11-03 at 04:42 +0100, Justus Pendleton wrote:
> 1. Why does the merge factor of 4 appear to be faster than the merge
> factor of 2?
Because you alternate between updating the index and searching?
With 4 segments, chances are that most of the segment-data will be
unchanged between searches…
Am I missing your benchmark algorithm somewhere? We need it. Something
doesn't make sense.
- Mark
Justus Pendleton wrote:
Howdy,
I have a couple of questions regarding some Lucene benchmarking
and what the results mean[3]. (Skip to the numbered list at the end
if you don't want to read the lengthy exegesis :)
Hello Justus, Chris and Otis,
IIRC Ocean [1] by Jason Rutherglen addresses the issue for
real-time searches on large data sets. A conceptually comparable
implementation is done for Jackrabbit, where you can see an
enlightening picture over here [2]. In short:
1) IndexReaders are opened only once…
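For reference, a sketch of the "open once, then refresh" pattern
using IndexReader.reopen(), which was added in Lucene 2.4; the
index path is hypothetical. reopen() shares unchanged segments
with the old reader, so a refresh costs only what actually changed:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class ReopenSketch {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(
                FSDirectory.getDirectory("/tmp/jira-index")); // hypothetical
        IndexSearcher searcher = new IndexSearcher(reader);

        // ...serve searches from `searcher`...

        // When the index may have changed, reopen() instead of open():
        // unchanged segments are shared with the old reader.
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();          // release the old snapshot
            reader = newReader;
            searcher = new IndexSearcher(reader);
        }
        // ...continue searching with the refreshed searcher...
    }
}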
Hi, Justus,
I have met with very similar problems to those JIRA has: a high
modification rate on a large data volume. It's a pretty common use
case for Lucene.
The way I dealt with the high rate of modification is to create a
secondary in-memory index, and only persist documents older than a
per…
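A minimal sketch of that two-tier idea on the 2.x API: recent
updates go to a RAMDirectory, searches span both tiers through a
MultiReader, and a background job (elided here) would periodically
move older documents to disk. Paths and field names are
hypothetical:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class TwoTierSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory ram = new RAMDirectory(); // hot tier: recent updates
        // assumes an index already exists at this hypothetical path
        FSDirectory disk = FSDirectory.getDirectory("/tmp/jira-index");

        IndexWriter ramWriter = new IndexWriter(ram,
                new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        // High-rate modifications land in the in-memory tier only.
        Document doc = new Document();
        doc.add(new Field("body", "freshly updated issue",
                Field.Store.YES, Field.Index.ANALYZED));
        ramWriter.addDocument(doc);
        ramWriter.commit();

        // Searches see both tiers through one MultiReader.
        IndexReader both = new MultiReader(new IndexReader[] {
                IndexReader.open(ram), IndexReader.open(disk) });
        IndexSearcher searcher = new IndexSearcher(both);

        // A background job (not shown) would periodically move
        // documents older than some threshold from `ram` into `disk`.
        searcher.close();
    }
}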
On 03/11/2008, at 4:27 PM, Otis Gospodnetic wrote:
> Why are you optimizing? Trying to make the search faster? I would
> try to avoid optimizing during high usage periods.

I assume that the original, long-ago decision to optimize was made
to improve searching performance.
One thing that you…
Hello,
Very quick comments.
----- Original Message -----
> From: Justus Pendleton <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Sunday, November 2, 2008 10:42:52 PM
> Subject: Performance of never optimizing
>
> Howdy,
>
> I have a couple of q…
Howdy,
I have a couple of questions regarding some Lucene benchmarking and
what the results mean[3]. (Skip to the numbered list at the end if you
don't want to read the lengthy exegesis :)
I'm a developer for JIRA[1]. We are currently trying to get a
better understanding of Lucene, and our…