Note that in particular, we use the StandardTokenizer as part of our analyzer
chain, which means it picks up the switch from the JavaCC version to the
JFlex-based code, which I'm betting is a substantial part of that speedup.
-jake
On Feb 3, 2008 2:11 PM, Briggs <[EMAIL PROTECTED]> wrote:
> Damn, r
The test in which we got the 11X speedup? That was single-threaded. I
haven't yet found a way to make multithreaded (shared IndexWriter) indexing
perform any faster than single-threaded, so that code is not
enabled in our tests. Do you think that 2.3 would better take advantage of
mult
@Mark.
I am sorry, but I need a bit more explanation. So you mean to say:
"If auto-commit is false, then of course, docs will not be visible in the
index, until all the threads release themselves out of a particular
IndexWriter instance, and close() the IndexWriter instance.
If auto-commit
Hi Jake.
Was the test conducted with a single indexing thread, or multiple ones?
Jake Mannix wrote:
>
> Hello all,
> I know you Lucene devs did a lot of work on indexing performance in 2.3,
> and I just tested it out last Thursday, so I thought I'd let you know how
> it fared:
>
> On
Thanks, Yonik, for the clarifications and for the prompt replies. Now, God
willing, I should be fine and shouldn't lose any sleep :-)
Thanks again to Yonik and Mike.
Ajay Garg
Yonik Seeley wrote:
>
> On Feb 3, 2008 11:44 AM, ajay_garg <[EMAIL PROTECTED]> wrote:
>> Firstly, in the 2.3
I've created a mapping of query terms to clusters with corresponding
strength values that I want to integrate into Lucene scoring so I can boost
documents that match the clusters. I would like to give a boost based on the
normalized score.
In my setup, each document has a field with the clusters th
I'm not sure I understand what you are asking, but you can get
non-normalized scores by using the lower-level, non-Hits-based search
APIs, such as TopDocs.
However, scores are not really all that comparable across queries.
-Grant
On Feb 1, 2008, at 6:46 AM, Lisa Lee wrote:
I need know do
What are the other parts of your queries like? And why the need for
the separate instantiations of the QueryParser?
You might try something like: good^2 badA^0.1 badB^0.3 or some other
bigger separation of the boost value between the good terms and the
bad terms.
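
As a rough illustration of the suggestion above, here is a minimal sketch that
assembles a boosted query string in the term^boost form. It only builds the
string and does not call Lucene. The class and method names are hypothetical,
the terms and boosts are the ones from the example, and the terms are assumed
to be plain words that need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoostedQueryString {

    // Joins each term with its boost as "term^boost", separated by spaces.
    // Assumes terms are plain words with no query-syntax characters to escape.
    static String build(Map<String, Double> boosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    // Hypothetical helper: the terms and boosts from the suggestion above.
    static String example() {
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("good", 2.0);
        boosts.put("badA", 0.1);
        boosts.put("badB", 0.3);
        return build(boosts);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints: good^2.0 badA^0.1 badB^0.3
    }
}
```

The resulting string could then be handed to a single query parser instance,
which may also avoid the need for separate parser instantiations.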
The other thing to do
Yeah, I should have mentioned - this was merely with a jar replacement, we
haven't gotten around to doing fun 2.3-related stuff like making sure our
domain-specific tokenizers use next(Token), as well as making sure we set
all of our buffer sizes by RAM used.
We tried multithreading the process, a
Damn, really? I haven't had the opportunity to test this yet. Has
anyone else seen this kind of improvement?
On Feb 3, 2008 2:57 PM, Jake Mannix <[EMAIL PROTECTED]> wrote:
> Hello all,
> I know you Lucene devs did a lot of work on indexing performance in 2.3,
> and I just tested it out last
Awesome! We are glad to hear that :)
You might be able to make it even faster with the steps here:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Mike
Jake Mannix wrote:
Hello all,
I know you Lucene devs did a lot of work on indexing performance in 2.3,
and I just tested i
Hello all,
I know you Lucene devs did a lot of work on indexing performance in 2.3,
and I just tested it out last Thursday, so I thought I'd let you know how it
fared:
On a 2.17 million document index, a recent test gave these indexing times:
* Lucene 2.2: 4.83 hours
* Lucene 2.3: 26 m
You are correct that autocommit=false means that docs will be in the
index before the last thread releases its concurrent hold on a Writer,
*but because IndexAccessor controls when the IndexSearchers are reopened*,
those docs will still not be visible until the last thread
holding a Writer re
On Feb 3, 2008 11:44 AM, ajay_garg <[EMAIL PROTECTED]> wrote:
> Firstly, in the 2.3 optimizations, point 4 says ::
> " 4. LUCENE-959: Remove synchronization in Document (yonik)".
>
> Well, what does that mean, since it has already been assured that multiple
> adds, deletes, updates CAN be done by m
Hi. Sorry if I seem a stranger in this thread, but there is something that I
can't resist getting clarified.
Mark, you say that the additional documents added to an index won't show up
until the number of threads accessing the index hits 0 and the IndexWriter
instance is subsequently closed.
But I
Thanks again, Mike.
In fact, I have just finished going through the CHANGES.txt file, which
details the entire journey of Lucene, right from 1.4 to 2.3. And of
course, I got to know many more things.
Just a couple more issues.
Firstly, in the 2.3 optimizations, point 4 says:
" 4. LUCE