Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Jake Mannix
Note that in particular, we use the StandardTokenizer as part of our analyzer chain, which means it picks up the switch from the JavaCC version to the JFlex-based code, which I'm betting is a substantial part of that speedup. -jake On Feb 3, 2008 2:11 PM, Briggs <[EMAIL PROTECTED]> wrote: > Damn, r

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Jake Mannix
The test in which we got the 11X speedup? That was single threaded. I haven't yet found a way to make multithreaded (shared IndexWriter) indexing perform with any better speed than singlethreaded, so that code is not enabled in our tests. Do you think that 2.3 would better take advantage of mult

Re: Concurrent Indexing + Searching

2008-02-03 Thread ajay_garg
@Mark. I am sorry, but I need a bit more explanation. So you mean to say :: "If auto-commit is false, then of course, docs will not be visible in the index, until all the threads release themselves out of a particular IndexWriter instance, and close() the IndexWriter instance. If auto-commit

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread ajay_garg
Hi Jake. Was the test conducted with a single indexing thread, or multiple ones ? Jake Mannix wrote: > > Hello all, > I know you lucene devs did a lot of work on indexing performance in 2.3, > and I just tested it out last thursday, so I thought I'd let you know how > it > fared: > > On

Re: Query in Lucene 2.3.0

2008-02-03 Thread ajay_garg
Thanks Yonik for the clarifications, and for the prompt replies. Now, God forbidding, I should be fine, and shouldn't be losing my sleep :-) Thanks again to Yonik and Mike. Ajay Garg Yonik Seeley wrote: > > On Feb 3, 2008 11:44 AM, ajay_garg <[EMAIL PROTECTED]> > wrote: >> Firstly, in the 2.3

Boosting using an external data source

2008-02-03 Thread Michael Stoppelman
I've created a mapping of query terms to clusters with corresponding strength values that I want to integrate into lucene scoring so I can boost documents that match the clusters. I would like to give a boost based on the normalized score. In my setup, each document has a field with the clusters th

Re: How can I get document's top n raw score?

2008-02-03 Thread Grant Ingersoll
I'm not sure I understand what you are asking, but you can get non-normalized scores by using the lower-level, non-Hits-based search methods that return TopDocs, etc. However, scores are not really all that comparable across queries. -Grant On Feb 1, 2008, at 6:46 AM, Lisa Lee wrote: I need know do

Re: Different levels of negative boosting

2008-02-03 Thread Grant Ingersoll
What are the other parts of your queries like? And why the need for the separate instantiations of the QueryParser? You might try something like: good^2 badA^0.1 badB^0.3 or some other bigger separation of the boost value between the good terms and the bad terms. The other thing to do
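
To illustrate the boost separation suggested above, here is a minimal sketch that assembles a query string in the `term^boost` form from a map of terms to boost values. The class and method names are hypothetical, and the terms and boosts are just the ones from the message; the resulting string is what one would hand to a single QueryParser instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): builds a "term^boost" query string.
class BoostedQueryString {

    // Joins each term with its boost, separated by spaces, in insertion order.
    static String build(Map<String, Double> boosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Boost values taken from the suggestion in the message above.
        Map<String, Double> boosts = new LinkedHashMap<String, Double>();
        boosts.put("good", 2.0);
        boosts.put("badA", 0.1);
        boosts.put("badB", 0.3);
        System.out.println(build(boosts));
    }
}
```

Keeping all terms in one string means one parser and one query, so the relative boosts are applied within the same scoring pass.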

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Jake Mannix
Yeah, I should have mentioned - this was merely with a jar replacement; we haven't gotten around to doing fun 2.3-related stuff like making sure our domain-specific tokenizers use next(Token), or making sure we set all of our buffer sizes by RAM used. We tried multithreading the process, a

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Briggs
Damn, really? I haven't had the opportunity to test this yet. Has anyone else seen this kind of improvement? On Feb 3, 2008 2:57 PM, Jake Mannix <[EMAIL PROTECTED]> wrote: > Hello all, > I know you lucene devs did a lot of work on indexing performance in 2.3, > and I just tested it out last

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Michael McCandless
Awesome! We are glad to hear that :) You might be able to make it even faster with the steps here: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed Mike Jake Mannix wrote: Hello all, I know you lucene devs did a lot of work on indexing performance in 2.3, and I just tested i

Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Jake Mannix
Hello all, I know you lucene devs did a lot of work on indexing performance in 2.3, and I just tested it out last thursday, so I thought I'd let you know how it fared: On a 2.17 million document index, a recent test gave indexing time to be: * lucene 2.2: 4.83 hours * lucene 2.3: 26 m
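
As a quick check on the figures quoted above, the arithmetic can be sketched as follows. This assumes the truncated "26 m" means 26 minutes, which is consistent with the roughly 11X speedup mentioned elsewhere in the thread; the class and method names are illustrative only.

```java
// Sketch of the speedup arithmetic for the figures quoted in the message.
// ASSUMPTION: the truncated "26 m" is read as 26 minutes.
class IndexingSpeedup {

    // Ratio of the old run time (given in hours) to the new run time (in minutes).
    static double speedup(double oldHours, double newMinutes) {
        return (oldHours * 60.0) / newMinutes;
    }

    // Average indexing throughput in documents per second.
    static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long docs = 2170000L;        // 2.17 million documents
        double oldHours = 4.83;      // Lucene 2.2
        double newMinutes = 26.0;    // Lucene 2.3 (assumed unit: minutes)

        System.out.printf("speedup: %.1fx%n", speedup(oldHours, newMinutes));
        System.out.printf("2.2 throughput: %.0f docs/s%n", docsPerSecond(docs, oldHours * 60.0));
        System.out.printf("2.3 throughput: %.0f docs/s%n", docsPerSecond(docs, newMinutes));
    }
}
```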

Re: Concurrent Indexing + Searching

2008-02-03 Thread Mark Miller
You are correct that autocommit=false means that docs will be in the index before the last thread releases its concurrent hold on a Writer, *but because IndexAccessor controls when the IndexSearchers are reopened*, those docs will still not be visible until the last thread holding a Writer re
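
The visibility rule described above can be pictured with a small reference-counting sketch. This is not the IndexAccessor code and uses no Lucene classes; the class, its methods, and the string "documents" are all hypothetical stand-ins meant only to show documents becoming searchable when the last writer holder releases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT IndexAccessor itself: added docs stay pending until
// the number of threads holding the writer drops to zero.
class RefCountedWriterGate {
    private int holders = 0;
    private final List<String> pending = new ArrayList<String>(); // added, not yet searchable
    private final List<String> visible = new ArrayList<String>(); // what a reopened searcher sees

    synchronized void acquireWriter() {
        holders++;
    }

    synchronized void addDocument(String doc) {
        if (holders == 0) {
            throw new IllegalStateException("no writer is currently held");
        }
        pending.add(doc);
    }

    synchronized void releaseWriter() {
        if (holders == 0) {
            throw new IllegalStateException("release without matching acquire");
        }
        holders--;
        if (holders == 0) {
            // Last holder gone: stand-in for "close the writer, reopen the searchers".
            visible.addAll(pending);
            pending.clear();
        }
    }

    synchronized List<String> search() {
        return new ArrayList<String>(visible);
    }

    public static void main(String[] args) {
        RefCountedWriterGate gate = new RefCountedWriterGate();
        gate.acquireWriter();           // first thread takes the writer
        gate.acquireWriter();           // second thread takes the writer
        gate.addDocument("doc-1");
        gate.releaseWriter();           // one holder remains
        System.out.println("visible while a writer is held: " + gate.search().size());
        gate.releaseWriter();           // last holder releases
        System.out.println("visible after last release: " + gate.search().size());
    }
}
```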

Re: Query in Lucene 2.3.0

2008-02-03 Thread Yonik Seeley
On Feb 3, 2008 11:44 AM, ajay_garg <[EMAIL PROTECTED]> wrote: > Firstly, in the 2.3 optimizations, point 4 says :: > " 4. LUCENE-959: Remove synchronization in Document (yonik)". > > Well, what does that mean, since it has already been assured that multiple > adds, deletes, updates CAN be done by m

Re: Concurrent Indexing + Searching

2008-02-03 Thread ajay_garg
Hi. Sorry if I seem a stranger in this thread, but there is something that I can't resist getting clear on. Mark, you say that the additional documents added to an index won't show up until the number of threads accessing the index hits 0, and the IndexWriter instance is subsequently closed. But I

Re: Query in Lucene 2.3.0

2008-02-03 Thread ajay_garg
Thanks again Mike. In fact, I have just finished going through the CHANGES.txt file, which covers the whole history of Lucene from 1.4 to 2.3, and of course I got to know many more things. Just a couple more issues. Firstly, in the 2.3 optimizations, point 4 says :: " 4. LUCE