Highlighter ArrayIndexOutOfBoundsException - W/Fix

2005-06-01 Thread Andrew Boyd
Hi, I'm getting an ArrayIndexOutOfBoundsException within the highlighter: java.lang.ArrayIndexOutOfBoundsException: 50 at org.apache.lucene.search.highlight.TokenGroup.addToken(TokenGroup.java:47) at org.apache.lucene.search.highlight.Highlighter.getBestDocFragments(Highlighter.java:

Re: Adding to the termFreqVector

2005-06-01 Thread Ryan Skow
Lucene's scalability is not in question. The simple solution of rebuilding the string of terms is what I referred to as not being scalable. For instance, consider the following term vector: termFreqVector (freq {myTermField: red/69, green/79, blue/899}) Recreating a string with 69

Re: Adding to the termFreqVector

2005-06-01 Thread Grant Ingersoll
I don't think you need to parse the toString, you have the TermFreqVector object which lets you access the appropriate pieces of information (string, freq). You could then turn around and delete/index the new document based on the vector with the increments. I don't know whether it would scale or
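The approach Grant describes (read terms and frequencies from the vector, apply the increments, then re-index) might be sketched as below. This is plain Java and does not call Lucene: the parallel `terms` and `freqs` arrays stand in for what `TermFreqVector.getTerms()` and `getTermFrequencies()` return, and the class name, the method name, and the increment map are hypothetical. Deleting and re-adding the document is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermVectorIncrementSketch {

    /**
     * Merges increments into the frequencies read from a term vector.
     * terms/freqs are parallel arrays (assumed to mirror a term vector);
     * increments maps a term to the amount to add (hypothetical input).
     */
    public static Map<String, Integer> applyIncrements(
            String[] terms, int[] freqs, Map<String, Integer> increments) {
        Map<String, Integer> updated = new LinkedHashMap<>();
        // Copy the existing term/frequency pairs, keeping their order.
        for (int i = 0; i < terms.length; i++) {
            updated.put(terms[i], freqs[i]);
        }
        // Add each increment; terms not seen before start from zero.
        for (Map.Entry<String, Integer> e : increments.entrySet()) {
            updated.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return updated;
    }
}
```

The resulting map is what a new document would be built from; whether rebuilding field text from it scales is exactly the open question in this thread.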

FieldCache and Sort

2005-06-01 Thread John Wang
Hi: In the current Lucene sorting implementation, FieldCache is used to retrieve 2 arrays, the lookup array and the order array. The order array at load time stores the position of the term in the lookup array. The lookup array is already sorted because it is read in from the index. My ques
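To make the two arrays concrete, here is a minimal stand-alone sketch of the lookup/order idea. It is not Lucene's FieldCache code: the class name and the per-document input array are assumptions for illustration, and the lookup array is sorted here explicitly, whereas in Lucene it comes out of the index already sorted.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

public class StringIndexSketch {
    /** Distinct field values in sorted order. */
    public final String[] lookup;
    /** order[doc] is the position of that document's value in lookup. */
    public final int[] order;

    /** docValues[doc] is the (hypothetical) sort-field value of each document. */
    public StringIndexSketch(String[] docValues) {
        lookup = new TreeSet<>(Arrays.asList(docValues)).toArray(new String[0]);
        order = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            // lookup is sorted and contains every value, so the search always hits.
            order[doc] = Arrays.binarySearch(lookup, docValues[doc]);
        }
    }

    /** Returns document ids sorted by field value, comparing ints only. */
    public Integer[] sortedDocs() {
        Integer[] docs = new Integer[order.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        // No string comparison happens here: the order array already encodes rank.
        Arrays.sort(docs, Comparator.comparingInt((Integer d) -> order[d]));
        return docs;
    }
}
```

Because lookup is sorted, comparing two documents reduces to comparing two ints from the order array, and the term itself is only needed when the value has to be displayed.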

Performance tuning and org.apache.lucene.store.InputStream.BUFFER_SIZE

2005-06-01 Thread Kevin Burton
I was doing a JProfiler install of our webapp/lucene last week and of course a large part of our app is spent in RandomAccessFile.readBytes ... This is called by InputStream.readByte which internally uses a BUFFER_SIZE of 1024 (which is the default). This value seems too small for a default
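The effect of the buffer size on the number of underlying reads can be illustrated with the JDK's own `BufferedInputStream`. This is only an analogy, not Lucene's `InputStream` class: the counting wrapper, the in-memory data, and the sizes chosen are all assumptions for the sketch, and real gains depend on the access pattern and the operating system's cache.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BufferSizeSketch {

    /** Counts how many bulk reads reach the wrapped stream. */
    static final class CountingStream extends FilterInputStream {
        int bulkReads = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkReads++;
            return super.read(b, off, len);
        }
    }

    /** Reads dataSize bytes one at a time through a buffer of bufferSize bytes. */
    public static int underlyingReads(int dataSize, int bufferSize) {
        CountingStream counting =
                new CountingStream(new ByteArrayInputStream(new byte[dataSize]));
        try (InputStream in = new BufferedInputStream(counting, bufferSize)) {
            while (in.read() != -1) {
                // Byte-at-a-time reads are served from the buffer until it runs dry.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.bulkReads;
    }

    public static void main(String[] args) {
        int oneMiB = 1 << 20;
        System.out.println("1 KiB buffer:  " + underlyingReads(oneMiB, 1024) + " underlying reads");
        System.out.println("16 KiB buffer: " + underlyingReads(oneMiB, 16 * 1024) + " underlying reads");
    }
}
```

Each underlying read in a file-backed stream is a call into `RandomAccessFile`, so fewer refills for sequential access is the saving a larger default would be after; for small random reads a larger buffer may instead fetch bytes that are never used.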

Re: SpanTermQuery issue?

2005-06-01 Thread yahootintin-lucene
Hi Erik, Here is the bug report with the test case: http://issues.apache.org/bugzilla/show_bug.cgi?id=35157 The scoring algorithm doesn't seem to work correctly when SpanTermQuerys are in a BooleanQuery. I will look for the problem. Any advice on what I should look for? Thanks, Reece --- Erik

Re: Clustering Carrot2 vs TermVector Analysis

2005-06-01 Thread Andrew Boyd
Responses inline prefixed with -Original Message- From: Dawid Weiss <[EMAIL PROTECTED]> Sent: Jun 1, 2005 3:24 AM To: java-user@lucene.apache.org Subject: Re: Clustering Carrot2 vs TermVector Analysis Hi Andrew, Coming up with an answer... sorry for the delay. > By using the carro

Re: Stemming at Query time

2005-06-01 Thread Shey Rab Pawo
If your stemmer worked on indexing, then won't the "breath" entry automatically pick up all of these? So, isn't the project unnecessary and otiose? On 5/31/05, Daniel Naber <[EMAIL PROTECTED]> wrote: > On Monday 30 May 2005 18:54, Andrew Boyd wrote: > > > Now that the QueryParser knows about pos

Re: SpanTermQuery issue?

2005-06-01 Thread Erik Hatcher
On May 31, 2005, at 8:38 PM, Reece Wilton wrote: Hi, Using a BooleanQuery to combine two SpanTermQuery objects causes unexpected results on Lucene 1.9 RC1. Is this a problem that is already known about or has already been fixed? I have a test case and more info if this is a new issue. Inte

SpanTermQuery issue?

2005-06-01 Thread Reece Wilton
Hi, Using a BooleanQuery to combine two SpanTermQuery objects causes unexpected results on Lucene 1.9 RC1. Is this a problem that is already known about or has already been fixed? I have a test case and more info if this is a new issue. Thanks. ---

Re: Ability to load a document with ONLY a few fields for performance?

2005-06-01 Thread Kevin Burton
Andrew Boyd wrote: The numbers look impressive. If I build from the 1.9 trunk will I get the patch? Funny... I went ahead and implemented this myself and it didn't work. Of course I may have implemented it incorrectly. I'll look at the patch source and try it out! Something fun to

Re: Clustering Carrot2 vs TermVector Analysis

2005-06-01 Thread Dawid Weiss
Hi Andrew, Coming up with an answer... sorry for the delay. By using the carrot demo: http://www.newsarch.com/archive/mailinglist/jakarta/lucene/user/msg03928.html I was able to easily cluster search results based on the fields used by carrot (url, title, and summary). However I was wonderi

Re: Indexing multiple languages

2005-06-01 Thread Paul Libbrecht
On 1 June 05, at 01:12, Erik Hatcher wrote: 1/ one index for all languages 2/ one index for all languages, with an extra language field so searches can be constrained to a particular language 3/ separate indices for each language? I would vote for option #2 as it gives the most flexibility - y