Re: Memory Usage

2005-11-13 Thread Daniel Noll
Chris Hostetter wrote: : I think though, that I will need a setter on the reader, rather than the : writer. That is, I don't know what factor we want until I know how : large the index is. And I don't know how large the index will be at the : time of creating the writer, but I can just ask for

Re: Extract term and its frequency from the index and file?

2005-11-13 Thread Otis Gospodnetic
Overall avg. freq.? No, but you should be able to calculate that yourself. Otis --- Supheakmungkol SARIN <[EMAIL PROTECTED]> wrote: > Thanks for your help. > > By the way does Lucene provide any API to retrieve the > average frequency of a term in the index directly? My > goal is to compare the

How to add more stopwords to StandardAnalyzer

2005-11-13 Thread Supheakmungkol SARIN
Dear all, I'd like to add some other stopwords to the StandardAnalyzer. How do I do this? Thanks a lot in advance, Mungkol

Re: Extract term and its frequency from the index and file?

2005-11-13 Thread Supheakmungkol SARIN
Thanks for your help. By the way, does Lucene provide any API to retrieve the average frequency of a term in the index directly? My goal is to compare the freq. of a term in a doc. with the average freq. of that term across all the indexed docs, in order to retrieve good keywords. Regards, Mungkol
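The comparison described above can be done by hand once per-document term counts are available. What follows is only a minimal sketch in plain Java, not a Lucene API: the class TermFrequencyStats and its methods are hypothetical names, the per-document counts are assumed to have been collected already into maps, and the average is assumed to be taken over every indexed document (including documents that do not contain the term).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: works on plain term -> count maps.
final class TermFrequencyStats {

    // Average frequency of a term over ALL documents (assumption: documents
    // that lack the term count as frequency 0).
    static double averageFrequency(String term, List<Map<String, Integer>> docs) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (Map<String, Integer> doc : docs) {
            Integer freq = doc.get(term);
            if (freq != null) {
                total += freq;
            }
        }
        return (double) total / docs.size();
    }

    // Ratio of the term's frequency in one document to its average frequency.
    // Returns 0.0 when the term is absent from the document or from the index.
    static double relativeFrequency(String term, Map<String, Integer> doc,
                                    List<Map<String, Integer>> docs) {
        double average = averageFrequency(term, docs);
        Integer freq = doc.get(term);
        if (freq == null || average == 0.0) {
            return 0.0;
        }
        return freq / average;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("lucene", 4);
        first.put("index", 1);
        Map<String, Integer> second = new HashMap<>();
        second.put("lucene", 1);

        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(first);
        docs.add(second);

        // A ratio above 1 means the term is more frequent here than on average.
        System.out.println(relativeFrequency("lucene", first, docs));
    }
}
```

Terms whose ratio is well above 1 in a given document would be the keyword candidates; where to put the cut-off is left open here.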

Re: Memory Usage

2005-11-13 Thread Marvin Humphrey
On Nov 13, 2005, at 6:27 PM, Chris Hostetter wrote: I believe if you really want to determine settings like this after building the index, you'll need to do an initial build of the index using best guess values -- then if the calculations you do once the index is built aren't close enough to your

Re: Large Indexes

2005-11-13 Thread Charles Lloyd
On Nov 13, 2005, at 8:19 PM, Friedland, Zachary (EDS - Strategy) wrote: What is the largest Lucene index that has been built? We're looking to build a sort of data warehouse that will hold transaction log files as long as possible. This index would grow at the rate of 10 million documents per m

Re: Large Indexes

2005-11-13 Thread Otis Gospodnetic
Largest index? Who knows! :) Lucene's internal limit is the size of the doc Id (max Integer). People typically roll their indices when they reach a certain size, but if you don't need your queries to be fast and always need all the data, then this may not make sense for you (well, it still may, a
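To put the doc Id limit next to the growth rate from the original question (10 million documents per month), here is a rough back-of-the-envelope sketch in plain Java. It assumes a single index whose document count is capped at Integer.MAX_VALUE and a constant growth rate with no deletions; DocIdBudget and monthsUntilLimit are hypothetical names, not Lucene classes.

```java
// Hypothetical helper, not part of Lucene: simple arithmetic on the int doc Id range.
final class DocIdBudget {

    // Number of full months of growth that fit below Integer.MAX_VALUE documents,
    // assuming a constant number of new documents per month and no deletions.
    static long monthsUntilLimit(long docsPerMonth) {
        if (docsPerMonth <= 0) {
            throw new IllegalArgumentException("docsPerMonth must be positive");
        }
        return Integer.MAX_VALUE / docsPerMonth;
    }

    public static void main(String[] args) {
        long months = monthsUntilLimit(10_000_000L);
        System.out.println(months + " full months, roughly " + (months / 12) + " years");
    }
}
```

At the stated rate this works out to 214 full months, a little under 18 years, before a single index would run out of doc Ids. Query speed and index size on disk would probably become the practical constraint well before that point, which is where rolling indices comes in.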

Re: Extract term and its frequency from the index and file?

2005-11-13 Thread Otis Gospodnetic
Check out Lucene from CVS and look in the contrib/ directory: contrib/miscellaneous/src/java/org/apache/lucene/misc/HighFreqTerms.java Otis --- Supheakmungkol SARIN <[EMAIL PROTECTED]> wrote: > Dear all, > > I'd like to extract each term and its frequency in the > index and each file in order

Large Indexes

2005-11-13 Thread Friedland, Zachary (EDS - Strategy)
What is the largest Lucene index that has been built? We're looking to build a sort of data warehouse that will hold transaction log files as long as possible. This index would grow at the rate of 10 million documents per month indefinitely. Is there a limit where Lucene will fail? What should

Extract term and its frequency from the index and file?

2005-11-13 Thread Supheakmungkol SARIN
Dear all, I'd like to extract each term and its frequency in the index and each file in order to get the potential keywords of each file. Does Lucene provide any built-in method to do that? Thank you in advance, Mungkol

Re: Memory Usage

2005-11-13 Thread Chris Hostetter
: I think though, that I will need a setter on the reader, rather than the : writer. That is, I don't know what factor we want until I know how : large the index is. And I don't know how large the index will be at the : time of creating the writer, but I can just ask for maxDoc() at the time : o

Re: Memory Usage

2005-11-13 Thread Marvin Humphrey
On Nov 13, 2005, at 6:11 PM, Daniel Noll wrote: Now, to figure out how to set it. There's no setter that I can see... then again it may be in trunk, and just not in the version we're stuck on for the time being. I haven't checked 1.4.3, but yes, I'm looking at the subversion trunk. It's

Re: Memory Usage

2005-11-13 Thread Daniel Noll
Marvin Humphrey wrote: You want indexInterval. Here's an excerpt from the docs in TermInfosWriter. Excellent, that looks like exactly what we're after. Now, to figure out how to set it. There's no setter that I can see... then again it may be in trunk, and just not in the version we're

Re: About searching in multiple fields with one query

2005-11-13 Thread jian chen
Hi, Karl, Looking at the Lucene 1.2 source code, it looks to me as though the MultiFieldQueryParser generates a BooleanQuery. Each sub-query within the BooleanQuery is for one field. The actual calculation of the score is done in BooleanScorer.java, where the scores from each sub-query are accumulated. So,

About searching in multiple fields with one query

2005-11-13 Thread Karl Koch
Hello all, I have a question about searching within multiple fields. I have the following code for doing that (searchFields provides two fields in which I want to search): IndexSearcher searcher = new IndexSearcher(indexDirectory); // search over multiple index fields Query query = MultiFieldQuer

Re: Question on queryparser code from Lucene

2005-11-13 Thread Chris Hostetter
: Oh...ok. Where is this method created then, I can't seem to find it in : QueryParser? grep for "Query Query" -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: About Combining Scores

2005-11-13 Thread Karl Koch
Hello Sebastian, thank you for sharing your experience. I am happy that I am not the only person with this problem. I have read the previous paper by Robertson et al. http://citeseer.ist.psu.edu/robertson04simple.html where they wrote about the danger of using combined scores and provided a solut

Re: Question on queryparser code from Lucene

2005-11-13 Thread Eugene Ezekiel
Oh...ok. Where is this method created then, I can't seem to find it in QueryParser? Thanks. -- Regards, Eugene Erik Hatcher wrote: :) Query(field) in this case is a method call. Erik

Re: Question on queryparser code from Lucene

2005-11-13 Thread Erik Hatcher
On 13 Nov 2005, at 13:39, Eugene Ezekiel wrote: I got this nagging problem that I can't figure out in the source code of Lucene. In the file org/apache/lucene/queryParser/QueryParser.java, there's a method called parse that returns a Query (see below): public Query parse(String query) thr

Question on queryparser code from Lucene

2005-11-13 Thread Eugene Ezekiel
I got this nagging problem that I can't figure out in the source code of Lucene. In the file org/apache/lucene/queryParser/QueryParser.java, there's a method called parse that returns a Query (see below): public Query parse(String query) throws ParseException { ReInit(new FastCharStream(n

Re: About Combining Scores

2005-11-13 Thread Sebastian Marius Kirsch
On Sun, Nov 13, 2005 at 12:04:41AM +0100, Karl Koch wrote: > My aim is to combine these two scores. The Lucene score is normalised > between 0.0 and 1.0 (if the score exceeded 1.0 at some point) or less than > 1.0 (if it did not). The user model looks the same in this perspective - > although base