Re: De-duping MultiSearcher results

2005-11-14 Thread Jason Calabrese
Maybe I'm missing something simple, but I don't see how this will work. It looks like this filter will just filter out documents that don't have a guid field, but in my case every document has a guid. In a single index there are no duplicates. Duplicates are only a problem when I search multip
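
One way to picture the problem Jason describes is to de-dupe after the hits from all indexes have been merged, keyed on the guid field. The following is only a minimal sketch, not the approach proposed in the thread: the Hit class, its fields, and the rule "keep the highest-scoring copy" are all assumptions made for illustration, and no Lucene API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GuidDedup {
    /** Hypothetical merged hit: guid field value, name of the source index, and score. */
    static final class Hit {
        final String guid;
        final String index;
        final float score;

        Hit(String guid, String index, float score) {
            this.guid = guid;
            this.index = index;
            this.score = score;
        }
    }

    /**
     * Keeps one hit per guid, preferring the highest score (an assumed policy).
     * Output order follows the first appearance of each guid in the input.
     */
    static List<Hit> dedupe(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit h : hits) {
            Hit seen = best.get(h.guid);
            if (seen == null || h.score > seen.score) {
                best.put(h.guid, h); // replacing a value keeps the key's original position
            }
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        List<Hit> merged = Arrays.asList(
            new Hit("g1", "indexA", 0.4f),
            new Hit("g2", "indexA", 0.9f),
            new Hit("g1", "indexB", 0.7f)); // same document, found in a second index
        for (Hit h : dedupe(merged)) {
            System.out.println(h.guid + " from " + h.index);
        }
    }
}
```

Note that de-duping after the search means the reported hit count is only known once all hits have been walked, which may matter for paging.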

Re: How to add more stopwords to StandardAnalyzer

2005-11-14 Thread Supheakmungkol SARIN
Thanks a lot Erik!!! Regards, Mungkol --- Erik Hatcher wrote: On 14 Nov 2005, at 06:20, Supheakmungkol SARIN wrote: That is the problem as I want to add more stopwords to the default stopwords of StandardAnalyzer. No problem: get the list, create a larger array and add yours to it. The S

Re: De-duping MultiSearcher results

2005-11-14 Thread Daniel Noll
Jason Calabrese wrote: All, In the project I'm working on we have a separate index for each database. There are 12 databases now, but in the future there may be as many as 20. They all have their own release cycle so I don't want to merge the indexes. The databases all have some overlap b

De-duping MultiSearcher results

2005-11-14 Thread Jason Calabrese
All, In the project I'm working on we have a separate index for each database. There are 12 databases now, but in the future there may be as many as 20. They all have their own release cycle so I don't want to merge the indexes. The databases all have some overlap between them. We manage thi

Re: Extract term and its frequency from the index and file?

2005-11-14 Thread MALCOLM CLARK
cheers

Re: Extract term and its frequency from the index and file?

2005-11-14 Thread Steven Rowe
MALCOLM CLARK wrote: Could you send me the url for HighFreqTerms.java in cvs? ViewCVS URL:

Re: Extract term and its frequency from the index and file?

2005-11-14 Thread MALCOLM CLARK
Hi, Could you send me the URL for HighFreqTerms.java in CVS? Thanks, Malcolm

Re: Memory Usage

2005-11-14 Thread Marvin Humphrey
On Nov 14, 2005, at 9:19 AM, Doug Cutting wrote: It would actually not be too hard to change things so that there was such a parameter that could be set on an IndexReader. It would determine the fraction of entries in the .tii file that are kept in RAM. So if the parameter were, e.g., 10

Re: Memory Usage

2005-11-14 Thread Doug Cutting
Marvin Humphrey wrote: You *can't* set it on the reader end. If you could set it, the reader would get out of sync and break. The value is set per-segment at write time, and the reader has to be able to adapt on the fly. It would actually not be too hard to change things so that there was

Re: Memory Usage

2005-11-14 Thread Marvin Humphrey
On Nov 13, 2005, at 10:22 PM, Daniel Noll wrote: Okay, I've gone and revised how things are fitting together in our app. It seems that we already call optimize() at the end of all the processing, before which I could figure out what kind of value we should be using and call this setter m

Re: How to add more stopwords to StandardAnalyzer

2005-11-14 Thread Erik Hatcher
On 14 Nov 2005, at 06:20, Supheakmungkol SARIN wrote: That is the problem as I want to add more stopwords to the default stopwords of StandardAnalyzer. No problem: get the list, create a larger array and add yours to it. The StandardAnalyzer uses this stop list by default: public
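
Erik's suggestion (take the default list, build a larger array, add your own words) can be sketched in plain Java as follows. This is an illustration only: DEFAULT_STOP_WORDS is a hypothetical stand-in with made-up contents, not the analyzer's real default list, and the merge helper is not part of Lucene. The resulting array would then be handed to the StandardAnalyzer constructor that accepts a String[] of stop words, as mentioned elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class StopWordMerge {
    // Hypothetical stand-in for the default stop list; the real list is not reproduced here.
    static final String[] DEFAULT_STOP_WORDS = {"a", "an", "and", "the"};

    /** Returns defaults followed by any extras not already present, without duplicates. */
    static String[] merge(String[] defaults, String[] extras) {
        Set<String> merged = new LinkedHashSet<>(Arrays.asList(defaults));
        merged.addAll(Arrays.asList(extras)); // a set silently ignores repeated words
        return merged.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] custom = {"the", "via", "etc"};
        String[] all = merge(DEFAULT_STOP_WORDS, custom);
        // Pass 'all' to the analyzer constructor that takes a String[] of stop words.
        System.out.println(Arrays.toString(all));
    }
}
```

Using a LinkedHashSet rather than copying into a fixed-size array avoids duplicate entries when a custom word is already in the default list, and keeps the original order.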

Re: How to add more stopwords to StandardAnalyzer

2005-11-14 Thread Erik Hatcher
On 14 Nov 2005, at 01:01, Supheakmungkol SARIN wrote: I'd like to add some other stopwords to the StandardAnalyzer. How do I do this? Have a look at the constructor of StandardAnalyzer that accepts a String[] of stop words. If you use that, you will replace the stop word list that is used

Re: About searching in multiple fields with one query

2005-11-14 Thread Karl Koch
Hi Jian, Are you sure of that? This would be quite a bad thing to do. I am referring to the paper by Robertson et al. http://citeseer.ist.psu.edu/robertson04simple.html in which it is shown that summing multiple scores violates a number of basic assumptions in TF/IDF. Although it is shown on

Re: How to add more stopwords to StandardAnalyzer

2005-11-14 Thread Supheakmungkol SARIN
That is the problem as I want to add more stopwords to the default stopwords of StandardAnalyzer. Regards, Mungkol --- Erik Hatcher wrote: On 14 Nov 2005, at 01:01, Supheakmungkol SARIN wrote: I'd like to add some other stopwords to the StandardAnalyzer. How d

FileNotFoundException while indexing data

2005-11-14 Thread amolb
Hi everybody, I am trying to index around 10 lakh (1,000,000) user records, but my indexing application fails with the following exception after around 5 lakh (500,000) user records. If I rerun it, it fails at the same record. I tried to skip the record, but that does not help. I tried to get help from Google, but few

RE: Performance Question

2005-11-14 Thread Mike Streeton
Thanks for this. I did not really explain myself well in the original question. What I was interested to know is whether a single Searcher constructed from a MultiReader (across several different indexes) would work better than a MultiSearcher constructed from IndexSearchers each pointing at a single inde

Re: Max score of two fields

2005-11-14 Thread Lasse L
I forgot to mention that I never have to search in old_name alone. I either search in current_name alone or in current_name OR old_name. Realizing that led me to the simple solution of duplicating whatever I put into a current field into the old fields too. So the info in the fields with old_ p