Re: Poor memory performance over a large index

2005-06-16 Thread Chris Hostetter
: Strange, but opening a second IndexSearcher and running another query : consumes another 560M of RAM. : In our case the results are always sorted by some column(s). : The app is supposed to be a multithreaded multiuser environment. : At the beginning the design was that each user session has its o

Re: dangers of fieldcache

2005-06-16 Thread Chris Hostetter
I don't have a committer bit, so I can't say for a fact that a patch like this would/wouldn't be accepted, but it seems like a pretty simple change... 1) add a "public void suggestRemove(IndexReader r)" method to FieldCache.java 2) add an implementation of that method to FieldCacheImpl.java whi
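A minimal sketch of the idea proposed above, written without any Lucene dependency. The class name `ReaderKeyedCache` and everything except the method name `suggestRemove` are hypothetical. The message is cut off before it says what the FieldCacheImpl implementation would do, so the behavior shown here (drop everything cached for that reader) is an assumption, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a per-reader field cache; this is not Lucene's FieldCacheImpl.
final class ReaderKeyedCache<R> {
    // reader -> (field name -> cached values). Weak keys allow an entry to be
    // collected once its reader is unreachable, even without an explicit call.
    private final Map<R, Map<String, int[]>> cache = new WeakHashMap<>();

    synchronized void put(R reader, String field, int[] values) {
        cache.computeIfAbsent(reader, r -> new HashMap<>()).put(field, values);
    }

    synchronized int[] get(R reader, String field) {
        Map<String, int[]> perReader = cache.get(reader);
        return perReader == null ? null : perReader.get(field);
    }

    // Assumed semantics: eagerly drop every array cached for this reader.
    synchronized void suggestRemove(R reader) {
        cache.remove(reader);
    }

    // Number of readers that currently have cached entries.
    synchronized int readerCount() {
        return cache.size();
    }
}
```

An application that swaps readers could call `suggestRemove(oldReader)` right after closing the old reader, instead of waiting for the garbage collector to notice that the reader is unreachable.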

Re: Determining the IDF while searching for documents

2005-06-16 Thread Chris Hostetter
: ...to get the number of docs that contain a specific term, you can use : IndexReader.docFreq(Term) : That's not possible, because my index elements are trigrams, so with : IndexReader.docFreq(trigram) I get the number of docs which contain this : trigram, but I need the number of docs that contai

Re: Need a way to set a result limit on a particular field

2005-06-16 Thread Jay Hill
Thanks Richard, I'll check it out. -Jay On 6/16/05, Richard Krenek <[EMAIL PROTECTED]> wrote: > To add to this option, you may want to use this patch > http://issues.apache.org/bugzilla/show_bug.cgi?id=27743 > This way instead of pulling the entire document back each time, just > pull back your h

Re: Search Hit frequency and location

2005-06-16 Thread Paul Elschot
On Thursday 16 June 2005 21:03, Sean O'Connor wrote: > Thanks for the clarification. I had assumed that to be the case, but > assumptions have a tendency to come back and bite me in inappropriate > places. By pointing that out, you've probably saved me from beating my > head against the wall in

Re: Search Hit frequency and location

2005-06-16 Thread Erik Hatcher
On Jun 16, 2005, at 3:03 PM, Sean O'Connor wrote: The big stumbling block I have at the moment is understanding whether Terms can be used to find something like a phrase query, proximity query, or boolean query. I think the answer is no, two different concepts. Terms are the heart of sear

Re: Search Hit frequency and location

2005-06-16 Thread Sean O'Connor
Thanks for the clarification. I had assumed that to be the case, but assumptions have a tendency to come back and bite me in inappropriate places. By pointing that out, you've probably saved me from beating my head against the wall in the near future :-). The big stumbling block I have at the

Getting the "directory/location" of an IndexReader/IndexWriter

2005-06-16 Thread Robichaud, Jean-Philippe
Hi Everyone. I'm currently in a situation where I have multiple IndexSearchers open at the same time, each on a different index. They are kept inside an "IndicesManager" that exports getSearcherAtLocation/FreeSearcher methods. I would like to be able to log the "path" used by a searcher I'm about to "c

Re: QueryParser, phrases and stopwords

2005-06-16 Thread Mike Barry
Daniel Naber wrote: > On Thursday 16 June 2005 04:17, Erik Hatcher wrote: >> So we could change StopFilter to put the gaps back in safely now, I think. >> >> Thoughts? > I personally don't have a problem with this, but shouldn't such a change be optional? Like a parameter for

Re: QueryParser, phrases and stopwords

2005-06-16 Thread Erik Hatcher
On Jun 16, 2005, at 2:03 PM, Daniel Naber wrote: On Thursday 16 June 2005 04:17, Erik Hatcher wrote: So we could change StopFilter to put the gaps back in safely now, I think. Thoughts? I personally don't have a problem with this, but shouldn't such a change be optional? Like a parame

Re: QueryParser, phrases and stopwords

2005-06-16 Thread Daniel Naber
On Thursday 16 June 2005 04:17, Erik Hatcher wrote: > So we could change StopFilter to put the gaps back in safely now, I think. > > Thoughts? I personally don't have a problem with this, but shouldn't such a change be optional? Like a parameter for StopFilter or a StopGapFilter? I'm sure t
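To make "putting the gaps back in" concrete, here is a rough sketch of a stop-word filter that records how many words it skipped as a position increment on the next surviving token. It does not use Lucene; `GapPreservingStopFilter`, `Tok`, and the `preserveGaps` switch are hypothetical names, with the switch standing in for the optional parameter suggested in the thread. It only mirrors the idea behind Lucene's token position increments and is not the committed change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Lucene's StopFilter.
final class GapPreservingStopFilter {
    // Minimal token: its text plus its distance from the previously emitted token.
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // preserveGaps is the assumed opt-in switch; when false, every token gets increment 1.
    static List<Tok> filter(List<String> words, Set<String> stopWords, boolean preserveGaps) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // stop words dropped since the last emitted token
        for (String w : words) {
            if (stopWords.contains(w)) {
                skipped++;
                continue;
            }
            out.add(new Tok(w, preserveGaps ? 1 + skipped : 1));
            skipped = 0;
        }
        return out;
    }
}
```

With the words "king of the hill" and the stop words "of" and "the", the surviving token "hill" carries an increment of 3 when gaps are preserved and 1 when they are not, which is the difference a phrase query would see.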

Re: Performance with multi index

2005-06-16 Thread Volodymyr Bychkoviak
can you measure "pure" index creation time (without creating XMLs) and one more question: do you keep your indexWriter open all the time during process? JM Tinghir wrote: Well, it just took 145 minutes to index 2670 files (450 MB) in one index (29 MB). It only took 33 minutes when I did it int

Re: Search Hit frequency and location

2005-06-16 Thread Erik Hatcher
On Jun 16, 2005, at 12:03 PM, Sean O'Connor wrote: Yes, see the Javadoc for IndexReader.termPositions(). I'm probably missing the obvious here, but I assume this refers to the analyzed terms (i.e. individual words, possibly transmogrified by the analyzer). Just to respond to part of your m

Re: Need a way to set a result limit on a particular field

2005-06-16 Thread Richard Krenek
To add to this option, you may want to use this patch http://issues.apache.org/bugzilla/show_bug.cgi?id=27743 This way instead of pulling the entire document back each time, just pull back your host field. Then do your check and only pull back the rest of the document if you need to. This will help

Search Hit frequency and location

2005-06-16 Thread Sean O'Connor
Hello, I am trying to find the right approach for finding frequency (and, slightly lower in priority, location) of search hits in a document. I am working through the online documentation and the helpful "Lucene in Action" book. There are several examples and explanations which seem close, but

Re: Performance with multi index

2005-06-16 Thread Chris Collins
I contest to the value of increasing the minMergeDocs. It directly affects how much IO gets performed in indexing. Splitting it into multiple indices (if you want to pay the price of complexity) may well increase your throughput. Assuming you are not utilizing all of the resources the sys

Grouping search results

2005-06-16 Thread Diego Manilla Suárez
Hi! I'm trying to group the search results, just like when Google shows sub-results within the same domain as the main result. In my case, I need to index contents and their attached files. The ideal behaviour would be that, if there is a match in one of the associated files, the main result

Re: Performance with multi index

2005-06-16 Thread Volodymyr Bychkoviak
My previous message got lost somewhere :( Reposting: can you measure "pure" index creation time (without creating XMLs)? And one more question: do you keep your IndexWriter open all the time during the process? The best way to determine bottlenecks is profiling :) regards, Volodymyr Bychkoviak JM Tinghir w

Re: Performance with multi index

2005-06-16 Thread Paul . Illingworth
I guess that if you have 10 indexes each with a merge factor of 10 with documents evenly distributed across those indexes then on average there will be a merge every 100 documents. If you have a single index there will be a merge every 10 documents. If you increase your merge factor from 10
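The arithmetic in this message can be checked with a deliberately simplified model. The sketch below assumes documents are distributed round-robin and that an index triggers one merge each time it has received `mergeFactor` new documents; higher-level merges are ignored. It also reads "a merge every 100 documents" as the interval seen by any one of the 10 indexes. `MergeIntervalModel` and its methods are hypothetical names, and this is not how Lucene schedules merges internally.

```java
// Hypothetical toy model of first-level merges; not Lucene's merge logic.
final class MergeIntervalModel {
    // Counts merges triggered in index 0 when totalDocs documents are spread
    // round-robin over indexCount indexes.
    static int mergesInOneIndex(int totalDocs, int indexCount, int mergeFactor) {
        int merges = 0;
        int pending = 0; // docs added to index 0 since its last merge
        for (int doc = 0; doc < totalDocs; doc++) {
            if (doc % indexCount != 0) {
                continue; // this document goes to one of the other indexes
            }
            if (++pending == mergeFactor) {
                merges++;
                pending = 0;
            }
        }
        return merges;
    }

    // Documents added overall between two consecutive merges of the same index.
    static int docsBetweenMerges(int indexCount, int mergeFactor) {
        return indexCount * mergeFactor;
    }
}
```

Under these assumptions, 1000 documents spread over 10 indexes with a merge factor of 10 give each index 10 merges, one per 100 documents added overall, while a single index with the same merge factor merges 100 times, once every 10 documents.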

Re: Performance with multi index

2005-06-16 Thread JM Tinghir
> Well, it just took 145 minutes to index 2670 files (450 MB) in one > index (29 MB). > It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB). Forgot to add that it does not only index files, it also creates XML documents. So don't worry if it takes 30 minutes to index 450

Re: Performance with multi index

2005-06-16 Thread Volodymyr Bychkoviak
JM Tinghir wrote: Could you qualify a bit more about what is slow? Well, it just took 145 minutes to index 2670 files (450 MB) in one index (29 MB). It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB). I think it took so much time, because it's merged too

Re: Performance with multi index

2005-06-16 Thread JM Tinghir
> Could you qualify a bit more about what is slow? Well, it just took 145 minutes to index 2670 files (450 MB) in one index (29 MB). It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB). > Perhaps you need to optimize the index? Perhaps, never tried it... JM --

Re: QueryParser, phrases and stopwords

2005-06-16 Thread Erik Hatcher
Are there any other issues or concerns with making this change to StopFilter? Should we make this change in 1.9? Or wait until after 2.0 is released? Mike - if you could create some test cases for this scenario and contribute your patch and tests to Bugzilla, barring any objections, I'll

Poor memory performance over a large index

2005-06-16 Thread Stanislav Jordanov
We are in a similar situation. The index contains about 1,000,000 docs and its total size is 31G (note: Gigabytes, not Megabytes). The problem is not the search speed - it is the memory usage. Opening the first IndexSearcher and running a query consumes about 325M of RAM. Strange, but opening a

Re: QueryParser, phrases and stopwords

2005-06-16 Thread Mike Barry
Erik, Thanks, I applied the changes found in version 150148 of StopFilter.java and they work great for me. I did remove the setting of position=1 before the return of the token since that seemed spurious to me. Here's a context diff of the current StopFilter.java and my changes: *** analysis/S

Re: Performance with multi index

2005-06-16 Thread Erik Hatcher
On Jun 16, 2005, at 4:08 AM, JM Tinghir wrote: I have a 25 MB index and was wondering if it would be better to divide it into about 10 indexes and search them with MultiSearcher. Would searching be faster this way? The indexing would be faster I guess, as it is getting slower and slower while ind

dangers of fieldcache

2005-06-16 Thread John Wang
Hi: I am having some problems with the field cache. My application keeps a reader in memory for an amount of time and then polls for a new reader, at which point a new reader is loaded into memory and the older reader is then closed. However, after the reader is loaded, I do some operati

Performance with multi index

2005-06-16 Thread JM Tinghir
Hi, I have a 25 MB index and was wondering if it would be better to divide it into about 10 indexes and search them with MultiSearcher. Would searching be faster this way? The indexing would be faster I guess, as it is getting slower and slower while indexes get bigger. But searching? Jean-Marie