: Strange, but opening a second IndexSearcher and running another query
: consumes another 560M of RAM.
: In our case the results are always sorted by some column(s).
: The app is supposed to be a multithreaded multiuser environment.
: At the beginning the design was that each user session has its o
I don't have a committer bit, so I can't say for a fact that a patch like
this would/wouldn't be accepted, but it seems like a pretty simple
change...
1) add a "public void suggestRemove(IndexReader r)" method to
FieldCache.java
2) add an implementation of that method to FieldCacheImpl.java whi
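Below is a minimal, self-contained sketch of the suggested change. It assumes a cache keyed by reader identity. `ReaderKeyedCache`, its `get` signature, and the use of a plain `Object` as the reader key are hypothetical stand-ins so the example runs without Lucene. Only the name `suggestRemove` comes from the proposal above, and none of this is Lucene's actual FieldCache code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-reader cache; the "reader" is any object used as a key.
// An illustration of the proposal, NOT Lucene's FieldCache/FieldCacheImpl.
final class ReaderKeyedCache<V> {
    // reader -> (field name -> cached value, e.g. a per-document array)
    private final Map<Object, Map<String, V>> cache = new HashMap<>();

    // Return the cached value for (reader, field), loading it on first use.
    synchronized V get(Object reader, String field, Function<String, V> loader) {
        return cache.computeIfAbsent(reader, r -> new HashMap<>())
                    .computeIfAbsent(field, loader);
    }

    // Hypothetical suggestRemove: drop every entry for a reader the application
    // is about to close, so its cached values can be garbage collected right away.
    synchronized void suggestRemove(Object reader) {
        cache.remove(reader);
    }

    // Number of readers that currently have cached entries.
    synchronized int readerCount() {
        return cache.size();
    }
}
```

The explicit hook lets the application release a closed reader's cached arrays immediately, instead of depending on when the garbage collector clears an unreachable key.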
: ...to get the number of docs that contain a specific term, you can use
: IndexReader.docFreq(Term)
: That's not possible, because my index elements are trigrams, so with
: IndexReader.docFreq(trigram) I get the number of docs which contain this
: trigram, but I need the number of docs that contai
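Intersecting the trigram postings gives the number of documents that contain every trigram of a word. This is only an upper bound on the documents containing the word itself, because the trigrams may occur apart from each other. An exact count would also need the term positions. The sketch assumes the postings are available as ascending doc-id arrays, standing in for walking each trigram's TermDocs. `TrigramDocCount` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class TrigramDocCount {
    // Split a word into overlapping character trigrams: "lucene" -> luc, uce, cen, ene.
    static List<String> trigrams(String word) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= word.length(); i++) {
            out.add(word.substring(i, i + 3));
        }
        return out;
    }

    // Intersect two ascending doc-id arrays.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { out[n++] = a[i]; i++; j++; }
            else if (a[i] < b[j]) i++;
            else j++;
        }
        return Arrays.copyOf(out, n);
    }

    // Count docs whose postings contain every trigram of the word.
    // postings maps trigram -> ascending doc ids (hypothetical input format).
    static int docsWithAllTrigrams(String word, Map<String, int[]> postings) {
        int[] docs = null;
        for (String t : trigrams(word)) {
            int[] p = postings.getOrDefault(t, new int[0]);
            docs = (docs == null) ? p : intersect(docs, p);
            if (docs.length == 0) break; // no document can match any more
        }
        return docs == null ? 0 : docs.length;
    }
}
```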
Thanks Richard, I'll check it out.
-Jay
On 6/16/05, Richard Krenek <[EMAIL PROTECTED]> wrote:
> To add to this option, you may want to use this patch
> http://issues.apache.org/bugzilla/show_bug.cgi?id=27743
> This way instead of pulling the entire document back each time, just
> pull back your h
On Thursday 16 June 2005 21:03, Sean O'Connor wrote:
> Thanks for the clarification. I had assumed that to be the case, but
> assumptions have a tendency to come back and bite me in inappropriate
> places. By pointing that out, you've probably saved me from beating my
> head against the wall in
On Jun 16, 2005, at 3:03 PM, Sean O'Connor wrote:
The big stumbling block I have at the moment is understanding
whether Terms can be used to find something like a phrase query,
proximity query, or boolean query. I think the answer is no, two
different concepts.
Terms are the heart of sear
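The following makes the distinction concrete. A phrase query is still built from terms, but matching it also needs each term's positions within the document. Here is a simplified slop-0 check. It assumes the positions of each phrase term in one document are given as ascending int arrays. `PhraseByPositions` is hypothetical, and Lucene's own phrase scoring is more involved.

```java
import java.util.Arrays;
import java.util.List;

final class PhraseByPositions {
    // positions.get(k) holds the ascending positions of the k-th phrase term
    // in one document. Returns true if some start p has term k at p + k for
    // every k, i.e. the terms appear consecutively (slop 0).
    static boolean matchesExactPhrase(List<int[]> positions) {
        if (positions.isEmpty()) return false;
        for (int start : positions.get(0)) {
            boolean all = true;
            for (int k = 1; k < positions.size(); k++) {
                if (Arrays.binarySearch(positions.get(k), start + k) < 0) {
                    all = false;
                    break;
                }
            }
            if (all) return true;
        }
        return false;
    }
}
```

For "the quick brown fox", the terms quick (position 1) and brown (position 2) form a phrase, while quick and fox (position 3) do not.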
Thanks for the clarification. I had assumed that to be the case, but
assumptions have a tendency to come back and bite me in inappropriate
places. By pointing that out, you've probably saved me from beating my
head against the wall in the near future :-).
The big stumbling block I have at the
Hi Everyone.
I'm currently in a situation where I have multiple IndexSearchers open at
the same time, each on a different index. They are kept inside an
"IndicesManager" that exports getSearcherAtLocation/FreeSearcher methods. I
would like to be able to log the "path" used by a searcher I'm about to
"c
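One way to make the path available for logging is for the manager to remember which path each searcher was opened on. This sketch relies on several assumptions. The type parameter `S` stands in for IndexSearcher, and the caller supplies the opener function. The `pathOf` lookup and the lower-camel-case `freeSearcher` are hypothetical. Only the getSearcherAtLocation/FreeSearcher idea comes from the message.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical manager; S stands in for IndexSearcher.
final class IndicesManager<S> {
    private final Map<String, S> byPath = new HashMap<>();
    // Identity-based: searchers are tracked as objects, not by equals().
    private final Map<S, String> pathBySearcher = new IdentityHashMap<>();
    private final Function<String, S> opener;

    IndicesManager(Function<String, S> opener) {
        this.opener = opener;
    }

    // Open (or reuse) the searcher for a path and remember where it came from.
    synchronized S getSearcherAtLocation(String path) {
        return byPath.computeIfAbsent(path, p -> {
            S s = opener.apply(p);
            pathBySearcher.put(s, p);
            return s;
        });
    }

    // The path a searcher was opened on, e.g. for logging; null if unknown.
    synchronized String pathOf(S searcher) {
        return pathBySearcher.get(searcher);
    }

    // Forget a searcher (closing the underlying index is left to the caller).
    synchronized void freeSearcher(S searcher) {
        String path = pathBySearcher.remove(searcher);
        if (path != null) byPath.remove(path);
    }
}
```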
Daniel Naber wrote:
>On Thursday 16 June 2005 04:17, Erik Hatcher wrote:
>
>>So we could change StopFilter to put the gaps back in safely now, I
>>think.
>>
>>Thoughts?
>
>I personally don't have a problem with this, but shouldn't such a change be
>optional? Like a parameter for
On Jun 16, 2005, at 2:03 PM, Daniel Naber wrote:
On Thursday 16 June 2005 04:17, Erik Hatcher wrote:
So we could change StopFilter to put the gaps back in safely now, I
think.
Thoughts?
I personally don't have a problem with this, but shouldn't such a change be
optional? Like a parame
On Thursday 16 June 2005 04:17, Erik Hatcher wrote:
> So we could change StopFilter to put the gaps back in safely now, I
> think.
>
> Thoughts?
I personally don't have a problem with this, but shouldn't such a change be
optional? Like a parameter for StopFilter or a StopGapFilter? I'm sure
t
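The optional behavior Daniel suggests can be modeled with a flag. When gaps are kept, each dropped stop word adds one to the position increment of the next surviving token. This is a self-contained model, not Lucene's StopFilter. `StopGapSketch`, `Tok`, and the `keepGaps` flag are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class StopGapSketch {
    // A kept token and its position increment relative to the previous kept token.
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
    }

    // Drop stop words. With keepGaps, the number of skipped words is carried
    // into the next kept token's increment; without it, every increment is 1.
    static List<Tok> filter(List<String> tokens, Set<String> stop, boolean keepGaps) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String t : tokens) {
            if (stop.contains(t)) {
                skipped++;
                continue;
            }
            out.add(new Tok(t, keepGaps ? 1 + skipped : 1));
            skipped = 0;
        }
        return out;
    }
}
```

For "the quick of the fox" with stop words {the, of}, keeping gaps gives quick an increment of 2 and fox an increment of 3. Without gaps, both get 1, so the two words look adjacent.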
Can you measure "pure" index creation time (without creating XMLs)?
And one more question:
do you keep your IndexWriter open all the time during the process?
JM Tinghir wrote:
Well, it just took 145 minutes to index 2670 files (450 MB) in one
index (29 MB).
It only took 33 minutes when I did it int
On Jun 16, 2005, at 12:03 PM, Sean O'Connor wrote:
Yes, see the Javadoc for IndexReader.termPositions().
I'm probably missing the obvious here, but I assume this refers to
the analyzed terms (i.e. individual words, possibly transmogrified by
the analyzer).
Just to respond to part of your m
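The point that positions refer to analyzed terms can be illustrated with a toy analyzer. It lowercases and splits the text, then records each term's positions. The number of positions is the term's frequency in that document. This roughly mirrors the per-document frequency and position information a term-positions enumeration reports. `ToyPositions` and its trivial analyzer are assumptions, and a real analyzer may also stem words or drop stop words.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ToyPositions {
    // A stand-in "analyzer": lowercase, then split on runs of non-letters/digits.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) out.add(t); // split can yield a leading empty string
        }
        return out;
    }

    // term -> positions in this document; the list size is the term's frequency.
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> m = new LinkedHashMap<>();
        List<String> terms = analyze(text);
        for (int pos = 0; pos < terms.size(); pos++) {
            m.computeIfAbsent(terms.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return m;
    }
}
```

Here "Fox, fox and FOX!" yields the single analyzed term fox at positions 0, 1 and 3, so its frequency is 3.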
To add to this option, you may want to use this patch
http://issues.apache.org/bugzilla/show_bug.cgi?id=27743
This way instead of pulling the entire document back each time, just
pull back your host field. Then do your check and only pull back the
rest of the document if you need to. This will help
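The pattern of fetching the small host field first and reading the rest of the document only when needed can be sketched with a lazily evaluated loader. `LazyDoc` and the `Supplier` that reads the remaining stored fields are hypothetical. This models the pattern only and does not reproduce the API added by the patch in bug 27743.

```java
import java.util.Map;
import java.util.function.Supplier;

final class LazyDoc {
    final String host;                                   // small field, loaded up front
    private final Supplier<Map<String, String>> loader;  // reads the remaining stored fields
    private Map<String, String> rest;                    // null until first requested

    LazyDoc(String host, Supplier<Map<String, String>> loader) {
        this.host = host;
        this.loader = loader;
    }

    // Load the remaining fields on first access only, then reuse them.
    Map<String, String> rest() {
        if (rest == null) rest = loader.get();
        return rest;
    }

    boolean restLoaded() {
        return rest != null;
    }
}
```

A loop over hits would check `host` for every document and call `rest()` only for the documents that pass the check.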
Hello,
I am trying to find the right approach for finding frequency (and,
slightly lower in priority, location) of search hits in a document. I
am working through the online documentation and the helpful "Lucene in
Action" book. There are several examples and explanations which seem
close, but
I can attest to the value of increasing minMergeDocs. It directly affects
how much I/O gets performed during indexing.
Splitting it into multiple indices (if you are willing to pay the price of
complexity) may well increase your throughput, assuming you are not utilizing
all of the resources the sys
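The effect of minMergeDocs on I/O can be estimated with a rough counting model. The model assumes a simplified policy. Up to minMergeDocs documents are buffered in RAM and flushed as one segment, and a merge happens whenever mergeFactor segments share a level. The code counts how many document copies get written. `MergeCostSketch` is hypothetical, and real segment merging differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeCostSketch {
    // Simplified model (an assumption, not Lucene's exact policy):
    // - documents are flushed as one level-0 segment every minMergeDocs docs;
    // - whenever mergeFactor segments share a level, they merge one level up.
    // Returns the total number of document copies written to disk.
    static long docsWritten(int numDocs, int minMergeDocs, int mergeFactor) {
        List<Integer> countAtLevel = new ArrayList<>(); // segments waiting per level
        long written = 0;
        int flushes = numDocs / minMergeDocs;          // ignore a final partial buffer
        for (int f = 0; f < flushes; f++) {
            written += minMergeDocs;                   // flush the RAM buffer
            int level = 0;
            long segDocs = minMergeDocs;               // docs per segment at this level
            bump(countAtLevel, level);
            while (countAtLevel.get(level) == mergeFactor) {
                countAtLevel.set(level, 0);
                segDocs *= mergeFactor;
                written += segDocs;                    // the merge rewrites all its docs
                level++;
                bump(countAtLevel, level);
            }
        }
        return written;
    }

    private static void bump(List<Integer> counts, int level) {
        while (counts.size() <= level) counts.add(0);
        counts.set(level, counts.get(level) + 1);
    }
}
```

Under this model, 1,000 documents with a merge factor of 10 produce 3,000 written document copies when minMergeDocs is 10. They produce 2,000 when minMergeDocs is 100, because the larger buffer skips one level of merging.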
Hi! I'm trying to group the search results, just like when Google shows
sub-results from the same domain as the main result. In my case, I
need to index contents and their attached files. The ideal behaviour
would be that, if there is a match in one of the associated files, the
main result
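One possible approach assumes that each indexed document (content item or attached file) stores the id of the content item it belongs to. The hits are walked in score order and grouped by that parent id. Each group is then ranked by its best hit, and an attachment match surfaces under its main result. `HitGrouping` and the `parentId` field are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HitGrouping {
    // A hit in score order. parentId is the content item the document belongs
    // to: an attachment stores its parent's id, a content item its own id.
    static final class Hit {
        final String docId;
        final String parentId;
        Hit(String docId, String parentId) { this.docId = docId; this.parentId = parentId; }
    }

    // Group hits by parent; LinkedHashMap keeps groups in order of each
    // parent's best-scoring hit, and hits within a group in score order.
    static Map<String, List<String>> group(List<Hit> hitsInScoreOrder) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Hit h : hitsInScoreOrder) {
            groups.computeIfAbsent(h.parentId, k -> new ArrayList<>()).add(h.docId);
        }
        return groups;
    }
}
```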
My previous message got lost somewhere :(
Reposting:
Can you measure "pure" index creation time (without creating XMLs)?
And one more question:
do you keep your IndexWriter open all the time during the process?
The best way to determine bottlenecks is profiling :)
regards,
Volodymyr Bychkoviak
JM Tinghir w
I guess that if you have 10 indexes each with a merge factor of 10 with
documents evenly distributed across those indexes then on average there
will be a merge every 100 documents.
If you have a single index there will be a merge every 10 documents.
If you increase your merge factor from 10
> Well, it just took 145 minutes to index 2670 files (450 MB) in one
> index (29 MB).
> It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB).
Forgot to add that it doesn't only index files, it also creates
XML documents. So don't worry if it takes 30 minutes to index 450
JM Tinghir wrote:
Could you qualify a bit more about what is slow?
Well, it just took 145 minutes to index 2670 files (450 MB) in one
index (29 MB).
It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB).
I think it took so much time because it's merged too
> Could you qualify a bit more about what is slow?
Well, it just took 145 minutes to index 2670 files (450 MB) in one
index (29 MB).
It only took 33 minutes when I did it into ~10 indexes (global size of 32 MB).
> Perhaps you need to optimize the index?
Perhaps, never tried it...
JM
--
Are there any other issues or concerns with making this change to
StopFilter? Should we make this change in 1.9? Or wait until after
2.0 is released?
Mike - if you could create some test cases for this scenario and
contribute your patch and tests to Bugzilla, barring any objections,
I'll
We are in a similar situation.
The index contains about 1,000,000 docs and its total size is 31G (note:
Gigabytes, not Megabytes).
The problem is not the search speed - it is the memory usage.
Opening the first IndexSearcher and running a query consumes about 325M
of RAM
Strange, but opening a
Erik,
Thanks, I applied the changes found in version 150148 of StopFilter.java
and they work great for me. I did remove the setting of position=1 before
the return of the token since that seemed spurious to me. Here's a context
diff of the current StopFilter.java and my changes:
*** analysis/S
On Jun 16, 2005, at 4:08 AM, JM Tinghir wrote:
I have a 25 MB index and was wondering if it would be better to divide
it into about 10 indexes and search them with MultiSearcher.
Would searching be faster this way?
The indexing would be faster I guess, as it is getting slower and
slower while ind
Hi:
I am having some problems with the field cache.
My application keeps a reader in memory for an amount of time and
then polls for a new reader, at which point a new reader is loaded
into memory and the older reader is then closed.
However, after the reader is loaded, I do some operati
Hi,
I have a 25 MB index and was wondering if it would be better to divide
it into about 10 indexes and search them with MultiSearcher.
Would searching be faster this way?
The indexing would be faster I guess, as it is getting slower and
slower while indexes get bigger.
But searching?
Jean-Marie
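On the searching side, querying several indexes at once means each index produces scored hits, and the results are merged into one ranked list. The sketch below shows only that merge step, using a bounded min-heap. `MergeTopHits` is hypothetical. It ignores how scores are made comparable across separate indexes, which matters when combining their results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class MergeTopHits {
    static final class Hit {
        final int index;   // which sub-index the hit came from
        final int doc;     // document number within that index
        final float score;
        Hit(int index, int doc, float score) { this.index = index; this.doc = doc; this.score = score; }
    }

    // Merge per-index hit lists into one list of the n highest-scoring hits.
    static List<Hit> topN(List<List<Hit>> perIndex, int n) {
        // Min-heap on score: the head is always the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (List<Hit> hits : perIndex) {
            for (Hit h : hits) {
                heap.add(h);
                if (heap.size() > n) heap.poll(); // drop the current lowest score
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return out;
    }
}
```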