Dreaded optimize (again!)

2006-12-04 Thread Stanislav Jordanov
Guys, there's another aspect of the index optimize operation that confuses us a lot: the free disk space it requires to complete successfully. Initially we thought that an amount of free disk space equal to the index size (prior to optimization) should suffice. Then it became clear that havin

an alternative to optimize?

2006-12-01 Thread Stanislav Jordanov
Guys, I've already asked this question but nobody answered: Suppose we have a relatively big index which is continuously updated - i.e. new docs get added while some of the old docs get deleted. For pragmatic reasons we have a restriction on maxMergeDocs so that segment files don't get enormou

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-28 Thread Stanislav Jordanov
Paul, we are using a slightly modified version of Lucene, so in order to run the performance tests on a nightly build, I need Lucene's sources, not the compiled classes. Is there a nice and easy way to get them? Stanislav Stanislav Jordanov wrote: Paul, We are working on delivering the

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-22 Thread Stanislav Jordanov
Paul Elschot On Tuesday 21 November 2006 17:59, Yonik Seeley wrote: On 11/21/06, Stanislav Jordanov <[EMAIL PROTECTED]> wrote: Switching to the old scorer (via BooleanQuery.setUseScorer14(true)) solved the performance issue - now Lucene 1.9.1 & 2.0.0 perform on the same load tes

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Stanislav Jordanov
d and new boolean scorers? Cheers, Stenly Yonik Seeley wrote: On 11/21/06, Stanislav Jordanov <[EMAIL PROTECTED]> wrote: We've identified a significant querying performance decrease after switching from Lucene 1.4.3 to 1.9.1. It is steadily demonstrated no matter if the concurrent querying

Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Stanislav Jordanov
Hi guys, We've identified a significant querying performance decrease after switching from Lucene 1.4.3 to 1.9.1. It is steadily demonstrated no matter if the concurrent querying threads are 1, 2, 4 or 8 (or even more) - If N queries are executed against 1.9.1 for a given time, then 1.4.3 execu

Re: Putting some constraints on index optimization

2006-10-29 Thread Stanislav Jordanov
missing something? Regards, Stanislav Mike Klaas wrote: On 10/27/06, Stanislav Jordanov <[EMAIL PROTECTED]> wrote: Have the following problem with (explicitly invoked) index optimization - it seems to always merge all existing index segments into a single huge segment, which is undesirable

Putting some constraints on index optimization

2006-10-27 Thread Stanislav Jordanov
[Note: I am reposting this question, as I posted it yesterday and yet it hasn't appeared on the mailing list] Have the following problem with (explicitly invoked) index optimization - it seems to always merge all existing index segments into a single huge segment, which is undesirable in my case. Is

Putting some constraints on index optimization

2006-10-26 Thread Stanislav Jordanov
Have the following problem with (explicitly invoked) index optimization - it seems to always merge all existing index segments into a single huge segment, which is undesirable in my case. Is there a way to force index optimization to honor the IndexWriter.MAX_MERGE_DOCS setting? Stanislav --

threadsafe QueryParser?

2006-10-09 Thread Stanislav Jordanov
Method static public Query parse(String query, String field, Analyzer analyzer) in class QueryParser is deprecated in 1.9.1 and the suggestion is: /"Use an instance of QueryParser and the {@link #parse(String)} method instead."/ My question is: in the context of multi threaded app, is

obtaining the number of documents stored in a .cfs file

2006-09-05 Thread Stanislav Jordanov
Suppose I have a bunch of valid .cfs files while the segments/segments.new file is missing or invalid. The task is to 'recover' the present .cfs files into a valid index. I think it will be necessary and sufficient to create a segments file that references the .cfs files. The only problem I've en

Re: Reviving a dead index

2006-08-30 Thread Stanislav Jordanov
I missed something that may be very important: I find it really strange, that the exception log reads: java.io.FileNotFoundException: F:\Indexes\index1\_16f6.fnm (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method)

Re: Reviving a dead index

2006-08-30 Thread Stanislav Jordanov
le, but I guess it lists the _1j8s Given these scarce resources, can you give me some further advice about what has happened and what can be done to prevent it from happening again? Regards, Stanislav Stanislav Jordanov wrote: What might be the possible reason for an IndexReader failing to o

segments' size and getMaxMergeDocs()

2006-08-30 Thread Stanislav Jordanov
If IndexWriter.getMaxMergeDocs() always returns M then which one is true: 1) No segment file will ever contain > M documents; 2) Any segment that participates in a merge contains <= M documents (but the resulting segment of the merge may contain > M documents) Obviously (1) implies (2) but my g

Reviving a dead index

2006-08-29 Thread Stanislav Jordanov
What might be the possible reason for an IndexReader failing to open properly, because it cannot find a .fnm file that is expected to be there: java.io.FileNotFoundException: E:\index4\_1j8s.fnm (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method)

Split an existing index into smaller segments without a re-index?

2006-08-16 Thread Stanislav Jordanov
I searched the mailing list archives for an answer to that question. The closest (and perhaps the only) thread in this regard that I found is: http://www.gossamer-threads.com/lists/lucene/java-user/9928 So the answer was "No", but this was way back in mid-2004 (2 years ago). Is there a solution

QueryParser issue

2005-08-19 Thread Stanislav Jordanov
Recently I found that it makes a difference whether your query looks like: A AND NOT B or it is A AND (NOT B) Some research using Luke showed that in the first case the query is rewritten as: +A -B while the second is rewritten as +A +(-B) after the usual "wtf?" I recalled that there was a dicu
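To see why the two rewritten forms can behave differently, here is a toy model in plain Java. It is not Lucene code: the class `BooleanClauseSketch`, its methods, and the document ids are all made up for illustration. It assumes the commonly described behavior that prohibited clauses only filter candidates produced by required clauses, so a purely negative sub-query selects nothing on its own.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BooleanClauseSketch {
    // Toy evaluation: candidates come only from required clauses;
    // prohibited clauses can only remove candidates.
    static Set<Integer> evaluate(List<Set<Integer>> required, List<Set<Integer>> prohibited) {
        Set<Integer> result = new TreeSet<>();
        if (required.isEmpty()) {
            return result; // purely negative query: nothing to filter, so nothing matches
        }
        result.addAll(required.get(0));
        for (Set<Integer> r : required) {
            result.retainAll(r);
        }
        for (Set<Integer> p : prohibited) {
            result.removeAll(p);
        }
        return result;
    }

    // Hypothetical postings: docs 1, 2, 3 contain A; doc 2 contains B.
    static final Set<Integer> A = Set.of(1, 2, 3);
    static final Set<Integer> B = Set.of(2);

    // Models "+A -B"
    static Set<Integer> flatQuery() {
        return evaluate(List.of(A), List.of(B));
    }

    // Models "+A +(-B)": the inner query has no required clause
    static Set<Integer> nestedQuery() {
        Set<Integer> inner = evaluate(List.of(), List.of(B));
        return evaluate(List.of(A, inner), List.of());
    }

    public static void main(String[] args) {
        System.out.println(flatQuery());   // [1, 3]
        System.out.println(nestedQuery()); // []
    }
}
```

In this model the nested form returns nothing because the required inner clause is itself empty, which is consistent with the surprise described in the post.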

Re: The best way to know when an index has been changed

2005-08-03 Thread Stanislav Jordanov
Hi Steve, we have a similar situation and we chose the following solution: The process that modifies the index (the writer) notifies the process that searches the index (the reader) In our case the notification is a specifically named subdirectory "index_modified" of the directory containing th
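A minimal sketch of such a marker-directory handshake, using only `java.nio.file`. The marker name `index_modified` comes from the post; everything else is an assumption made for illustration: the class and method names, the choice of parent directory, and the idea that the reader deletes the marker once it has noticed it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndexModifiedMarker {
    static final String MARKER = "index_modified";

    // Writer side: called after the index changes have been written.
    static void signalModified(Path parentDir) throws IOException {
        Files.createDirectories(parentDir.resolve(MARKER)); // no error if it already exists
    }

    // Reader side: returns true if a change was signalled, and clears the marker
    // (clearing is an assumption of this sketch, not something the post states).
    static boolean consumeModified(Path parentDir) throws IOException {
        return Files.deleteIfExists(parentDir.resolve(MARKER));
    }

    // Small demo against a temporary directory.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("index-demo");
            boolean before = consumeModified(dir);
            signalModified(dir);
            boolean afterSignal = consumeModified(dir);
            boolean afterConsume = consumeModified(dir);
            Files.delete(dir);
            return before + "," + afterSignal + "," + afterConsume;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // false,true,false
    }
}
```

The reader would poll `consumeModified` before each search (or on a timer) and reopen its searcher when it returns true. Creating and deleting a directory is used here because both operations are single filesystem calls.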

hot swapping searchers

2005-08-03 Thread Stanislav Jordanov
A Hits object holds a reference to a Searcher and uses it to retrieve docs not in the cache list. Is it ok if I modify the Hits object in a way that it may have its searcher replaced, i.e. introduce a: *setSearcher(Searcher s) { this.searcher = s; }* method and invoke it (synchronously) between i

Poor memory performance over a large index

2005-06-16 Thread Stanislav Jordanov
We are in a similar situation. The index contains about 1,000,000 docs and its total size is 31G (note: Gigabytes, not Megabytes). The problem is not the search speed - it is the memory usage. Opening the first IndexSearcher and running a query consumes about 325M of RAM. Strange, but opening a

Re: OutOfMemory when indexing

2005-06-14 Thread Stanislav Jordanov
On 6/13/05, Stanislav Jordanov <[EMAIL PROTECTED]> wrote: Hi guys, Building some huge index (about 500,000 docs totaling to 10megs of plain text) we've run into the following problem: Most of the time the IndexWriter process consumes a fairly small amount of memory (about 32 megs).

Re: OutOfMemory when indexing

2005-06-14 Thread Stanislav Jordanov
Gusenbauer Stefan wrote: A few weeks before I had a similar problem too. I will write my problem and the solution for it: I'm indexing docs and every parsed document is stored in an ArrayList. This solution worked for little directories with a little number of files in it but when the things ar

Out of Memory (correction)

2005-06-13 Thread Stanislav Jordanov
A small correction to my last letter: "1000gigs" should be "1000 megs" (sorry) Here's the corrected version: Hi guys, Building some huge index (about 500,000 docs totaling to 10megs of plain text) we've run into the following problem: Most of the time the IndexWriter process consumes a fairly

OutOfMemory when indexing

2005-06-13 Thread Stanislav Jordanov
Hi guys, Building some huge index (about 500,000 docs totaling to 10megs of plain text) we've run into the following problem: Most of the time the IndexWriter process consumes a fairly small amount of memory (about 32 megs). However, as the index size grows, the memory usage sporadically burst

a "real" PhrasePrefixQuery

2005-05-20 Thread Stanislav Jordanov
Is there a Lucene Query (or something that will do the job) like: "Star Wars tri*" that will match all docs containing a 3 word phrase: 'Star' followed by 'Wars' followed by a word starting with 'tri'. I.e. the above query will match both "Star Wars trilogy" and "Star Wars triumph". (I know about
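The matching behavior being asked for can be pinned down with a small plain-Java matcher. This only illustrates the desired semantics over a single string; it is not a Lucene query and says nothing about how an index-backed version would be built. The class name, method name, and the tokenization on non-word characters are assumptions of the sketch.

```java
import java.util.List;
import java.util.Locale;

public class PhrasePrefixSketch {
    // True if the text contains the exact words in order, immediately
    // followed by a word that starts with the given prefix.
    static boolean matches(String text, List<String> exactWords, String prefix) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        String lowerPrefix = prefix.toLowerCase(Locale.ROOT);
        int n = exactWords.size();
        // start + n must stay in bounds: it is the position of the prefix word
        for (int start = 0; start + n < tokens.length; start++) {
            boolean ok = true;
            for (int i = 0; i < n && ok; i++) {
                ok = tokens[start + i].equals(exactWords.get(i).toLowerCase(Locale.ROOT));
            }
            if (ok && tokens[start + n].startsWith(lowerPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrase = List.of("Star", "Wars");
        System.out.println(matches("Star Wars trilogy", phrase, "tri"));      // true
        System.out.println(matches("Star Wars triumph", phrase, "tri"));      // true
        System.out.println(matches("Star Wars: the trilogy", phrase, "tri")); // false
    }
}
```

The third case fails because the prefix word must directly follow the phrase, which is the 3 word adjacency the question describes.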

a *match all* query

2005-05-09 Thread Stanislav Jordanov
I need a query that will hit all documents in the index. How do I get one? 10x StJ

Re: Fast access to a random page of the search results.

2005-03-02 Thread Stanislav Jordanov
Thank you guys, there's a good chance that I will have the management persuaded to drop the 'random access requirement'. As you surely know, the management (usually) tends to be frantically optimistic. True to this trend, our management suggested to us (the R&D team) that: "... it is time to assume th

How exactly is 'Lucene' pronounced?

2005-03-02 Thread Stanislav Jordanov
How exactly is 'Lucene' pronounced? Some of my colleagues pronounce it like "Liu-sin" (accent on the second syllable) I usually pronounce it like "Lu-sen" (accent on the second syllable) What's the right way to do it?

Re: Fast access to a random page of the search results.

2005-03-02 Thread Stanislav Jordanov
oug Cutting" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Tuesday, March 01, 2005 8:15 PM Subject: Re: Fast access to a random page of the search results. > Stanislav Jordanov wrote: > > startTs = System.currentTimeMillis(); >