Re: Modifying a document by updating a payloads?

2008-07-30 Thread Antony Bowesman
Hi Mike, Unfortunately you will have to delete the old doc, then reindex a new doc, in order to change any payloads in the document's Tokens. This issue: https://issues.apache.org/jira/browse/LUCENE-1231 which is still in progress, could make updating stored (but not indexed) fields a m

RE: Index optimization ...

2008-07-30 Thread Chris Hostetter
: My understanding is that an optimized index gives the best search there is an inherent inconsistency in your question -- you say you optimize your index before using it because you heard that makes searches faster, but in your original question you said... > I'd like to shorten the time it

RE: Index optimization ...

2008-07-30 Thread Dragon Fly
I'll run some tests. Thank you. > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Index optimization ... > Date: Wed, 30 Jul 2008 11:12:28 -0400 > > What version of Lucene are you using? What is your current > mergeFactor? Lowering this (minimum is 2) will result in

Re: Using lucene as a database... good idea or bad idea?

2008-07-30 Thread Jason Rutherglen
A possible open source solution using a page based database would be to store the documents in http://jdbm.sourceforge.net/ which offers BTree, Hash, and raw page based access. One would use a primary key type of persistent ID to lookup the document data from JDBM. Would be a good Lucene project

Re: Using lucene as a database... good idea or bad idea?

2008-07-30 Thread Marcelo Ochoa
Hi John: Have you tried, or do you know of, the Lucene Domain Index for Oracle Database? http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html If you are using Oracle 10g/11g, it is completely integrated into the Oracle memory space, like Oracle Text but based on Lucene. No network round trip i

Re: Bug in Sun's 1.6 hotspot compiler that can cause index corruption

2008-07-30 Thread Michael McCandless
You're welcome! I'm glad it saved you future problems. Mike Chris Lu wrote: Thanks!!! This would really save us a lot of effort! -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.c

Re: storing the contents of a document in the lucene index

2008-07-30 Thread Erick Erickson
I thought of one more thing you should be aware of. The default maximum field length for any field (no matter which of the two forms you use) is 10,000 tokens. This can be easily changed, see IndexWriter.setMaxFieldLength(). Best Erick On Thu, Jul 24, 2008 at 9:25 AM, starz10de <[EMAIL PROTECTED]> w

Re: Bug in Sun's 1.6 hotspot compiler that can cause index corruption

2008-07-30 Thread Chris Lu
Thanks!!! This would really save us a lot of effort! -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Creat

Bug in Sun's 1.6 hotspot compiler that can cause index corruption

2008-07-30 Thread Michael McCandless
FYI -- there is a nasty bug that affects Lucene in Sun's 1.6 hotspot compiler, starting with 1.6.0_04. At least 3 known cases have been seen on this list. Details are here: https://issues.apache.org/jira/browse/LUCENE-1282 The bug causes silent index corruption during merging, such that the

Re: Using lucene as a database... good idea or bad idea?

2008-07-30 Thread John Evans
Hi All, Thanks for all of the feedback. Largely as a result of the responses I've received from the mailing list, Lucene has made its way onto our short list of possible solutions. I'm not sure what the timeframe is for implementing a prototype and testing it, but I will try to report back wit

RE: too many clause exception when using a filter

2008-07-30 Thread Steven A Rowe
Hi René, Since you're constructing the filter from a WildcardQuery or a PrefixQuery, both of which use a BooleanQuery to hold a TermQuery for each matching index term, you'll need to increase the number of clauses a BooleanQuery is allowed to hold, by calling static method BooleanQuery.setMaxCl
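To illustrate the behaviour described in this reply, here is a minimal sketch in plain Java that does not use Lucene at all. It only models the idea that a prefix or wildcard query is expanded into one clause per matching index term and fails once a configured limit is exceeded. The class `ClauseLimitSketch`, its methods, and the sample limits are hypothetical; the starting value of 1024 is assumed to mirror the BooleanQuery default, and the term names follow the "monisys1" to "monisys1100" example from the original question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of clause expansion; this is NOT Lucene code.
class ClauseLimitSketch {
    // Assumed stand-in for the maximum number of clauses a boolean query may hold.
    static int maxClauseCount = 1024;

    // Expands a prefix into one clause per matching term, failing past the limit.
    static List<String> expandPrefix(List<String> indexTerms, String prefix) {
        List<String> clauses = new ArrayList<>();
        for (String term : indexTerms) {
            if (!term.startsWith(prefix)) {
                continue;
            }
            if (clauses.size() >= maxClauseCount) {
                throw new IllegalStateException(
                        "too many clauses: limit is " + maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Builds the terms "monisys1" .. "monisys<n>" used in the thread's example.
    static List<String> sampleTerms(int n) {
        List<String> terms = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            terms.add("monisys" + i);
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms = sampleTerms(1100);
        try {
            expandPrefix(terms, "monisys");
        } catch (IllegalStateException e) {
            System.out.println("expansion failed: " + e.getMessage());
        }
        // Raising the limit above the number of matching terms lets it succeed.
        maxClauseCount = 2048;
        System.out.println("clauses: " + expandPrefix(terms, "monisys").size());
    }
}
```

In this model, switching from a wildcard to a prefix does not help, because both forms expand to the same 1100 matching terms; only a higher limit or a narrower pattern changes the outcome.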

Re: Index optimization ...

2008-07-30 Thread Grant Ingersoll
What version of Lucene are you using? What is your current mergeFactor? Lowering this (minimum is 2) will result in an index that is closer to "optimal" since an optimized index is just one that has all the segments merged into a single segment and a mergeFactor of 2 just means there are
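As a rough illustration of the mergeFactor point above, the following sketch simulates a deliberately simplified merge scheme in plain Java: every flush creates one small segment, and whenever mergeFactor segments sit at the same level they are merged into one segment at the next level. This is an assumption made for illustration only; Lucene's actual merge policy also looks at segment sizes and other settings. The class `MergeFactorSketch` and the flush count of 999 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of level-based segment merging; NOT Lucene code.
class MergeFactorSketch {
    // Returns how many segments remain after the given number of flushes.
    static int segmentsAfter(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        // levels.get(i) holds the number of segments currently at level i.
        List<Integer> levels = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            while (true) {
                if (levels.size() <= level) {
                    levels.add(0);
                }
                int count = levels.get(level) + 1;
                if (count < mergeFactor) {
                    levels.set(level, count);
                    break;
                }
                // mergeFactor segments at this level collapse into one
                // segment that is carried up to the next level.
                levels.set(level, 0);
                level++;
            }
        }
        int total = 0;
        for (int count : levels) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("mergeFactor 2:  " + segmentsAfter(999, 2) + " segments");
        System.out.println("mergeFactor 10: " + segmentsAfter(999, 10) + " segments");
    }
}
```

Under this model a lower mergeFactor leaves fewer segments for a final optimize to combine, at the cost of more merge work while documents are being added.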

Re: Index optimization ...

2008-07-30 Thread Ian Lea
If I was you I'd certainly try cutting the optimize frequency. An optimized index should indeed give the best search performance, but in my experience it's generally plenty fast enough anyway, and I think you said earlier that you were prepared to sacrifice a bit of search or indexing speed. Sorr

Re: Index optimization ...

2008-07-30 Thread Anand Jain
As an aside, I would like to understand how you get away without adding documents to the active index. As far as I understand, you are only adding docs to the inactive index and swap it with the active index (so the active one becomes inactive and vice-versa). So do you bring the "new" inacti

RE: Index optimization ...

2008-07-30 Thread Dragon Fly
My understanding is that an optimized index gives the best search performance. I can change my configuration to optimize the index every 24 hours. However, I still would like to know if there is a way to speed up optimization by tweaking parameters like the merge factor. > Date: Wed, 30 Jul 2

Re: Index optimization ...

2008-07-30 Thread Ian Lea
OK, but why do you need to optimize before every swap? Have you tried with less frequent optimizes? -- Ian. On Wed, Jul 30, 2008 at 3:00 PM, Dragon Fly <[EMAIL PROTECTED]> wrote: > I have two copies (active/inactive) of the index. Searches are executed > against the "active" index and new doc

RE: Index optimization ...

2008-07-30 Thread Dragon Fly
I have two copies (active/inactive) of the index. Searches are executed against the "active" index and new documents get added to the "inactive" copy. The two indexes get swapped every 4 hours (so that new documents are visible to the end user). Optimization is done before the inactive copy i
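The active/inactive arrangement described above could be sketched along the following lines. This is only an assumption about how such a swap might be coordinated, not the poster's code: the class `IndexSwapSketch` and the directory names are hypothetical, and plain strings stand in for whatever searcher or directory objects a real application would hold.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of swapping an active and an inactive index copy.
class IndexSwapSketch {
    // Searches always read the current active copy through this reference.
    private final AtomicReference<String> active;
    // New documents are added to the inactive copy between swaps.
    private String inactive;

    IndexSwapSketch(String activeDir, String inactiveDir) {
        this.active = new AtomicReference<>(activeDir);
        this.inactive = inactiveDir;
    }

    String activeIndex() {
        return active.get();
    }

    synchronized String inactiveIndex() {
        return inactive;
    }

    // Makes the inactive copy visible to searches; the old active copy
    // becomes the one that receives new documents.
    synchronized void swap() {
        String previousActive = active.getAndSet(inactive);
        inactive = previousActive;
    }

    public static void main(String[] args) {
        IndexSwapSketch indexes = new IndexSwapSketch("indexA", "indexB");
        System.out.println("searching: " + indexes.activeIndex());
        indexes.swap();
        System.out.println("searching: " + indexes.activeIndex());
    }
}
```

The sketch leaves out how the newly inactive copy catches up on documents it missed while it was active, and where any optimize step would run relative to the swap.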

Re: Index optimization ...

2008-07-30 Thread Ian Lea
Why do you run an optimize every 4 hours? -- Ian. On Wed, Jul 30, 2008 at 2:46 PM, Dragon Fly <[EMAIL PROTECTED]> wrote: > Perhaps I didn't explain myself clearly so please let me try it again. I'm > happy with the search/indexing performance. However, my index gets fully > optimized every

Highlighting results returned from MultiFieldQueryParser

2008-07-30 Thread syedfa
Dear fellow Lucene/Java developers: I have an index created from an XML file which I am trying to search using the MultiFieldQueryParser. At present, I am using the QueryParser to successfully return results that are highlighted. The code is listed here: public List search(File indexDir, Strin

RE: Index optimization ...

2008-07-30 Thread Dragon Fly
Perhaps I didn't explain myself clearly so please let me try it again. I'm happy with the search/indexing performance. However, my index gets fully optimized every 4 hours and the time it takes to fully optimize the index is longer than I like. Is there anything that I can do to speed up the

Re: Modifying a document by updating a payloads?

2008-07-30 Thread Michael McCandless
Unfortunately you will have to delete the old doc, then reindex a new doc, in order to change any payloads in the document's Tokens. This issue: https://issues.apache.org/jira/browse/LUCENE-1231 which is still in progress, could make updating stored (but not indexed) fields a much low

Fieldable anyone?

2008-07-30 Thread Grant Ingersoll
Is there anyone out there that actually implements their own Fieldable instance? Just curious, as we are thinking of making some changes to it, but it would (very slightly) break our fairly strict back-compatibility rules (http://wiki.apache.org/lucene-java/BackwardsCompatibility) so I wou

Modifying a document by updating a payloads?

2008-07-30 Thread Antony Bowesman
I seem to recall some discussion about updating a payload, but I can't find it. I was wondering if it were possible to use a payload to implement 'modify' of a Lucene document. For example, I have an ID field, which has a unique ID referring to an external DB. For example, I would like to stor

Re: CheckIndex possibly not detecting/fixing all corruptions?

2008-07-30 Thread Michael McCandless
Do you have the exception Luke produced? That'd be a good clue as to what CheckIndex is not detecting. It's hard for me to tell from that GDB trace exactly what's gone wrong... When you first ran CheckIndex, and it detected one corrupt segment, what exception did it report as the cause

CheckIndex possibly not detecting/fixing all corruptions?

2008-07-30 Thread John O'Brien
Hi, I already posted this question on the CLucene dev list but it was suggested that I may be able to get some help on the Java list so here goes. We use CLucene 0.9.20 in our search engine. One of the indexes appears to have become corrupt (still investigating the cause of the corruption).

Re: too many clause exception when using a filter

2008-07-30 Thread Zoeppi
Hi, when I use the PrefixQuery instead of the WildcardQuery, I still get the exception. Regards --René Original Message > Date: Wed, 30 Jul 2008 14:03:28 +0530 > From: "Ganesh - yahoo" <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: Re: too many clause exce

Re: too many clause exception when using a filter

2008-07-30 Thread Ganesh - yahoo
Hi, Try using PrefixQuery. Does it still throw the exception? Regards Ganesh - Original Message - From: <[EMAIL PROTECTED]> To: Sent: Wednesday, July 30, 2008 1:00 PM Subject: too many clause exception when using a filter Hello, I've filled an index with 1100 text files with the names

too many clause exception when using a filter

2008-07-30 Thread Zoeppi
Hello, I've filled an index with 1100 text files with the names "monisys1" to "monisys1100". If I start a WildcardQuery WildcardQuery query = new WildcardQuery(new Term("fileId","monisys*")); Hits hits = searcher.search(query); I get a "Too many clauses" exception, like I expecte