Re: Lucene as a primary datastore

2010-01-22 Thread Aaron McCurry
I know that our situation is fairly unique, but we rebuild our indexes weekly. The source of our indexes is data marts generated from flat files. We do this because our data changes too rapidly for us to keep up with the changes. We do update the indexes at runtime, but only with about

Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-22 Thread Michael McCandless
It looks like each IW writes to a private log file -- could you zip up all such files and attach as a zip file (CC me directly because the list strips attachments)? It can give me a bigger picture than just these fragments... Mike On Fri, Jan 22, 2010 at 12:02 PM, Frank Geary wrote: > > Mike, >

Re: Problem: Indexing and searching repeating groups of fields

2010-01-22 Thread TJ Kolev
The issue is that the real world document has more than 2 fields. Me giving an example of two was a bit misleading. I can't really "pair" them. Here's a better example: Resume_a Exp_1 Language:Java, Years:5, Certification:Sun, Area:Web Exp_2 Language:C, Years:3, Certification:None, Are

Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-22 Thread Frank Geary
Mike, Below are the portions of the merge log files related to the setInfoStream() calls during the time our exception happens. The ..._1... log file is from one RAMDir index and the ..._0... log file is from the other RAMDir index. The time period when our Array index out of bounds exception h

Re: Applying LUCENE-1606 -- which version

2010-01-22 Thread Robert Muir
what is a 'contains clause' ??? what is the use case behind it, and why can't poker be tokenized from the text so they just type 'poker' with no wildcards? On Fri, Jan 22, 2010 at 8:41 AM, Sriram Muthuswamy Chittathoor wrote: > Okay the only reason I asked was to support contains clause.  But it

RE: Applying LUCENE-1606 -- which version

2010-01-22 Thread Sriram Muthuswamy Chittathoor
Okay, the only reason I asked was to support the contains clause. But it could be relaxed if it is causing an issue and we could support those .{0,3}poker.* fast. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Friday, January 22, 2010 7:01 PM To: java-user@lucene.ap

Re: Applying LUCENE-1606 -- which version

2010-01-22 Thread Robert Muir
the first step is determining why you really need to support *poker* on 100M database rows: is it some programmer's fault, or are users directly typing this in? Either can probably be retrained, perhaps you should do like google, and drop these characters from users' queries if they do this... On Fri

RE: Applying LUCENE-1606 -- which version

2010-01-22 Thread Sriram Muthuswamy Chittathoor
then I recommend storing your data in a different structure that will support such queries. -- what would this structure be ? Is this to be done through Lucene. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Friday, January 22, 2010 6:47 PM To: java-user@lucene.apa

Re: Applying LUCENE-1606 -- which version

2010-01-22 Thread Robert Muir
the patch won't especially help queries like this. maybe they will be 5 or 6 times faster, but still slow. if you really want to do such queries, consider this: do you really need * to match an INFINITE amount of characters?!?! If not, then consider rewriting input queries like this into FINITE R
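A minimal sketch of the bounded rewrite suggested above, assuming the goal is to turn each * in a user query into a gap of at most N characters (as in the .{0,3}poker.* form mentioned elsewhere in this thread). The class name BoundedWildcard, the method toBoundedRegex, and the maxGap parameter are hypothetical and not part of Lucene. The output uses java.util.regex syntax purely for illustration; Lucene's own regular expression syntax differs (for example in how literals are quoted), so the result would need adapting before being handed to a Lucene query.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites '*' wildcards into bounded gaps. */
public final class BoundedWildcard {
    private BoundedWildcard() {}

    /**
     * Replaces every '*' with ".{0,maxGap}" and quotes all other text so it
     * is matched literally. The result is a java.util.regex pattern string.
     */
    public static String toBoundedRegex(String wildcard, int maxGap) {
        if (maxGap < 0) {
            throw new IllegalArgumentException("maxGap must be >= 0");
        }
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                flushLiteral(literal, regex);
                // Bounded gap instead of an unbounded ".*"
                regex.append(".{0,").append(maxGap).append('}');
            } else {
                literal.append(c);
            }
        }
        flushLiteral(literal, regex);
        return regex.toString();
    }

    /** Appends the pending literal run, quoted, and resets the buffer. */
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }
}
```

For example, toBoundedRegex("*poker*", 3) yields a pattern that accepts "poker" with up to three characters on either side and rejects longer surrounding text. Whether three characters is an acceptable bound depends on the data and is an assumption here.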

Can tf() field access the field it is being used for ?

2010-01-22 Thread Paul Taylor
Hi, I'm trying to override the Similarity lengthNorm() and tf() methods, but I only want to override them for particular index fields. lengthNorm() is fine, but tf() doesn't provide the field name as a parameter, so I'm a bit stuck. Is there any way round this? Here is my code, which doesn't compile bec

Re: Index Growing on Delete

2010-01-22 Thread Ian Lea
You could also try swapping the order of your commit and expunge calls. The javadocs for expungeDeletes say: Note that this call does not first commit any buffered documents, so you must do so yourself if necessary. -- Ian. On Fri, Jan 22, 2010 at 10:04 AM, anshum.gu...@naukri.com wrote: > H

RE: Applying LUCENE-1606 -- which version

2010-01-22 Thread Sriram Muthuswamy Chittathoor
Thanks for the fast reply. http://svn.apache.org/repos/asf/lucene/java/branches/flex_1458/ I used the one above to check out. Hope it is fine. Will see how it works out. My requirement is that I have an index into a DB row (100 million rows). On some text fields I want to do contains search fa

RE: Applying LUCENE-1606 -- which version

2010-01-22 Thread Uwe Schindler
Hi Sriram, This patch cannot be applied to 3.0 as it depends on a new Lucene branch called flex. It depends on features only added in version 3.1. So it does not even apply on trunk; you have to check out the experimental flex branch first. For 3.0, you may try one of the early patches (the las

Re: Index Growing on Delete

2010-01-22 Thread anshum.gu...@naukri.com
Hi Jamie, You could try to debug using IndexReader.numDeletedDocs() on the index to check whether the documents are expunged or not. --Original Message-- From: Jamie To: java-user@lucene.apache.org ReplyTo: java-user@lucene.apache.org Subject: Index Growing on Delete Sent: Jan 22, 2010 14:04

Applying LUCENE-1606 -- which version

2010-01-22 Thread Sriram Muthuswamy Chittathoor
Hi: I am trying to apply this Automata patch to my Lucene 3.0 source code but am running into issues, as it is complaining about failures to apply the patch to certain files. Is this the right version to apply it to? Please help. Thanks, Sriram C LUCENE-1606 https://issues.apache.org/jira/browse

Index Growing on Delete

2010-01-22 Thread Jamie
Hi, In our application, on a periodic basis, documents get deleted from the index. Although the deleted documents correctly cannot be found when searching the index, our users are complaining that their hard drive is filling up, since the index continues to grow in size despite the fact that th

Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-22 Thread Michael McCandless
On Thu, Jan 21, 2010 at 4:46 PM, Frank Geary wrote: > Nope.  When it's time to inactivate a RAMDir indexWriter, I get that > directory,  close that writer, then clear out the directory.  Then after > clearing out the directory, I create a new IW passing in the directory that > was used previously