Delete Document from Index. How?

2010-11-10 Thread dian puma
Hi All, I'm struggling to delete a specific document from a Lucene index. I've read the book Lucene in Action to see how to do it. There are 2 ways to delete documents from the index: IndexWriter.deleteDocuments(term) or IndexReader.deleteDocuments. CMIIW. FYI, I use PHP/Java Bridge and

Re: API access to in-memory tii file (3.x not flex).

2010-11-10 Thread Jason Rutherglen
Yeah that's customizing the Lucene source. :) I should have gone into more detail, I will next time. On Wed, Nov 10, 2010 at 2:10 PM, Michael McCandless wrote: > Actually, the .tii file pre-flex (3.x) is nearly identical to the .tis > file, just that it only contains every 128th term. > > If you

Non matched terms

2010-11-10 Thread Brian C. Dilley
Hi, I'm using Lucene for a search project with the following requirements, and I was wondering if one of you fine folks could point me in the right direction (currently I'm using the RAMDirectory, IndexSearcher, StandardAnalyzer and QueryParser): Given the example search string: "red leather

Re: API access to in-memory tii file (3.x not flex).

2010-11-10 Thread Michael McCandless
Actually, the pre-flex (3.x) .tii file is nearly identical to the .tis file, except that it contains only every 128th term. If you just make SegmentTermEnum public (or sneak your class into the oal.index package), then you can instantiate SegmentTermEnum, passing it an IndexInput opened on the .tii file
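To illustrate the "every 128th term" idea from the message above, here is a minimal sketch in plain Java. It does not read a .tii file or use any Lucene class; `TermSampleDemo` and `sampleEveryNth` are hypothetical names, and which offset within each block of 128 terms the real index file records is not something this sketch claims to reproduce.

```java
import java.util.ArrayList;
import java.util.List;

class TermSampleDemo {
    /**
     * Returns every interval-th entry of an already sorted term list,
     * starting with the first one (positions 0, interval, 2*interval, ...).
     */
    static List<String> sampleEveryNth(List<String> sortedTerms, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < sortedTerms.size(); i += interval) {
            sample.add(sortedTerms.get(i));
        }
        return sample;
    }

    public static void main(String[] args) {
        // Build 300 synthetic, lexicographically sorted terms.
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            terms.add(String.format("term%04d", i));
        }
        System.out.println(sampleEveryNth(terms, 128));
    }
}
```

The sample costs one pass over the term list, which is why reading an existing sparse index is attractive when the full term dictionary is very large.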

Re: API access to in-memory tii file (3.x not flex).

2010-11-10 Thread Jason Rutherglen
In a word, no. You'd need to customize the Lucene source to accomplish this. On Wed, Nov 10, 2010 at 1:02 PM, Burton-West, Tom wrote: > Hello all, > > We have an extremely large number of terms in our indexes.  I want to be able > to extract a sample of the terms, say something like every 128th

API access to in-memory tii file (3.x not flex).

2010-11-10 Thread Burton-West, Tom
Hello all, We have an extremely large number of terms in our indexes. I want to be able to extract a sample of the terms, say something like every 128th term. If I use code based on org.apache.lucene.misc.HighFreqTerms or org.apache.lucene.index.CheckIndex I would get a TermsEnum, call term

Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
Ah exactly the kind of wake-up call that I was looking for! Thank You :) On Wed, Nov 10, 2010 at 3:01 PM, Steven A Rowe wrote: > NFS[1] != NTFS[2] > > [1] NFS: > [2] NTFS: > > > -Original Me

RE: IndexWriters and write locks

2010-11-10 Thread Uwe Schindler
Windows does not use NFS natively; it’s a network file system for UNIX operating systems. But can you confirm that you are working on a local filesystem even on Windows (NTFS?), not something like Samba shares? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetap

RE: IndexWriters and write locks

2010-11-10 Thread Steven A Rowe
NFS[1] != NTFS[2] [1] NFS: [2] NTFS: > -Original Message- > From: Pulkit Singhal [mailto:pulkitsing...@gmail.com] > Sent: Wednesday, November 10, 2010 2:55 PM > To: java-user@lucene.apach

Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
You know that really confuses me. I've heard that stated a few times and every time I just felt that it couldn't possibly be right. Maybe it was meant in some very specific manner because otherwise aren't all Windows OSs off-limits to Lucene then? On Wed, Nov 10, 2010 at 2:40 PM, Uwe Schindler wr

RE: IndexWriters and write locks

2010-11-10 Thread Uwe Schindler
Are you using NFS as the filesystem? NFS is incompatible with Lucene :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Pulkit Singhal [mailto:pulkitsing...@gmail.com] > Sent: Wednesday, November 10, 2010 7:5

Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
Thanks Uwe, that helps explain why the lock file is still there. The last piece of the puzzle is why someone may see exceptions such as the following from time to time: java.nio.channels.OverlappingFileLockException at sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.j

Re: Search returning documents matching a NOT range

2010-11-10 Thread Robert Muir
On Wed, Nov 10, 2010 at 1:04 PM, Uwe Schindler wrote: > I know where the bug is... > > The problem has nothing to do with MultiSearcher at all; it's just the > rewritten query. Because (as Robert said) MultiSearcher rewrites per index, > the rewritten query is different for each sub-index. The pr

How to DeleteDocuments from Index?

2010-11-10 Thread dian puma
Hi All, I'm struggling to delete a specific document from a Lucene index. I've read the book Lucene in Action to see how to do it. There are 2 ways to delete documents from the index: IndexWriter.deleteDocuments(term) or IndexReader.deleteDocuments. CMIIW. FYI, I use PHP/Java Bridge and

RE: Search returning documents matching a NOT range

2010-11-10 Thread Uwe Schindler
I know where the bug is... The problem has nothing to do with MultiSearcher at all; it's just the rewritten query. Because (as Robert said) MultiSearcher rewrites per index, the rewritten query is different for each sub-index. The problems Robert mentioned only affect scoring (which is differen

RE: IndexWriters and write locks

2010-11-10 Thread Uwe Schindler
This is because Lucene uses native filesystem locks. The lock file itself is just a placeholder, which is not cleaned up on Ctrl-C. The lock is not the file itself; it's *on* the file. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -O
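The distinction between the lock file and the lock *on* the file can be seen with plain java.nio, independent of Lucene. The following is a minimal sketch, not Lucene's NativeFSLockFactory code; the class name `LockFileDemo` and the temporary file name are made up for illustration. It also shows the usual trigger for java.nio.channels.OverlappingFileLockException: a second lock attempt on the same file from within the same JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LockFileDemo {
    /** Returns {overlapRejected, fileStillExists, relockSucceeded}. */
    static boolean[] run() {
        try {
            // Stand-in for a write.lock file (hypothetical name and location).
            Path lockFile = Files.createTempFile("write", ".lock");
            boolean overlapRejected = false;
            boolean relockSucceeded;
            try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.WRITE);
                 FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
                FileLock held = first.tryLock();
                if (held == null) {
                    throw new IllegalStateException("another process holds the lock");
                }
                try {
                    // Same JVM, same file, lock already held: the JDK rejects this.
                    second.tryLock();
                } catch (OverlappingFileLockException e) {
                    overlapRejected = true;
                }
                // Releasing the lock leaves the file itself in place.
                held.release();
                FileLock again = second.tryLock();
                relockSucceeded = again != null;
                if (again != null) {
                    again.release();
                }
            }
            boolean fileStillExists = Files.exists(lockFile);
            Files.delete(lockFile);
            return new boolean[] { overlapRejected, fileStillExists, relockSucceeded };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("overlap rejected: " + r[0]);
        System.out.println("file exists after release: " + r[1]);
        System.out.println("re-lock succeeded: " + r[2]);
    }
}
```

A leftover file with no lock held on it does not block a later lock attempt, which matches the behaviour described in this thread for a process killed with Ctrl-C.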

Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
I do not actually take the trouble to specify what Lock Factory to use, hmmm. Are you suggesting that because I'm using FSDirectory.open() in my code, I get a locking scheme that works ... while other folks, on other machines, get one that runs into issues and throws java.nio.channels.Overl

Re: Search returning documents matching a NOT range

2010-11-10 Thread Robert Muir
On Wed, Nov 10, 2010 at 8:05 AM, Erick Erickson wrote: > Has anyone opened a JIRA on this? > I'm not sure if we should call it a bug or not (in this particular example, it would be nice if we could find a fix i think though). Because in general we can't guarantee MultiSearcher(Searcher...) will

Re: IndexWriters and write locks

2010-11-10 Thread Michael McCandless
Likely you are using NativeFSLockFactory? In which case, a leftover lock file does not mean the index is in fact locked, since the OS will [correctly] release the lock on process exit. Mike On Wed, Nov 10, 2010 at 9:38 AM, Pulkit Singhal wrote: > Hello, > > 1) On Windows, I often shut down my a

IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
Hello, 1) On Windows, I often shut down my application server (which has active IndexWriters open) using Ctrl+C. 2) When I inspect my directories on the file system, I see that the write.lock file is still there. 3) I start the app server again and do some operations that would require IndexWr

Re: Implementing indexing of Versioned Document Collections

2010-11-10 Thread Pulkit Singhal
1) You can attach byte-array "Payloads" to every occurrence of a term during indexing. Each payload is stored at its term position and can then be retrieved during searching. You may want to consider taking this approach rather than writing bitvectors to a text file. If you feel that

Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-11-10 Thread Thomas Rewig
Hello, please excuse me for hijacking this old thread, but I have the same problem with the deleted file handles, so I think this is the right place for it. I also integrated the searchManager in our code and see the file handles fluctuate up and down. At first glance the situation seems stable but

Re: Search returning documents matching a NOT range

2010-11-10 Thread Erick Erickson
Has anyone opened a JIRA on this? Erick On Wed, Nov 10, 2010 at 7:53 AM, Robert Muir wrote: > On Sun, Nov 7, 2010 at 11:32 PM, Uwe Schindler wrote: > > Does the same happen with a MultiReader on top of both indexes and using > a > > single IndexSearcher on top of this MultiReader? > > > > P.S.

Re: Search returning documents matching a NOT range

2010-11-10 Thread Robert Muir
On Mon, Nov 8, 2010 at 6:45 AM, Ian Lea wrote: > This does seem extremely odd.  David sent me a copy of his index and > I've played around with it and also written a self-contained RAM index > program, below, that shows the same problem, namely that if the second > index has 1000+ docs the one and

Re: Search returning documents matching a NOT range

2010-11-10 Thread Robert Muir
On Sun, Nov 7, 2010 at 11:32 PM, Uwe Schindler wrote: > Does the same happen with a MultiReader on top of both indexes and using a > single IndexSearcher on top of this MultiReader? > > P.S.: How about using NumericField? > should be no problem there, it always uses filter rewrite? the problem is

Re: Search returning documents matching a NOT range

2010-11-10 Thread Robert Muir
On Wed, Nov 10, 2010 at 7:00 AM, Robert Muir wrote: > On Mon, Nov 8, 2010 at 6:45 AM, Ian Lea wrote: >> This does seem extremely odd.  David sent me a copy of his index and >> I've played around with it and also written a self-contained RAM index >> program, below, that shows the same problem, na