Re: Lucene Index Encryption

2009-05-10 Thread Andrzej Bialecki
Babak Farhang wrote: If "position" is the only thing Lucene needs during writing, then that is good news: seeking backwards and overwriting what's already written--that would be difficult to implement. If Lucene employs a write-once strategy for file I/O (w/ no exceptions), then we won't really

Re: Lucene Index Encryption

2009-05-10 Thread Babak Farhang
Seems to me this discussion is not necessarily limited to *encryption*: if you can implement encryption, you can also implement compression--which is perhaps interesting for archiving purposes (at access time, faster than unzipping an entire archived Directory and loading it, for example). >> Luce

Re: TermEnum with deleted documents

2009-05-10 Thread Antony Bowesman
Hi Mike, Thanks for the response. I looked at that issue, but my case is trivial to fix. I just keep the Set of terms I have deleted and ignore those during my second iteration. Thanks Antony Michael McCandless wrote: This is known & expected. Lucene does not update the terms dictionar

Re: dbsight

2009-05-10 Thread Chris Lu
Hi, Mike, Thanks for your interest in DBSight! The free version of DBSight should satisfy most of the common requirements. We could open source it if the time is right. There are many solutions for databases and Lucene, and each has its own strengths. DBSight starts from basic SQL, and just apply a

Re: Deleted files considered for scoring

2009-05-10 Thread Yonik Seeley
On Sun, May 10, 2009 at 5:37 PM, Moshe Cohen wrote: > I am using Lucene 2.4.1 via PyLucene and have encountered the following > behavior: > When there are deleted documents in the index, the search scores are > identical to those that would exist had those documents not been deleted. > If I optimize the

Deleted files considered for scoring

2009-05-10 Thread Moshe Cohen
Hi, I am using Lucene 2.4.1 via PyLucene and have encountered the following behavior: When there are deleted documents in the index, the search scores are identical to those that would exist had those documents not been deleted. If I optimize the index and the deleted documents are actually removed, the t

RE: Distinct terms values? (like in Luke)

2009-05-10 Thread Uwe Schindler
> Don't mean to hijack this thread, but I have a related question: > > Is there also a way to filter the terms based on another field? > > For example, the documents might also contain the field "published > date", so I want to get a distinct list of values for the term > "religion" in documents

Re: Distinct terms values? (like in Luke)

2009-05-10 Thread Jeff Turner
Don't mean to hijack this thread, but I have a related question: Is there also a way to filter the terms based on another field? For example, the documents might also contain the field "published date", so I want to get a distinct list of values for the term "religion" in documents published

RE: Distinct terms values? (like in Luke)

2009-05-10 Thread Uwe Schindler
You can get this list using IndexReader.terms(new Term(fieldname,"")). This returns an enumeration of all terms starting with the given one (the field name). Just iterate over the TermEnum until the field name of the iterated term changes. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen ht
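The seek-then-scan idea in this reply can be sketched without Lucene on the classpath. The block below is only a model, under the assumption that the term dictionary is ordered by field name and then by term text: a TreeSet stands in for the index, tailSet(...) stands in for IndexReader.terms(new Term(fieldname, "")), and the for loop stands in for stepping a TermEnum. The names DistinctTermsSketch, Term (a local stand-in class, not Lucene's), distinctValues and sampleDictionary are hypothetical and not part of any Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class DistinctTermsSketch {

    /** Hypothetical stand-in for an index term: a (field, text) pair ordered by field, then text. */
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(Term other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /** Seeks to the first term of the field, then collects texts until the field name changes. */
    static List<String> distinctValues(NavigableSet<Term> dictionary, String field) {
        List<String> values = new ArrayList<>();
        // (field, "") sorts before every real term of that field, so this is the seek position.
        for (Term term : dictionary.tailSet(new Term(field, ""), true)) {
            if (!term.field.equals(field)) {
                break; // ran past the last term of the requested field
            }
            values.add(term.text);
        }
        return values;
    }

    /** Small made-up dictionary with terms from three fields. */
    static NavigableSet<Term> sampleDictionary() {
        NavigableSet<Term> dictionary = new TreeSet<>();
        dictionary.add(new Term("author", "smith"));
        dictionary.add(new Term("religion", "Jewish"));
        dictionary.add(new Term("religion", "Baha'i"));
        dictionary.add(new Term("religion", "Islam"));
        dictionary.add(new Term("religion", "Christian"));
        dictionary.add(new Term("title", "history"));
        return dictionary;
    }

    public static void main(String[] args) {
        System.out.println(distinctValues(sampleDictionary(), "religion"));
    }
}
```

The field-name check inside the loop is the important part: the enumeration does not stop at the end of the field by itself, it simply continues into the terms of the next field.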

Distinct terms values? (like in Luke)

2009-05-10 Thread Ian Vink
I have tagged each of my documents with a term "religion" and values like "Baha'i, Christian, Jewish, Islam" etc. Luke shows me that I have a term count of 8 for the term "religion". How do I get a list of the 8 distinct values for the term religion from an index? Ian

Re: Boosting query - debugging

2009-05-10 Thread liat oren
Hi Grant, Thanks for the reply. I saw that I had a problem in the code that prints these (very stupid mistake): int docId = hits[j].doc; Document curDoc = searcher.doc(docId); and then I passed j instead of docId to the explain method. But I have a question regarding the fieldNorm - When I have 60