Re: BTree

2006-01-12 Thread Kan Deng
Thanks, Yonik. TermInfosReader is exactly the class I am looking for. Kan --- Yonik Seeley <[EMAIL PROTECTED]> wrote: > On 1/12/06, Kan Deng <[EMAIL PROTECTED]> wrote: > > Many thanks, Doug. > > > > A quick question, which class implements the > following > > logic? > > It looks to me like

Re: BTree

2006-01-12 Thread Yonik Seeley
On 1/12/06, Kan Deng <[EMAIL PROTECTED]> wrote: > Many thanks, Doug. > > A quick question, which class implements the following > logic? It looks to me like org.apache.lucene.index.TermInfosReader -Yonik

Re: BTree

2006-01-12 Thread Kan Deng
Many thanks, Doug. A quick question, which class implements the following logic? org.apache.lucene.search.IndexSearcher? > For access, Lucene is equivalent to a B-Tree > with all but the leaves cached in memory, so > that accesses require only a single disk access. thanks, Kan --- Dou
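The quoted claim (a B-Tree with everything but the leaves cached in memory, so a lookup needs a single disk access) can be illustrated with a small sketch. This is not Lucene's TermInfosReader code: the class name, the `interval` parameter, and the array standing in for the on-disk sorted term file are all assumptions made for illustration. Every `interval`-th term is held in memory; a lookup binary-searches that in-memory sample, then reads one contiguous block of the "file".

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from Lucene's source.
final class SparseTermIndexSketch {
    private final String[] sortedTerms; // stands in for the sorted term file on disk
    private final int interval;         // assumed sampling interval, must be >= 1
    private final String[] indexTerms;  // every interval-th term, kept in memory
    int blockReads = 0;                 // counts simulated disk accesses

    SparseTermIndexSketch(String[] sortedTerms, int interval) {
        this.sortedTerms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval;
        indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * interval];
        }
    }

    /** Returns the position of the term in the sorted file, or -1 if absent. */
    int lookup(String term) {
        // In-memory step: find the last sampled term that is <= the query term.
        int pos = Arrays.binarySearch(indexTerms, term);
        int block = pos >= 0 ? pos : -pos - 2;
        if (block < 0) {
            return -1; // term sorts before the first entry; no read needed
        }
        blockReads++; // the single simulated disk access: one contiguous block
        int start = block * interval;
        int end = Math.min(start + interval, sortedTerms.length);
        for (int i = start; i < end; i++) {
            int c = sortedTerms[i].compareTo(term);
            if (c == 0) return i;
            if (c > 0) break; // passed the slot where the term would be
        }
        return -1;
    }
}
```

With seven sorted terms and an interval of 3, looking up a term touches the in-memory sample first and then exactly one block, whatever the total number of terms.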

Re: BTree

2006-01-12 Thread Doug Cutting
B-Trees are best for random, incremental updates. They require log_b(N) disk accesses for inserts, deletes and accesses, where b is the number of entries per page, and N is the total number of entries in the tree. But that's too slow for text indexing. Rather Lucene uses a combination of fi
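To make the log_b(N) figure concrete, here is a minimal sketch that evaluates the formula with integer arithmetic. The class and method names are hypothetical, and the sample values of b and N are chosen only for illustration; they do not come from the thread.

```java
// Hypothetical helper: evaluates ceil(log_b(N)) without floating point.
final class BTreeAccessEstimate {
    /** Smallest number of levels h such that entriesPerPage^h >= totalEntries. */
    static int diskAccesses(long entriesPerPage, long totalEntries) {
        if (entriesPerPage < 2 || totalEntries < 1) {
            throw new IllegalArgumentException("need b >= 2 and N >= 1");
        }
        int levels = 0;
        long reachable = 1; // entries addressable with the current number of levels
        while (reachable < totalEntries) {
            if (reachable > Long.MAX_VALUE / entriesPerPage) {
                levels++; // the next multiplication would exceed any long N
                break;
            }
            reachable *= entriesPerPage;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        // Assumed sample values: 100 entries per page, one million entries.
        System.out.println(diskAccesses(100, 1_000_000));
    }
}
```

With 100 entries per page, one million entries need three levels, so each insert, delete or access costs about three disk accesses under the formula above.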

Re: Generating phrase queries from term queries

2006-01-12 Thread Chris Hostetter
: > (Assuming *I* understand it) what he's talking about, is the ability for : > his search GUI to display suggested phrase searches you may want to try : > which consist of the words you just typed in grouped into phrases. : : Yes, that's precisely what I am talking about. Sorry for being unclear

Re: about the wordnet program.

2006-01-12 Thread Daniel Naber
On Thursday 12 January 2006 16:25, jason wrote: > When I incorporate these files, Syns2Index.java, SynLookup.java, and > SynExpand.java, I find some variables are not defined. It depends on Lucene in SVN; some things in the Lucene API have changed since Lucene 1.4. So you need to get the lates

Re: BTree

2006-01-12 Thread Kan Deng
After reading into the source code, I think Lucene doesn't use a B+tree or other tree structure for its index. A possible reason is that, since Lucene aims at handling gigabytes, it has to be cautious about the index file's size. B+tree may grow rapidly when the number of leaves grows. Hence, B+tre

Re: BTree

2006-01-12 Thread Daniel Naber
On Thursday 12 January 2006 05:47, shailesh kumar wrote: > I had looked at the document you had listed as well as used a Hex > editor to look at the segment files. That is how I came to know about > the lexicographic sorting. But was not sure if BTree is used. If I > understand correctly a B

Re: Cache index in RAMDirectory and evict

2006-01-12 Thread Kan Deng
John, thanks a lot for your excellent reply. Especially, I think this sentence is very convincing: > "Well, you _can_ be a lot better since you know what you're > doing. You can also be a _lot_ worse when you get it wrong." With such a high risk, I should probably try other tricks to improve t

Re: AW: Boolean Query

2006-01-12 Thread Doug Cutting
Klaus wrote: I have tried to study the Lucene scoring in the default similarity. Can anyone explain to me how this similarity was designed? I have read a lot of IR literature, but I have never seen an equation like the one used in Lucene. Why is this better than the normal cosine measure? It degen

about the wordnet program.

2006-01-12 Thread jason
Hi, I am trying to use the Lucene WordNet program for my application. However, I have run into some problems. When I incorporate these files, Syns2Index.java, SynLookup.java, and SynExpand.java, I find some variables are not defined. For instance, in Syns2Index.java, writer.setMergeFactor( writ

Re: Cache index in RAMDirectory and evict

2006-01-12 Thread John Haxby
Kan Deng wrote: 1. Performance. Since all the cached disk data resides outside JVM heap space, the access efficiency from Java object to those cached data cannot be too high. True, but you need to compare the relative speeds. If data has to be pulled from a file, then you're talking se

AW: Boolean Query

2006-01-12 Thread Klaus
Hi, I have tried to study the Lucene scoring in the default similarity. Can anyone explain to me how this similarity was designed? I have read a lot of IR literature, but I have never seen an equation like the one used in Lucene. Why is this better than the normal cosine measure? Thanks, Klaus --

Re: How to check, whether Index is optimized or not?

2006-01-12 Thread Andrzej Bialecki
Otis Gospodnetic wrote: I don't think we have a public API for that, but the index is considered optimized when it contains only a single segment. Then, we could add the following to IndexReader: public boolean isOptimized() { return segmentInfos.size() == 1; } I think that should do it.
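The single-segment rule proposed above can be tried out in isolation with a small model. This is only a sketch under stated assumptions: the class below is a hypothetical stand-in that tracks segment names in a list, not Lucene's IndexReader or SegmentInfos, and `mergeAll` merely imitates the effect of an optimize step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an index's segment list; not Lucene's API.
final class OptimizedCheckSketch {
    private final List<String> segmentNames = new ArrayList<>();

    /** Imitates flushing a new segment to the index. */
    void addSegment(String name) {
        segmentNames.add(name);
    }

    /** Imitates an optimize step: all segments collapse into one. */
    void mergeAll(String mergedName) {
        segmentNames.clear();
        segmentNames.add(mergedName);
    }

    /** The rule from the message above: optimized means exactly one segment. */
    boolean isOptimized() {
        return segmentNames.size() == 1;
    }
}
```

Note that under this rule an empty model (zero segments) reports false; whether that is the desired answer for an empty index is a design choice the message does not address.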

Re: How to check, whether Index is optimized or not?

2006-01-12 Thread Otis Gospodnetic
I don't think so. It's still a single segment. Close the reader, and you still have only one segment. You only have gaps from deleted docs, but I think that doesn't make the index unoptimized, even though optimizing such an index will remove the gaps. Otis - Original Message Fro

Re: How to check, whether Index is optimized or not?

2006-01-12 Thread Erik Hatcher
A fully optimized index has only a single segment. If you're using the non-compound index format you will be able to tell by looking at the segments file in the index where only one segment would be listed. There are certainly programmatic ways of telling too, but I don't have that detail

Re: Generating phrase queries from term queries

2006-01-12 Thread Eric Jain
Chris Hostetter wrote: (Assuming *I* understand it) what he's talking about, is the ability for his search GUI to display suggested phrase searches you may want to try which consist of the words you just typed in grouped into phrases. Yes, that's precisely what I am talking about. Sorry for bei

Re: How to check, whether Index is optimized or not?

2006-01-12 Thread Dave Kor
Do we need to check if any documents are marked for deletion? On 1/12/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > I don't think we have a public API for that, but the index is considered > optimized when it contains only a single segment. > Then, we could add the following to IndexReader: >

Re: BTree

2006-01-12 Thread Kan Deng
I have a similar problem with the internal indexing data structure. According to Paolo Ferragina of Univ Pisa, B+tree with cluster is best for sorting. However, referring to the implementation of org.apache.lucene.search.IndexSearcher, it looks like the implementation doesn't use a B+tree, let alone cluster

Re: Cache index in RAMDirectory and evict

2006-01-12 Thread Kan Deng
Thanks, Otis. I also appreciate your wonderful book, "Lucene in Action". The book is so well written that it makes me very curious about the low-level design of the system, in addition to how to use it. Back to the cache problem, I agree that the native OS file system can do most of the job for me.