RE: Japanese word segmentation

2006-11-18 Thread Koji Sekiguchi
Hi Daniel, JapaneseAnalyzer, which I believe is the most popular analyzer in Japan, is available. JapaneseAnalyzer: https://sen.dev.java.net/files/documents/1373/35812/lucene-ja-2.0test2.zip ASL v2.0 applies to the releases of JapaneseAnalyzer. JapaneseAnalyzer is not a large program, but it uses Sen t

Add a document in a single pass?

2006-11-18 Thread alex
Hi, I have a stream-based document parser that extracts contents (as a character stream) as well as document metadata (as strings) from a file, in a single pass. From these data I want to create a Lucene document. The problem is that the metadata are not available until the complete document has

Re: Fwd: Hibernate Lucene trademark issues

2006-11-18 Thread Emmanuel Bernard
I forgot a couple of things. I do not think that all your object properties belong in the index, and some of them will be put in the index with information degradation (i.e., store year/month rather than the whole date). So I do not believe there is a bidirectional relationship between your domai

Re: Boost Document

2006-11-18 Thread Erick Erickson
I can answer a small part of your question... Doc IDs have nothing to do with scoring. Each time you index a document, it gets a doc ID greater than any already in the index, and they get reassigned if you delete docs and optimize. They *may* be used when scoring to break ties but that doesn't

Re: stemmer

2006-11-18 Thread Erick Erickson
Thomas: There are some rather extensive threads on this list about the "interesting" issues that exist when indexing/searching other languages. I think you'd find it worthwhile to search the list archive for foreign language or some such... The short answer as I remember is that there *is* a bui

Re: Fwd: Hibernate Lucene trademark issues

2006-11-18 Thread Emmanuel Bernard
Hi, I am not really familiar with Compass; I haven't really looked at the code. Hibernate Lucene (now renamed Hibernate Search) started from user demand. I did have some in-depth discussions, though, with some users who evaluated both Compass and Hibernate Search, which helped me drive its design.

Japanese word segmentation

2006-11-18 Thread Daniel Naber
Hi, does anybody know a (more or less) ready-to-use free Japanese analyzer? I know I can use CJKAnalyzer but I need one that puts only real words into the index (not just n-grams). There seem to be a lot of papers on the Web and there's also "Juma", but I'm looking for a Java-based solution. Re

Re: Boost Document

2006-11-18 Thread John Pailet
Hi Chris, You are right! Here is the explain output: - DOC 222-home- 40960.0 = fieldWeight(WORD:home in 0), product of: 1.0 = tf(termFreq(WORD:home)=1) 1.0 = idf(docFreq=2) 40960.0 = fieldNorm(field=WORD, doc=0) - DOC 111-home- 40960.0 = fieldWeight(WORD:home in 1), pro

stemmer

2006-11-18 Thread Thomas Klein
Hi there, I'm fairly new to Lucene. I just developed a multi-threaded indexing TCP server using Lucene to, hmmm, let me remember, index stuff :) I have to index not only English, but also French and German, and, I don't know, perhaps other languages in the future. Does Lucene use a default stemmer