Hi Daniel,
JapaneseAnalyzer, which I believe is the most popular analyzer in Japan, is
available here:
JapaneseAnalyzer:
https://sen.dev.java.net/files/documents/1373/35812/lucene-ja-2.0test2.zip
ASL v2.0 applies to the releases of JapaneseAnalyzer.
JapaneseAnalyzer is not a large program, but it uses Sen t
Hi,
I have a stream-based document parser that extracts contents (as a character
stream) as well as document metadata (as strings) from a file, in a single
pass. From these data I want to create a Lucene document. The problem is
that the metadata are not available until the complete document has
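To make the setup concrete, here is a rough, untested sketch of how I would build the Document if I simply buffered the content during the single pass (MyStreamParser and its methods are placeholders for my own parser, not anything from Lucene):

import java.io.StringWriter;
import java.util.Iterator;
import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Untested sketch. MyStreamParser, parse() and getMetadata() are placeholders
// for my own parser; the idea is just to buffer the character stream during
// the single pass so the metadata (only known at the end) can still go into
// the same Document before it is handed to IndexWriter.
public class DocumentBuilder {
    public static Document build(MyStreamParser parser) throws Exception {
        StringWriter content = new StringWriter();
        parser.parse(content);                    // one pass: writes text, collects metadata

        Document doc = new Document();
        doc.add(new Field("contents", content.toString(),
                          Field.Store.NO, Field.Index.TOKENIZED));

        Map metadata = parser.getMetadata();      // e.g. title, author, ...
        for (Iterator it = metadata.entrySet().iterator(); it.hasNext();) {
            Map.Entry entry = (Map.Entry) it.next();
            doc.add(new Field((String) entry.getKey(), (String) entry.getValue(),
                              Field.Store.YES, Field.Index.UN_TOKENIZED));
        }
        return doc;
    }
}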
I forgot a couple of things.
I do not think that all your object properties belong in the index, and
some of them will be put in the index with information degradation (i.e.,
store the year/month rather than the whole date). So I do not believe there
is a bidirectional relationship between your domai
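To illustrate the year/month idea, a minimal sketch (the field name and date format here are just examples, nothing prescribed by Lucene):

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch of "information degradation": index only the year/month of a date
// instead of the full timestamp, which is often enough for searching and
// keeps the term space small.
public class DegradedDateField {
    public static void addYearMonth(Document doc, Date published) {
        String yearMonth = new SimpleDateFormat("yyyyMM").format(published);
        doc.add(new Field("publishedYearMonth", yearMonth,
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
    }
}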
I can answer a small part of your question... Doc IDs have nothing to do
with scoring. Each time you index a document, it gets a doc ID greater than
any already in the index, and they get reassigned if you delete docs and
optimize. They *may* be used when scoring to break ties, but that doesn't
Thomas:
There are some rather extensive threads on this list about the "interesting"
issues that exist when indexing/searching other languages. I think you'd
find it worthwhile to search the list archive for foreign language or some
such...
The short answer as I remember is that there *is* a bui
Hi,
I am not really familiar with Compass; I haven't really looked at the
code. Hibernate Lucene (now renamed Hibernate Search) started from
user demand. I did have some in-depth discussions, though, with some users
who evaluated both Compass and Hibernate Search, and those discussions
helped me drive its design.
Hi,
does anybody know a (more or less) ready-to-use free Japanese analyzer? I
know I can use CJKAnalyzer but I need one that puts only real words into
the index (not just n-grams). There seem to be a lot of papers on the Web
and there's also "Juma", but I'm looking for a Java-based solution.
Re
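For context, a rough sketch of how I would plug the analyzer in (the index path is just an example); a word-based Japanese analyzer would ideally be a drop-in replacement for CJKAnalyzer on the same line:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Sketch: CJKAnalyzer (bigram based) plugs into IndexWriter like any other
// analyzer; an analyzer that emits real words would be swapped in here.
public class JapaneseIndexing {
    public static IndexWriter openWriter() throws Exception {
        Analyzer analyzer = new CJKAnalyzer();    // swap for a word-based analyzer
        return new IndexWriter("/tmp/ja-index", analyzer, true);
    }
}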
Hi Chris,
You are right !!!
Here is the explain output:
- DOC 222-home-
  40960.0 = fieldWeight(WORD:home in 0), product of:
    1.0 = tf(termFreq(WORD:home)=1)
    1.0 = idf(docFreq=2)
    40960.0 = fieldNorm(field=WORD, doc=0)
- DOC 111-home-
  40960.0 = fieldWeight(WORD:home in 1), pro
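For reference, output like the above comes from IndexSearcher.explain(); a minimal sketch (the index path, field name, and query text are just examples from this thread):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Sketch: print the score explanation for every hit of a query.
public class ExplainScores {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new QueryParser("WORD", new StandardAnalyzer()).parse("home");
        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
            Explanation explanation = searcher.explain(query, hits.id(i));
            System.out.println(explanation.toString());
        }
        searcher.close();
    }
}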
Hi there,
I'm fairly new to Lucene; I just developed a multi-threaded indexing
TCP server using Lucene to, hmmm, let me remember, index stuff :)
I have to index not only English, but also French and German, and, I don't
know, perhaps other languages in the future.
Does Lucene use a default stemmer
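In case it helps, here is roughly what I imagine doing if there is no built-in default (an untested sketch; the contrib analyzers and language codes are just examples of what I have in mind):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Sketch: StandardAnalyzer does no stemming; stemming analyzers for
// individual languages live in the contrib analyzers/snowball packages.
// Picking one per document language is one way to handle a mixed collection.
public class AnalyzerPerLanguage {
    public static Analyzer forLanguage(String lang) {
        if ("fr".equals(lang)) return new FrenchAnalyzer();
        if ("de".equals(lang)) return new GermanAnalyzer();
        if ("en".equals(lang)) return new SnowballAnalyzer("English");
        return new StandardAnalyzer();            // fallback, no stemming
    }
}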