Lucene does NOT use UTF-8.

2005-08-26 Thread Marvin Humphrey
Greets, [crossposted to java-user@lucene.apache.org and [EMAIL PROTECTED]] I've delved into the matter of Lucene and UTF-8 a little further, and I am discouraged by what I believe I've uncovered. Lucene should not be advertising that it uses "standard UTF-8" -- or even UTF-8 at all, since "

Standard or Modified UTF-8?

2005-08-26 Thread Marvin Humphrey
Greets, As part of my attempt to speed up Plucene and establish index compatibility between Plucene and Java Lucene, I'm porting InputStream and OutputStream to XS (the C API for accessing Perl's guts), and I believe I have found a documentation bug in the file-format spec at... http

Re: Books about Lucene?

2005-08-26 Thread Otis Gospodnetic
Erik already answered that, but let me emphasize that this is not only allowed, but encouraged. I believe anyone can add information to the Lucene Wiki, so if you have information that you would like to share and that may help others, please do it. Otis --- jian chen <[EMAIL PROTECTED]> wrote: >

Re: Books about Lucene?

2005-08-26 Thread Otis Gospodnetic
Hi, > If the demand for a 1.2-compatible version of Lucene is enough that > there are some folks willing to develop it and maintain it, I would be > happy to have it within Lucene's own codebase. I think keeping as > much of the code that can be identical as possible is important, and > > if

Re: Books about Lucene?

2005-08-26 Thread jian chen
Hi, Erik, Thanks. I think I will dig out my changes against Lucene 1.2 and then let you know what in detail they are. It will take some days though. Thanks, Jian On 8/26/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > It's absolutely permissible. Lucene is licensed with the Apache > Software

index files in jar file

2005-08-26 Thread Thomas Lepkowski
Hello, I have a set of index files that I'd like to distribute with my Java application. The only way this seems practical is to place the index files in a jar file. I tried this, but the search choked when I told IndexSearcher the index path inside the jar file (and placed the jar file in the

Re: Books about Lucene?

2005-08-26 Thread Erik Hatcher
It's absolutely permissible. Lucene is licensed with the Apache Software License, which is quite liberal with what you can do with the code. If the demand for a 1.2-compatible version of Lucene is enough that there are some folks willing to develop it and maintain it, I would be happy to hav

Re: Should I use span query?

2005-08-26 Thread Erik Hatcher
On Aug 26, 2005, at 4:11 PM, Andrew Boyd wrote: Hi All, I'm trying to find all the terms that are within x number of terms of given query terms. Should I be using span query or something else? If you have any code samples I would greatly appreciate it. PhraseQuery, or "termA termB"~1

Should I use span query?

2005-08-26 Thread Andrew Boyd
Hi All, I'm trying to find all the terms that are within x number of terms of given query terms. Should I be using span query or something else? If you have any code samples I would greatly appreciate it. Thanks, Andrew -

Re: Books about Lucene?

2005-08-26 Thread jian chen
Hi, Erik, Some time ago I played with the Lucene 1.2 source code and made some modifications to it, trying to add my own ranking algorithm. I am not sure whether, license-wise, it is permissible to modify the earlier source code, or whether it is allowed to put the modified version or the description of

Re: Does order of BooleanQuery clauses affect search performance?

2005-08-26 Thread Paul Elschot
On Friday 26 August 2005 17:58, [EMAIL PROTECTED] wrote: > > A simple question and I guess it may have been asked before. > > Does the order of Querys in a BooleanQuery affect search speed? By this I > mean if the first clause of a BooleanQuery only returns a few results and > the second clause r

Re: Books about Lucene?

2005-08-26 Thread Erik Hatcher
I appreciate the vote of confidence on this, but I am not afraid to admit that I do not consider myself an expert on the deep innards of Lucene. I understand the concepts, and a bit of the internals, but I certainly do not live up to the hype you just bestowed upon me. *blush* Regarding J

Solved (Re: Document visible by Term, but not search)

2005-08-26 Thread Andrzej Bialecki
Hi list, This is just to let you know that I found the reason (Dan sent me a small sample index off-list), and I thought that the reason for this error was obscure and tricky enough that you might be interested in the solution. The problem lay in custom boost values. It was impossible to fi

Re: Thinking about better highlighting

2005-08-26 Thread mark harwood
> Am I right that the MemoryIndex with getReader() is > not available > anywhere at this point? createReader() is the method you need. I think the latest SVN version has it.

RE: Does order of BooleanQuery clauses affect search performance?

2005-08-26 Thread Mordo, Aviran (EXP N-NANNATEK)
As far as I remember, the order of Queries in a BooleanQuery does not affect performance (but I may be wrong). Aviran http://www.aviransplace.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, August 26, 2005 11:59 AM To: java-user@lucene.apache.org Sub

Re: Lucene in IR Research

2005-08-26 Thread Dave Kor
Quoting Karl Koch <[EMAIL PROTECTED]>: > Hello all, > > I would like to know about papers that were written and used Lucene as the > underlying search engine. E.g. Lucene as baseline search engine and some > modifications to compare it with baseline Lucene system etc. > > Please provide links to p

Does order of BooleanQuery clauses affect search performance?

2005-08-26 Thread Paul . Illingworth
A simple question and I guess it may have been asked before. Does the order of Querys in a BooleanQuery affect search speed? By this I mean if the first clause of a BooleanQuery only returns a few results and the second clause returns lots of results and the two are ANDed is this faster than t

Re: Lucene in IR Research

2005-08-26 Thread Otis Gospodnetic
Karl, A good place to start would be a list of Doug Cutting's papers. We have a list of them in Appendix C, section C.9 of Lucene in Action, but the list is also available online (don't have the URL, but I'm sure Google has it indexed). Otis --- Karl Koch <[EMAIL PROTECTED]> wrote: > Hello all

Re: Thinking about better highlighting

2005-08-26 Thread Fred Toth
Thanks Mark for your pointers. I'm deep into this, trying to wire something up. Am I right that the MemoryIndex with getReader() is not available anywhere at this point? Thanks, Fred At 11:53 AM 8/25/2005, mark harwood wrote: >> but I'm still lost on how to convert > everything to SpanQuery >

Lucene in IR Research

2005-08-26 Thread Karl Koch
Hello all, I would like to know about papers that were written and used Lucene as the underlying search engine. E.g. Lucene as baseline search engine and some modifications to compare it with baseline Lucene system etc. Please provide links to published papers if possible. Kind regards, Karl --

Re: Books about Lucene?

2005-08-26 Thread Karl Koch
Hello Otis, I do agree with Otis that somebody, preferably Erik, should provide a more detailed list of reasons why 1.3 does not run on Java 1.2. This list could then be used by others to adapt the code of version 1.3+ in order to make it run for their individual purposes (if this is necessary).

Re: Thinking about better highlighting

2005-08-26 Thread Marvin Humphrey
On Aug 24, 2005, at 7:47 PM, Fred Toth wrote: However, after reviewing recent discussions about highlighting, and struggling with our own highlighting issues, I'm wondering if there's a better way. Here's one way. This is the algo used by a developer's version of my Perl/C search engine libr