Storing Stemmed and Original Tokens

2007-01-22 Thread hannes
t would mean I would have to rewrite the query ... thanks hannes - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Storing Stemmed and Original Tokens

2007-01-22 Thread hannes
shows you how to store multiple tokens at the same offset in a document, and sounds like what you need. The basic idea is to use SetNextPositionIncrement(0) on the 2nd through nth tokens you want to wind up in the same position. At least that's my guess .. Best Erick On 1/22/07, hannes <
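
The idea of stacking a stemmed token on top of the original can be sketched without Lucene. In Lucene itself the increment is set on the token (for example Token.setPositionIncrement(0) in the 2.x API); the class below only models that behaviour. The names StackedTokens, Tok, stem, analyze and positions are hypothetical, and the "stemmer" is a toy that strips a trailing "s".

```java
import java.util.ArrayList;
import java.util.List;

public class StackedTokens {
    // Hypothetical stand-in for a Lucene token: text plus position increment.
    static final class Tok {
        final String text;
        final int positionIncrement;
        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Toy stemmer, for illustration only: strips a trailing "s" from longer words.
    static String stem(String word) {
        return word.length() > 3 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // Emit every original word with increment 1. If its stem differs,
    // emit the stem too with increment 0 so both share one position.
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            out.add(new Tok(word, 1));
            String stemmed = stem(word);
            if (!stemmed.equals(word)) {
                out.add(new Tok(stemmed, 0));
            }
        }
        return out;
    }

    // Resolve increments to absolute positions; the first token lands on 0.
    static List<Integer> positions(List<Tok> tokens) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (Tok t : tokens) {
            position += t.positionIncrement;
            result.add(position);
        }
        return result;
    }
}
```

Because the stem carries an increment of 0, a phrase query written against either the original or the stemmed form would see the same positions, which is why the query itself would not need to change.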

Re: Lucene with Khmer ? (Language in cambodia)

2007-01-24 Thread hannes
the analyzed Tokens are correct! That's the way I test my analyzers/tokenization/filtering without the overhead of indexing, search etc. Bests hannes Zsolt Czinkos wrote: Hello From the API: "public class StandardAnalyzer extends Analyzer Filters StandardTokenizer with Standa

Changing Term Vectors for Query

2021-06-06 Thread Hannes Lohr
Hello, for some queries I need to calculate the score mostly like the normal score, but for some documents certain terms are assigned a frequency given by me and the score should be calculated with these new term frequencies. After some research, it seems I have to implement a custom Query, custo

Re: Lucene and Tomcat, too many open files

2006-03-16 Thread Hannes Carl Meyer
Hi Nick, use 'ulimit' on your *nix system to check if it's set to unlimited. check: http://wwwcgi.rdg.ac.uk:8081/cgi-bin/cgiwrap/wsi14/poplog/man/2/ulimit You don't have to set it to unlimited, maybe increasing the number will help. later Hannes Nick Atkins wrote: Thank

Dealing with acronyms

2006-04-26 Thread Hannes Carl Meyer
asking the user to use case:"ABS" to search for acronyms Any experience with this kind of pattern? Other ideas or best practices? Thank you in advance and best regards Hannes

Re: Dealing with acronyms

2006-04-26 Thread Hannes Carl Meyer
Rajesh Munavalli wrote: On 4/26/06, Hannes Carl Meyer <[EMAIL PROTECTED]> wrote: Hi All, I would like to enable users to do an acronym search on my index. My idea is the following: 1.) Extract acronyms (ABS, ESP, VCG etc.) from the given document (which is going to be indexed)
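
A minimal sketch of the extraction step, assuming an acronym is a run of two to five consecutive uppercase ASCII letters bounded by word boundaries. That definition, the class name AcronymExtractor and the method extract are illustrative assumptions, not something the thread specifies.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcronymExtractor {
    // Assumption: an acronym is 2 to 5 uppercase ASCII letters standing alone.
    private static final Pattern ACRONYM = Pattern.compile("\\b[A-Z]{2,5}\\b");

    // Returns the distinct acronyms in order of first appearance.
    static Set<String> extract(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = ACRONYM.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

The extracted strings could then be indexed without lowercasing so that a case-sensitive lookup finds "ABS" but not "abs". Text written entirely in capitals would produce false positives under this rule.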

Re: Efficiently paginating results.

2006-04-28 Thread Hannes Carl Meyer
ged during scrolling and that affects the query and you're re-querying, you may get confused by new docs :-) Hannes Marc Dauncey wrote: I read somewhere recently (maybe even on this list) a recommendation to requery each time for successive pages as this avoids some of the complex

Re: Efficiently paginating results.

2006-04-28 Thread Hannes Carl Meyer
the entire result set. --- Hannes Carl Meyer <[EMAIL PROTECTED]> wrote: Hi Marc, I'm using this method for a web-application. I'm storing only the current viewable set of documents in the session and re-query if the user scrolls to the next page. This method is prett
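
The re-query-per-page approach described here can be sketched as follows. The search is represented by a plain Supplier so the example runs without Lucene; Pager and page are hypothetical names, and a real application would run the Lucene query where the supplier is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class Pager {
    // Re-runs the query for every request and keeps only the visible slice,
    // so the session never has to hold the entire result set.
    static <T> List<T> page(Supplier<List<T>> query, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        List<T> hits = query.get();
        int from = Math.min(pageIndex * pageSize, hits.size());
        int to = Math.min(from + pageSize, hits.size());
        // Copy the slice so the caller does not keep the full hit list alive.
        return new ArrayList<>(hits.subList(from, to));
    }
}
```

A page index past the end of the results yields an empty list rather than an exception, which keeps the behaviour predictable when the index shrinks between two requests.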

Re: creating indexReader object

2006-05-02 Thread Hannes Carl Meyer
Hi, IndexReader has some static methods, e.g. IndexReader reader = IndexReader.open(new File("/index")); http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#open(java.lang.String) Hannes trupti mulajkar wrote: i am trying to create an object of in

Checking for duplicates inside index

2006-05-22 Thread Hannes Carl Meyer
ex, before indexing a new document I will compare the new document's checksum with the ones in the index. Is that a good idea? Does anyone have experience with that method? Any tools available? Thank you and kind regards Hannes
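
A rough sketch of the checksum comparison, under two assumptions that the message does not state: SHA-256 is used as the checksum, and an in-memory set stands in for the lookup against checksums stored in the index. DuplicateFilter, checksum and addIfNew are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

public class DuplicateFilter {
    // Stand-in for the checksums already stored in the index.
    private final Set<String> seen = new HashSet<>();

    // Hex-encoded SHA-256 of the document content (assumed checksum choice).
    static String checksum(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the content is new and should be indexed,
    // false if an identical document was seen before.
    boolean addIfNew(String content) {
        return seen.add(checksum(content));
    }
}
```

This only catches documents whose content is byte-for-byte identical; a single differing character produces a different checksum, so near-duplicates pass through.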

Re: Checking for duplicates inside index

2006-05-24 Thread Hannes Carl Meyer
Ken Krugler wrote: On Mon, 2006-05-22 at 23:42 +0200, Hannes Carl Meyer wrote: > I'm indexing ~1 documents per day but since I'm getting a lot of real duplicates (100% the same document content) I want to check the content before indexing... > My idea is to crea