Re: How to get matched terms

2010-01-28 Thread Benjamin Heilbrunn
You could use Query.extractTerms(..) and then search for possible matches in the field term vector (requires stored TV). 2010/1/28 Vaijanath Rao : > Hi All, > > What is the simplest way of getting the matched terms of the query with > respect to the document. So for example let's say a document ha
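A rough sketch of the matching step in plain Java, without Lucene on the classpath. It assumes the query terms (what Query.extractTerms(..) collects) and the document's term vector terms (for example from TermFreqVector.getTerms()) are already available as strings for one field. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class MatchedTermsSketch {

    // queryTerms: term texts extracted from the query for one field.
    // docTerms:   term texts from that document's stored term vector for the same field.
    // Returns the query terms that also occur in the document, in query order.
    static Set<String> matchedTerms(Set<String> queryTerms, String[] docTerms) {
        Set<String> matched = new LinkedHashSet<>(queryTerms);
        matched.retainAll(Arrays.asList(docTerms));
        return matched;
    }

    public static void main(String[] args) {
        Set<String> queryTerms = new LinkedHashSet<>(Arrays.asList("lucene", "search", "index"));
        String[] docTerms = {"apache", "index", "lucene"};
        System.out.println(matchedTerms(queryTerms, docTerms));
    }
}
```

This only compares term texts within a single field; a real implementation would have to compare field names as well.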

Re: A interesting question (search by number of terms)

2010-01-21 Thread Benjamin Heilbrunn
Try BooleanQuery.setMinimumNumberShouldMatch 2010/1/21 Phan The Dai : > Hi everyone, I need your support with this question: > Assuming that I have some terms, such as: ("A", "B", "C", "D", "E") > How to search documents that contain a number of terms in that list > but do not care which terms they are.
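To illustrate what a minimum-should-match setting means, here is a small plain-Java sketch of the rule itself. It is not Lucene code. It assumes every term from the list is an optional (SHOULD) clause and a document matches once enough of them are present. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "at least N optional terms must match" rule.
final class MinShouldMatchSketch {

    // Returns true if the document contains at least minShouldMatch of the optional terms.
    static boolean matches(Set<String> docTerms, List<String> optionalTerms, int minShouldMatch) {
        int hits = 0;
        for (String term : optionalTerms) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= minShouldMatch;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("A", "B", "C", "D", "E");
        Set<String> doc = new HashSet<>(Arrays.asList("B", "E", "X"));
        System.out.println(matches(doc, terms, 2)); // two of the five terms are present
        System.out.println(matches(doc, terms, 3)); // only two are present, three are required
    }
}
```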

Re: TooManyClauses and maxClauseCount question

2010-01-20 Thread Benjamin Heilbrunn
Isn't maxClauseCount just a "best practice" limit to ensure that performance doesn't decrease silently if big queries occur? Performance and memory consumption should depend on how many clauses are really used / the number of matching documents. I think that there is no (significant) difference in memor

Re: Field creation with TokenStream and stored value

2010-01-13 Thread Benjamin Heilbrunn
ee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -----Original Message- >> From: Benjamin Heilbrunn [mailto:ben...@gmail.com] >> Sent: Wednesday, January 13, 2010 2:31 PM >> To: java-user@lucene.apache.org >> Subject: Re: Field creation w

Re: Field creation with TokenStream and stored value

2010-01-13 Thread Benjamin Heilbrunn
Sorry for pushing this thing. Would it be possible to add the requested constructor or would it break anything in Lucene's logic? 2010/1/11 Benjamin Heilbrunn : > Hey out there, > > in Lucene it's not possible to create a Field based on a TokenStream > AND supply a stored valu

Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?

2010-01-12 Thread Benjamin Heilbrunn
This is because matches in short fields (few terms) are typically more significant than matches in long fields (many terms). Imagine the case with two fields named "title" and "content" representing the title and the content of books. If you match three search terms in a five-term title this is a b
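For a feeling of the numbers: DefaultSimilarity in Lucene 2.9/3.0 computes the length norm as 1/sqrt(numTerms). The sketch below just evaluates that formula in plain Java. The class name is made up, and the lossy one-byte encoding Lucene applies when storing norms is ignored here.

```java
// Hypothetical illustration of the 1/sqrt(numTerms) length normalization.
final class LengthNormSketch {

    // numTerms: number of terms in the field of one document.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // A five-term title gets a much larger factor than a 1000-term content field.
        System.out.println("title   (5 terms):    " + lengthNorm(5));
        System.out.println("content (1000 terms): " + lengthNorm(1000));
    }
}
```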

Field creation with TokenStream and stored value

2010-01-11 Thread Benjamin Heilbrunn
Hey out there, in Lucene it's not possible to create a Field based on a TokenStream AND supply a stored value. Is there a reason why a Field constructor in the form of public Field(String name, TokenStream tokenStream, String storedValue) does not exist? I am using trees of TeeSinkTokenFilter

Re: english dictionary for spelling

2009-12-07 Thread Benjamin Heilbrunn
If you are searching for a dictionary this might be a good resource for you: http://wiki.services.openoffice.org/wiki/Dictionaries 2009/12/7 m.harig : > > hello all > > i've a doubt in spell checker , am creating spell index from my > original index , but my original index itself has some mi

Re: Norm Value of not existing Field

2009-12-04 Thread Benjamin Heilbrunn
Erick, I'm not sure if I understand you right. What do you mean by "spinning through all the terms on a field"? It would be an option to load all unique terms of a field by using TermEnum, then use TermDocs to get the docs for those terms. The rest of the docs don't contain a term and so you know, th
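The TermEnum/TermDocs idea boils down to marking every document that shows up in any posting list of the field. A plain-Java sketch under that assumption follows; the map stands in for what TermEnum and TermDocs would deliver, and all names are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: collect all docs that contain at least one term of a field.
final class DocsWithFieldSketch {

    // postings: term text -> doc IDs containing that term (stand-in for TermEnum + TermDocs).
    // maxDoc:   number of documents in the index.
    static BitSet docsWithAnyTerm(Map<String, int[]> postings, int maxDoc) {
        BitSet docs = new BitSet(maxDoc);
        for (int[] docIds : postings.values()) {
            for (int docId : docIds) {
                docs.set(docId);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {0, 2});
        postings.put("pear", new int[] {2, 4});
        // Bits that stay clear belong to docs without any term in the field.
        System.out.println(docsWithAnyTerm(postings, 5));
    }
}
```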

Norm Value of not existing Field

2009-12-03 Thread Benjamin Heilbrunn
Hi, I'm using Lucene 2.9.1 patched with http://issues.apache.org/jira/browse/LUCENE-1260 For some special reason I need to find all documents which contain at least 1 term in a certain field. This works by iterating the norms array only as long as the field exists on every document. For documents

Re: IndexDivisor

2009-12-03 Thread Benjamin Heilbrunn
Maybe the output of the command line argument "-verbose:gc" would help to determine if GC is running. But you are right - a profiler would be the best way. Benjamin - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For a

Re: Inside Lucene

2009-11-26 Thread Benjamin Heilbrunn
Hello, if you are searching for information about Lucene's file structure you can find something here: http://lucene.apache.org/java/3_0_0/fileformats.html Benjamin

Re: docBase Parameter in Collector.setNextReader

2009-11-13 Thread Benjamin Heilbrunn
Hello, sorry for causing inconvenience. It was my mistake and I wasn't able to reproduce it completely this morning. My testcase was a little too complex and there were two or three bugs / false assumptions which made it look to me like I explained above. Benjamin

docBase Parameter in Collector.setNextReader

2009-11-12 Thread Benjamin Heilbrunn
Hello everyone, I'm a little bit confused about the docBase parameter of Collector.setNextReader. Imagine the following:
- Create new Index
- Index 5 docs
- Call IndexWriter.commit()
- Index 7 docs
- Call IndexWriter.commit()
- close Writer
Now I have a 2-segment index, right? I have
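For the scenario above, the doc ID arithmetic can be sketched in plain Java. This assumes each commit really ends up as its own segment (no merge in between) and that docBase is the offset to add to the segment-relative doc IDs passed to collect(). The class and method names are made up.

```java
import java.util.Arrays;

// Hypothetical illustration of how docBase relates segment-local and index-wide doc IDs.
final class DocBaseSketch {

    // segmentSizes: number of docs per segment, in reader order.
    // Returns the docBase of each segment: the sum of the sizes of all preceding segments.
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = base;
            base += segmentSizes[i];
        }
        return bases;
    }

    // Converts a segment-relative doc ID into an index-wide doc ID.
    static int globalDocId(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {5, 7});
        System.out.println(Arrays.toString(bases));
        // Doc 3 of the second segment as an index-wide ID.
        System.out.println(globalDocId(bases[1], 3));
    }
}
```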

Re: Change norm encoding

2009-11-10 Thread Benjamin Heilbrunn
Hi, I applied http://issues.apache.org/jira/secure/attachment/12411342/Lucene-1260.patch That's exactly what I was looking for. The problem is that from now on I'm on a patched version and I'm not very happy with breaking compatibility with the "original" jars... So is there a chance that this p

Re: Change norm encoding

2009-11-09 Thread Benjamin Heilbrunn
Hi Mike, thanks for your reply. After making my post I found this (without taking a deeper look): http://issues.apache.org/jira/browse/LUCENE-1260 Looks like a solution for that problem. Why wasn't it applied to Lucene? Benjamin

Change norm encoding

2009-11-09 Thread Benjamin Heilbrunn
Hi, I've got a problem concerning the encoding of norms. I want to use int values (0-255) instead of bytes interpreted as floats. In my own Similarity class, which I use for indexing and searching, I implemented the static methods encodeNorm, decodeNorm and getNormDecoder. But because they are static a
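The int-in-a-byte part on its own is simple. Here is a plain-Java sketch of storing a value from 0 to 255 in one norm byte and reading it back. It is not a Similarity subclass and says nothing about how to plug this into Lucene; the names are hypothetical.

```java
// Hypothetical illustration: keep an int in the range 0-255 in a single (signed) Java byte.
final class IntNormSketch {

    // Stores the value in one byte; values above 127 become negative bytes.
    static byte encode(int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("value out of range 0-255: " + value);
        }
        return (byte) value;
    }

    // Masks with 0xFF to recover the unsigned value.
    static int decode(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        for (int value : new int[] {0, 127, 128, 255}) {
            byte stored = encode(value);
            System.out.println(value + " -> byte " + stored + " -> " + decode(stored));
        }
    }
}
```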

Re: how to extract text from the result document in lucene search

2009-10-28 Thread Benjamin Heilbrunn
Hello Dhivya, I'm not familiar with the Lucene demos. But for highlighting take a look at http://lucene.apache.org/java/2_9_0/api/contrib-highlighter/index.html Best regards Benjamin