Re: Highlighting text for queries with huge numbers of terms

2006-02-16 Thread Daniel Noll
Chris Hostetter wrote: if you build a map whose keys are tokens which begin token lists for queries, each of which is mapped to a value which is a list of lists of tokens, then you can make one pass over the tokens from the main text, and "lookup" whether or not this is the potential start of s
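
The map-based lookup described in this message can be sketched roughly as follows. This is a minimal illustration, not code from the thread and not part of Lucene: the class name PhraseStartIndex and its methods are hypothetical, and tokens are assumed to be plain strings already produced by the same Analyzer for both the query terms and the main text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): finds token-list matches in one pass over the text.
class PhraseStartIndex {
    // Key: the first token of a query token list. Value: every token list that starts with it.
    private final Map<String, List<List<String>>> byFirstToken = new HashMap<>();

    // Registers one analyzed query term (a list of one or more tokens).
    void addPhrase(List<String> phraseTokens) {
        if (phraseTokens.isEmpty()) {
            return;
        }
        byFirstToken.computeIfAbsent(phraseTokens.get(0), k -> new ArrayList<>()).add(phraseTokens);
    }

    // Returns {startIndex, endIndexExclusive} token positions for every match in the text.
    List<int[]> findMatches(List<String> textTokens) {
        List<int[]> matches = new ArrayList<>();
        for (int i = 0; i < textTokens.size(); i++) {
            // One map lookup per text token decides whether a match could start here.
            List<List<String>> candidates = byFirstToken.get(textTokens.get(i));
            if (candidates == null) {
                continue;
            }
            for (List<String> phrase : candidates) {
                int end = i + phrase.size();
                if (end <= textTokens.size() && textTokens.subList(i, end).equals(phrase)) {
                    matches.add(new int[] {i, end});
                }
            }
        }
        return matches;
    }
}
```

The sketch reports token positions only; a real highlighter would presumably carry each token's character offsets along so the matching spans can be marked in the Swing component.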

Re: Highlighting text for queries with huge numbers of terms

2006-02-16 Thread Chris Hostetter
: The existing highlighting code we wrote basically works like this... :1. Get the text out of the Swing component. :2. Break the text into tokens using the appropriate Analyzer. :3. For each term: :3.1. Break the term into tokens using the same Analyzer. :3.2. Iterate

Highlighting text for queries with huge numbers of terms

2006-02-16 Thread Daniel Noll
Hi all. I've just implemented some magic query syntax which expands simple queries to queries containing whole lists of words. I've implemented the queries themselves using a slight modification on the theme of QueryFilter (MultiQueryFilter, runs all queries to mark a single bitset, much f

Similarity Usage: tf(int) vs tf(float)

2006-02-16 Thread Chris Hostetter
I've been working on my own custom similarity lately, to take advantage of some content domain knowledge. One of the things that never really made sense to me before about the Similarity class was the existence of the two tf methods... public abstract float tf(float freq); public float tf(

Re: Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri
Yes, that's exactly the problem. I just found out that the analyzer was not being set correctly. Thanks, Chris Hostetter wrote: : Standard analyzer lower cases while indexing and searching. Correct, but since the toString() of your query still has capital words in it (like "contentNew:Wireless")

Re: Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Chris Hostetter
: Standard analyzer lower cases while indexing and searching. Correct, but since the toString() of your query still has capital words in it (like "contentNew:Wireless") you obviously didn't build this query using the StandardAnalyzer -- IndexSearcher doesn't apply any Analyzers for you when you s

Re: Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri
I am using the standard analyzer with luke. Standard analyzer lower cases while indexing and searching. The BooleanQuery, finalQuery.toString() in my case below is: +(+contentNew:wireless +contentNew:fm +contentNew:car +contentNew:transmitter) +entity:product +(name:wireless fm car transmitte

Re: Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Erik Hatcher
How are you constructing your BooleanQuery and what Analyzer are you using with Luke? You have some capitalized words in your query, and most analyzers would lowercase those, which may be the issue (perhaps you indexed the capitalized words?). Erik On Feb 16, 2006, at 2:41 PM, Mu

Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri
Hi, I have a query that gets hits via Luke. I can see the documents it finds. But when I run the same query via my Java code it returns 0 hits. Note: 1. I am using standard analyzer while indexing and searching. 2. I have made sure that I am querying the same index via Luke or through my java

Re: updating index

2006-02-16 Thread Grant Ingersoll
Revati, This sounds like a Hibernate problem; I suggest you refer to their documentation and forum. -Grant revati joshi wrote: Hi, I have tried updating a Lucene index using the Hibernate lifecycle class but am not able to get the implementation of this class. www.hibernate.org - Using Lifec

Re: Vector Space Model <-> Probabilistic Model

2006-02-16 Thread Grant Ingersoll
You may find some useful reading at: http://wiki.apache.org/jakarta-lucene/InformationRetrieval Karl Koch wrote: I am looking for a comparison between the theoretical Vector Space Model and the theoretical Probabilistic Model in Information Retrieval. I know that concrete implementations do diff

updating index

2006-02-16 Thread revati joshi
Hi, I have tried updating a Lucene index using the Hibernate lifecycle class but am not able to get the implementation of this class. www.hibernate.org - Using Lifecycles and Interceptors to update Lucene searches.htm The onSave(), onUpdate() method has got the Session parameter which is to pass

Re: Lucene Query ... understanding

2006-02-16 Thread Chris Hostetter
: Am just trying to see if I understand the Lucene query below correctly. : : +(+contentNew:radio +contentNew:mp3) +entity:product +(name:radio : mp3^4.0 (contentNew:radio contentNew:mp3) contentNew:radio mp3^2.0) : : Let me see if I can understand the above query correctly: your interpretation isn

Vector Space Model <-> Probabilistic Model

2006-02-16 Thread Karl Koch
I am looking for a comparison between the theoretical Vector Space Model and the theoretical Probabilistic Model in Information Retrieval. I know that concrete implementations do differ from that. However, I am looking for papers that compare the performance of both in particular applications. Doe

Re: BM25 Similarity implementation

2006-02-16 Thread Doug Cutting
Trieschnigg, R.B. (Dolf) wrote: I would like to implement the Okapi BM25 weighting function using my own Similarity implementation. Unfortunately BM25 requires the document length in the score calculation, which is not provided by the Scorer. How do you want to measure document length? If th

Lucene Query ... understanding

2006-02-16 Thread Mufaddal Khumri
Hi, Am just trying to see if I understand the Lucene query below correctly. +(+contentNew:radio +contentNew:mp3) +entity:product +(name:radio mp3^4.0 (contentNew:radio contentNew:mp3) contentNew:radio mp3^2.0) Let me see if I can understand the above query correctly: 1. the contentNew field ha

Search environment: the best choice

2006-02-16 Thread David Trattnig
Hi, I have the following constellation (planned architecture): [Webserver - APACHE] which serves the content [unspecified other servers] [CMS Server / SearchEngine - TOMCAT] handles the content creation and publishing to the webserver indexing of content stored at the apache-machine The tomcat-mach

BM25 Similarity implementation

2006-02-16 Thread Trieschnigg, R.B. (Dolf)
Hi, I would like to implement the Okapi BM25 weighting function using my own Similarity implementation. Unfortunately BM25 requires the document length in the score calculation, which is not provided by the Scorer. Does anyone know a solution to this problem? I've tried to find other Similarit

NullPointerException while closing the index writer

2006-02-16 Thread Shivani Sawhney
Hi, I couldn't fix the problem while creating an index, so I decided to clean all the indexes from the server and try to re-index all my documents. I am getting a NullPointerException, again while closing the index writer. I don't know what I can be doing wrong while simply creating a fresh

RE: Iterating hits

2006-02-16 Thread Vanlerberghe, Luc
My guess is you are using the same reader both for searching and deleting. The Hits class buffers the first 100 hits, and when you go beyond that, it reruns the query to get more hits. If you use the same reader, the searcher probably doesn't return the same results the second time. If different

RE: It worked---NullPointerException while closing the index writer

2006-02-16 Thread Shivani Sawhney
Hi, At least the re-indexing part worked. I just removed the finally block and the NullPointerException was solved: if (indexwriter != null) { System.out.println("going to close index writer"); indexwriter.close(); }