Re: Error tolerant text search with Lucene?

2008-04-04 Thread Marjan Celikik
Mathieu Lecarme wrote: however I don't fully understand what you mean by "iterate over your query". I would like a conceptual answer on how this is done with Lucene, not a technical one.. Your query is a tree, with BooleanQuery as the branches and other queries as the leaves. If you want to transform a query

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Marjan Celikik
Mathieu Lecarme wrote: You have to iterate over your query: if it's a BooleanQuery, keep it; if it's a TermQuery, replace it with a BooleanQuery containing all variants of the Term with Occur.SHOULD. M. Thanks.. however I don't fully understand what you mean by "iterate over your query". I wou
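The rewrite described in the reply above can be sketched on a toy query tree. This is a minimal illustration, not Lucene code: `Term`, `Bool`, `expand` and the `variants` lookup are hypothetical stand-ins for TermQuery, BooleanQuery, the traversal, and whatever source of misspellings is available. Boolean nodes are kept and their children are visited recursively; each term leaf is replaced by a Boolean node whose clauses are all optional ("SHOULD").

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class Term:
    """Toy stand-in for a TermQuery leaf (hypothetical, not the Lucene class)."""
    text: str


@dataclass(frozen=True)
class Bool:
    """Toy stand-in for a BooleanQuery branch: a tuple of (occur, subquery)."""
    clauses: Tuple[Tuple[str, "Query"], ...]


Query = Union[Term, Bool]


def expand(query: Query, variants: Callable[[str], List[str]]) -> Query:
    """Return a copy of the tree in which every term leaf is replaced by a
    Boolean node listing the term and its variants as optional clauses."""
    if isinstance(query, Bool):
        # Branch: keep the node and its occur flags, rewrite the children.
        return Bool(tuple((occur, expand(sub, variants))
                          for occur, sub in query.clauses))
    # Leaf: the original term first, then every distinct variant.
    alternatives = [query.text]
    alternatives += [v for v in variants(query.text) if v != query.text]
    return Bool(tuple(("SHOULD", Term(text)) for text in alternatives))


def lookup(word: str) -> List[str]:
    """Hypothetical variant source; a real one would come from a dictionary
    of misspellings built over the indexed vocabulary."""
    return {"algorithm": ["algorith", "algoritm"]}.get(word, [])


original = Bool((("MUST", Term("algorithm")), ("MUST_NOT", Term("draft"))))
rewritten = expand(original, lookup)
```

Because the occur flag of each original clause is preserved on the outer node, a required term stays required: at least one of its variants has to match.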

Re: Error tolerant text search with Lucene?

2008-04-03 Thread Marjan Celikik
Dominique Béjean wrote: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html -Original Message- From: Marjan Celikik [mailto:[EMAIL PROTECTED] Sent: Thursday, 3 April 2008 15:12 To: java-user@lucene.apache.org Subject: Error tolerant text search with Lucene? Hi everyon

Error tolerant text search with Lucene?

2008-04-03 Thread Marjan Celikik
Hi everyone, I know that there are packages that support the "Did you mean ... ?" search feature with Lucene, which try to find the best-suited correctly spelled query.. however, so far I haven't encountered the opposite search feature: given a correct query, find all documents which contain misspel

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Mark Miller wrote: That is why the original contrib does not work with PhraseQuery's. It simply matches Tokens from the query with those in the TokenStream. LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex. Then, after converting the query to a SpanQuery approximation, getSp
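The difference between matching individual query tokens and matching positions can be shown with a small sketch. It is an illustration under stated assumptions, not the contrib Highlighter or the LUCENE-794 patch: `naive_hits` and `phrase_hits` are hypothetical helpers, documents are plain token lists, and the phrase check is a simple exact-adjacency scan rather than a span query over a MemoryIndex.

```python
from typing import List


def naive_hits(doc_tokens: List[str], query_terms: List[str]) -> List[int]:
    """Position-insensitive matching: every token that equals any query
    term is marked, whether or not it is part of the phrase."""
    wanted = set(query_terms)
    return [i for i, token in enumerate(doc_tokens) if token in wanted]


def phrase_hits(doc_tokens: List[str], phrase: List[str]) -> List[int]:
    """Position-sensitive matching: only tokens inside an exact, adjacent
    occurrence of the whole phrase are marked."""
    n = len(phrase)
    hits = set()
    for start in range(len(doc_tokens) - n + 1):
        if doc_tokens[start:start + n] == phrase:
            hits.update(range(start, start + n))
    return sorted(hits)


doc = ["a", "b", "c", "a", "x", "b"]
phrase = ["a", "b"]

term_positions = naive_hits(doc, phrase)     # marks the stray "a" and "b" too
phrase_positions = phrase_hits(doc, phrase)  # marks only the adjacent pair
```

With this document the token-based version marks positions 0, 1, 3 and 5, while the position-aware version marks only 0 and 1, which is the behaviour the question about phrase queries is asking for.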

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Marjan Celikik wrote: Mark Miller wrote: The Highlighter works by comparing the TokenStream of the document with the Tokens in the query. The TokenStream can be rebuilt from the index if you use TermVectors with TokenSources or you can get it by reanalyzing the document. Each Token from the

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Mark Miller wrote: The Highlighter works by comparing the TokenStream of the document with the Tokens in the query. The TokenStream can be rebuilt from the index if you use TermVectors with TokenSources or you can get it by reanalyzing the document. Each Token from the TokenStream is checked

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Mark Miller wrote: Oh yeah...something that you may not have seen is that this has a dependency on MemoryIndex from contrib. You need that jar as well. - Mark Hm, I need the source code. How do I download the files from https://issues.apache.org/jira/browse/LUCENE-794 (all I see are some .pat

Re: Highlighting + phrase queries

2008-01-10 Thread Marjan Celikik
Mark Miller wrote: The contrib Highlighter doesn't know and highlights them all. Check out my patch here for position sensitive highlighting: https://issues.apache.org/jira/browse/LUCENE-794 It seems that the patch does not work with Lucene 2.2 as I get some compile errors. Is this really the

Re: Highlighting + phrase queries

2008-01-09 Thread Marjan Celikik
Mark Miller wrote: The contrib Highlighter doesn't know and highlights them all. Check out my patch here for position sensitive highlighting: https://issues.apache.org/jira/browse/LUCENE-794 OK, before trying it out, I would like to know whether the patch works for mixed queries, e.g. "a b" +c -d "

Highlighting + phrase queries

2008-01-09 Thread Marjan Celikik
Dear all, Let's assume I have a phrase query and a document which contains the phrase but also contains separate occurrences of each query term. How does the highlighter know that it should only display fragments which contain phrases and not fragments which contain only the query words (not as

Re: Query processing with Lucene

2008-01-08 Thread Marjan Celikik
Doron Cohen wrote: Hi Marjan, Lucene processes the query in what can be called a one-doc-at-a-time fashion. For the example query - x y - (not the phrase query "x y") - all documents containing either x or y are considered a match. When processing the query - x y - the posting lists of these two index ter
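One way to picture document-at-a-time processing of the disjunctive query - x y - is to walk the posting lists in parallel, always handling the smallest unprocessed document ID next. The snippet above is cut off before the details, so the following is only a sketch under assumptions: `union_doc_at_a_time` is a hypothetical helper, each posting list is a sorted list of unique document IDs, and the per-document count of matching terms stands in for scoring, which is not modelled.

```python
from typing import List, Tuple


def union_doc_at_a_time(postings: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (doc_id, number_of_matching_terms) for every document that
    appears in at least one posting list, in increasing doc_id order."""
    positions = [0] * len(postings)  # one cursor per posting list
    results = []
    while True:
        # Current head of every list that is not yet exhausted.
        heads = [plist[pos] for plist, pos in zip(postings, positions)
                 if pos < len(plist)]
        if not heads:
            return results
        doc = min(heads)  # the next document to process
        matched = 0
        for i, plist in enumerate(postings):
            if positions[i] < len(plist) and plist[positions[i]] == doc:
                matched += 1
                positions[i] += 1  # advance only the lists that contain doc
        results.append((doc, matched))


postings_x = [1, 4, 7]  # documents containing x (made-up IDs)
postings_y = [2, 4, 9]  # documents containing y (made-up IDs)
matches = union_doc_at_a_time([postings_x, postings_y])
```

Each document is finished completely before the next one is touched, so no per-term intermediate result list has to be built and intersected afterwards.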

Query processing with Lucene

2008-01-06 Thread Marjan Celikik
Dear all, Maybe this topic has already been discussed (then can I get a reference please?)... I would like to know how Lucene actually processes a query. For example, take a 2-word query "x y". Does Lucene fetch the lists of "x" and "y" and intersect them, or does it do something more fancy, f

Stemming and highlighting

2008-01-04 Thread Marjan Celikik
Dear all, I am a new Lucene user and I would like to know the following. How does Lucene bring together fuzzy queries and highlighting? Let's say that for the query algorithm, the word algorith is also a match; how does the highlighter know that it should also highlight occurrences of the word algo