Telling query-time QueryParser how to work by a TokenFilter

2011-06-08 Thread Em
Hi list, let's take a simple example. A TokenFilter creates the terms "i" and "pod" from the word "ipod". This example is simple and if all usecases for the self-made tokenFilter were like this, I could do the whole thing on index-side. However, it is not - WordDelimiterFilter is no option. The p

Re: Lemmatization

2011-06-08 Thread Robert Muir
On Wed, Jun 8, 2011 at 7:52 AM, Mohamed Yahya wrote: > You're right. Still, I am not sure if there is a library that would > take care of examples such as the one I gave. > which is why you might want to just pick one that is close to what you want, and then customize/tune it with any stuff parti

Similarity class and searchPayloads

2011-06-08 Thread Alex vB
Hello everybody, I am just curious about the following case. Currently, I create a boolean AND query which loads payloads. In some cases Lucene loads payloads but does not return hits. Therefore, I assume that payloads are loaded directly with each doc ID from the posting list before
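
Alex's open question is whether Lucene 3.x reads payloads before a conjunction has confirmed a match, and the excerpt doesn't settle it. As a model of the lazy alternative, here is a plain-Java leapfrog intersection of sorted doc-ID posting lists. A design that fetched payloads only for the returned IDs would never touch non-matching documents. The posting lists are made-up sample data, and nothing here is Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {
    // Leapfrog intersection of sorted doc-ID posting lists (one list per AND-ed term)
    static List<Integer> intersect(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        if (postings.length == 0) return hits;
        int[] pos = new int[postings.length];
        search:
        while (true) {
            // The largest current head is the smallest doc ID all lists could still share
            int candidate = Integer.MIN_VALUE;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] >= postings[i].length) break search;
                candidate = Math.max(candidate, postings[i][pos[i]]);
            }
            boolean allMatch = true;
            for (int i = 0; i < postings.length; i++) {
                // Skip entries below the candidate; they cannot be hits
                while (pos[i] < postings[i].length && postings[i][pos[i]] < candidate) pos[i]++;
                if (pos[i] >= postings[i].length) break search;
                if (postings[i][pos[i]] != candidate) allMatch = false;
            }
            if (allMatch) {
                // Only at this point would a lazy design read payloads for the doc
                hits.add(candidate);
                for (int i = 0; i < postings.length; i++) pos[i]++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 3, 5, 7}, {3, 4, 7}, {2, 3, 7, 9} };
        System.out.println(intersect(postings)); // [3, 7]
    }
}
```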

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Trejkaz
On Wed, Jun 8, 2011 at 6:52 PM, Elmer wrote: > the parsed query becomes: > > '+(title:the) +(title:project desc:project)'. > > So, the problem is that docs that have the term 'the' only appearing in > their desc field are excluded from the results. Subclass MFQP and override getFieldQuery. If th
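
Trejkaz's reply is cut off after naming the `getFieldQuery` override, so what that override should do is not shown. One possible policy is modeled here in plain Java rather than against the MFQP API:

- Analyze each query term once per field.
- If any field's analyzer drops the term as a stopword, leave out the required clause for that term altogether.

The stand-in analyzers are functions that return `null` for a dropped term. Both they and the skip-the-clause policy are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

public class PerTermClauseSketch {
    // Builds "+(f1:t f2:t) ..." with one required group per query term.
    // Each stand-in analyzer returns the indexed form of a term, or null if it drops the term.
    static String build(List<String> terms, Map<String, UnaryOperator<String>> analyzers) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> alternatives = new ArrayList<>();
            boolean droppedSomewhere = false;
            for (Map.Entry<String, UnaryOperator<String>> e : analyzers.entrySet()) {
                String analyzed = e.getValue().apply(term);
                if (analyzed == null) droppedSomewhere = true;
                else alternatives.add(e.getKey() + ":" + analyzed);
            }
            // Hypothetical policy: a term that is a stopword in any field gets no required clause
            if (droppedSomewhere || alternatives.isEmpty()) continue;
            clauses.add("+(" + String.join(" ", alternatives) + ")");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        Set<String> stopwords = Set.of("the");
        Map<String, UnaryOperator<String>> analyzers = new LinkedHashMap<>();
        analyzers.put("title", t -> t.toLowerCase(Locale.ROOT));      // no stopfilter
        analyzers.put("desc", t -> stopwords.contains(t.toLowerCase(Locale.ROOT))
                ? null : t.toLowerCase(Locale.ROOT));                // with stopfilter
        System.out.println(build(List.of("the", "project"), analyzers));
        // +(title:project desc:project)
    }
}
```

The trade-off is that a title-only match on "the" no longer counts toward the query. Whether that is acceptable depends on the use case.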

Re: Lemmatization

2011-06-08 Thread Karl Wettin
Perhaps "least frequent substring" or even "suffix truncation" might be enough for your needs. Here is a related paper: http://web.jhu.edu/bin/q/b/p75-mcnamee.pdf karl On Jun 8, 2011, at 1:52 PM, Mohamed Yahya wrote: > You're right. Still, I am not sure if there is a library that wo
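
As a rough illustration of the truncation idea Karl mentions (in the spirit of the linked paper, not its code), every term can be cut to a fixed prefix length at both index and query time. Variants that share a stem then collapse to the same term. The prefix length of 5 is an arbitrary assumption. It happens to map both "Mexican" and "Mexico" from the Lemmatization thread to "mexic", but it will also conflate unrelated words.

```java
import java.util.Locale;

public class TruncationSketch {
    // Keeps only the first n characters of a lowercased term (suffix truncation)
    static String truncate(String term, int n) {
        String lower = term.toLowerCase(Locale.ROOT);
        return lower.length() <= n ? lower : lower.substring(0, n);
    }

    public static void main(String[] args) {
        int n = 5; // arbitrary prefix length chosen for this example
        System.out.println(truncate("Mexican", n)); // mexic
        System.out.println(truncate("Mexico", n));  // mexic
        System.out.println(truncate("pod", n));     // pod
    }
}
```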

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Ian Lea
I'm sure you are right and I'm wrong - sorry for the waste of space. However I still think you should build it all up in code. -- Ian. On Wed, Jun 8, 2011 at 4:33 PM, Elmer wrote: >> Using MFQP with AND >> everywhere you'll never get a match if some fields don't contain all >> of the search te

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Elmer
> Using MFQP with AND > everywhere you'll never get a match if some fields don't contain all > of the search terms" I'm sorry to say, but that's not true, I guess. Look at how the query parser parses the following query: 'information retrieval' --parsed-to--> +(title:inform description:inform authors.
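
Elmer's point can be checked with a small model of the parsed structure `+(title:inform description:inform ...) +(...)`. Each required group is satisfied if any field contains the term, so the terms may be spread over different fields. The plain-Java matcher below evaluates that structure against a document represented as field-to-terms sets. The document contents and the `retriev` stem are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrossFieldAndSketch {
    // Models "+(f1:t1 f2:t1) +(f1:t2 f2:t2)": every term must occur in at least one field
    static boolean matches(List<String> terms, Map<String, Set<String>> doc) {
        for (String term : terms) {
            boolean found = false;
            for (Set<String> fieldTerms : doc.values()) {
                if (fieldTerms.contains(term)) {
                    found = true;
                    break;
                }
            }
            if (!found) return false; // a required group with no matching field fails the query
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> doc = Map.of(
                "title", Set.of("inform"),
                "description", Set.of("retriev", "system"));
        System.out.println(matches(List.of("inform", "retriev"), doc)); // true
        System.out.println(matches(List.of("inform", "search"), doc));  // false
    }
}
```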

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Ian Lea
Then surely the stop word issue is a red herring. Using MFQP with AND everywhere you'll never get a match if some fields don't contain all of the search terms. Even if Erick's exact answer won't apply, I suspect that building up a composite boolean query is the way to go. -- Ian. On Wed, Jun 8

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Elmer
Sorry, I made a mistake here: > Unfortunately, the solution that Erick gave won't do the trick > > > bq.add(qp.parse("title:(the AND project)"), SHOULD) > > > bq.add(qp.parse("desc:(the AND project)"), SHOULD) > This still won't match documents where both 'the' and 'project' appear > in DIFFERENT

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Elmer
Thank you, I already use the PerFieldAnalyzerWrapper (by Hibernate Search) ;) And that's where the problem comes in: different fields using different analyzers (some with, some without a stopfilter). For each term (tokenized by MFQP itself?), it applies the given analyzer on each field. If the ana

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Erick Erickson
You're right, that's a better place to start Erick On Wed, Jun 8, 2011 at 9:42 AM, Ian Lea wrote: > Except that I think he has loads of other fields and wants to keep it simple. > > But how about passing a PerFieldAnalyzerWrapper instance as the > analyzer to MFQP?  Worth a try. > > > -- > I

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Ian Lea
Except that I think he has loads of other fields and wants to keep it simple. But how about passing a PerFieldAnalyzerWrapper instance as the analyzer to MFQP? Worth a try. -- Ian. On Wed, Jun 8, 2011 at 2:38 PM, Erick Erickson wrote: > Could you just construct a BooleanQuery with the > term
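
Ian's suggestion relies on PerFieldAnalyzerWrapper choosing an analyzer by field name and falling back to a default for unmapped fields. That dispatch can be sketched in plain Java with function stand-ins for analyzers. The tokenization rule and the stopword list are simplifications, not Lucene's StandardAnalyzer or StopFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class PerFieldAnalysisSketch {
    static final Set<String> STOPWORDS = Set.of("the", "a", "of"); // illustrative list

    // Stand-in analyzer: lowercase, split on anything that is not a letter or digit
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Same as simple(), plus stopword removal
    static List<String> withStopwords(String text) {
        List<String> tokens = simple(text);
        tokens.removeIf(STOPWORDS::contains);
        return tokens;
    }

    // Picks the analyzer registered for the field, or the default one
    static List<String> analyze(Map<String, Function<String, List<String>>> perField,
                                Function<String, List<String>> fallback,
                                String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    public static void main(String[] args) {
        Function<String, List<String>> stopping = PerFieldAnalysisSketch::withStopwords;
        Function<String, List<String>> fallback = PerFieldAnalysisSketch::simple;
        Map<String, Function<String, List<String>>> perField = Map.of("desc", stopping);
        System.out.println(analyze(perField, fallback, "title", "The Project")); // [the, project]
        System.out.println(analyze(perField, fallback, "desc", "The Project"));  // [project]
    }
}
```

As Elmer notes in his reply, per-field analysis alone still leaves "the" as a required term for the title field. It changes which terms each field sees, not how MFQP combines the resulting clauses.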

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Erick Erickson
Could you just construct a BooleanQuery with the terms against different fields instead of using MFQP? e.g. bq.add(qp.parse("title:(the AND project)"), SHOULD) bq.add(qp.parse("desc:(the AND project)"), SHOULD) etc...? If your QueryParser was created with a PerFieldAnalyzerWrapper I think you mig
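
Erick's suggestion groups the terms per field: `title:(the AND project)` OR `desc:(the AND project)`. A plain-Java model of those semantics (a stand-in, not Lucene's `BooleanQuery`) shows the limitation Elmer points out in his reply. A document with the terms split across fields matches no single group. The sample documents are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerFieldOrSketch {
    // Models "field1:(t1 AND t2) OR field2:(t1 AND t2)": some single field must hold every term
    static boolean matches(List<String> terms, Map<String, Set<String>> doc) {
        for (Set<String> fieldTerms : doc.values()) {
            if (fieldTerms.containsAll(terms)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> split = Map.of(
                "title", Set.of("the"),
                "desc", Set.of("project"));
        Map<String, Set<String>> together = Map.of(
                "title", Set.of("the", "project"),
                "desc", Set.of("software"));
        System.out.println(matches(List.of("the", "project"), split));    // false
        System.out.println(matches(List.of("the", "project"), together)); // true
    }
}
```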

Re: MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Ian Lea
I guess the base problem is that MFQP only accepts one analyzer. Presumably you are using different analyzers for your title and desc fields, and it might do what you wanted if you could pass in a list of analyzers along with a list of fields. Sounds like something that might not be too hard to co

Re: Lucene Result

2011-06-08 Thread Erick Erickson
Glad you found it. I'd still recommend you get a copy of Luke, though; it's invaluable. Best Erick On Wed, Jun 8, 2011 at 8:49 AM, Pranav goyal wrote: > Hi Erick, > > Thanks for the answer, before using Luke I got where I am making a mistake, > and I replied it here. > > But thanks for the r

Re: Lucene Result

2011-06-08 Thread Pranav goyal
Hi Erick, Thanks for the answer. Before using Luke, I figured out where I was making a mistake, and I replied here. But thanks for the reply. On Wed, Jun 8, 2011 at 6:14 PM, Erick Erickson wrote: > hard to say. You should get a copy of Luke and inspect your index to > see if what you > think you put t

Re: Lucene Result

2011-06-08 Thread Erick Erickson
Hard to say. You should get a copy of Luke and inspect your index to see if what you think you put there is actually there. When you added data to your index, did you perform a commit? Best Erick On Wed, Jun 8, 2011 at 2:45 AM, Pranav goyal wrote: > There is one field DocId which I am storing as
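
Erick's commit question matters because a reader opened on a Lucene index directory does not see documents added through an IndexWriter until they have been committed (near-real-time readers obtained from the writer itself aside). The toy class below models only that visibility rule in plain Java. `ToyIndex`, `add`, `commit` and `search` are hypothetical names, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyIndex {
    private final List<String> pending = new ArrayList<>();   // added but not yet committed
    private final List<String> committed = new ArrayList<>(); // what a newly opened reader would see

    void add(String docId) {
        pending.add(docId);
    }

    // Makes all pending documents visible to subsequent searches
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Searches only committed documents, like a reader opened on the index directory
    boolean search(String docId) {
        return committed.contains(docId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc-42");
        System.out.println(index.search("doc-42")); // false: not committed yet
        index.commit();
        System.out.println(index.search("doc-42")); // true
    }
}
```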

Re: Lemmatization

2011-06-08 Thread Mohamed Yahya
You're right. Still, I am not sure if there is a library that would take care of examples such as the one I gave. On Wed, Jun 8, 2011 at 11:25, Lahiru Samarakoon wrote: > Hi, > >> >> Is there something in Lucene that supports lemmatization of the following >> form: >> >> Mexican --> Mexico (from

Re: Lucene indexing & Searching

2011-06-08 Thread Pranav goyal
Oh, sorry, I found my error and now it works. Thanks On Wed, Jun 8, 2011 at 3:57 PM, Pranav goyal wrote: > import java.io.File; > import java.io.IOException; > import java.util.Collection; > import java.util.Iterator; > import java.util.List; > import java.util.Map; > > import org.apache.lucene.analysi

Lucene indexing & Searching

2011-06-08 Thread Pranav goyal
import java.io.File; import java.io.IOException; import java.util.Collection; import java.util.Iterator; import java.util.List; import java.util.Map; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document;

Re: Lemmatization

2011-06-08 Thread Lahiru Samarakoon
Hi, > > Is there something in Lucene that supports lemmatization of the following > form: > > Mexican --> Mexico (from adjective to name/noun) > > Lemmatization does not change the part of speech. I think you are looking for a stemming algorithm. http://nlp.stanford.edu/IR-book/html/htmledition/stemmi

Lemmatization

2011-06-08 Thread Mohamed Yahya
Hi, Is there something in Lucene that supports lemmatization of the following form: Mexican --> Mexico (from adjective to name/noun) Thanks Mohamed Yahya

MultiFieldQueryParser with default AND and stopfilter

2011-06-08 Thread Elmer
Hi, I have a use case in which I use the MultiFieldQueryParser (MFQP) on some fields that use a stopfilter and some that don't. The default operator of the MFQP is set to AND. For example, if the search query is 'the project' (with 'the' included in the stoplist) and the search fields a