Scoring in Lucene

2011-10-05 Thread David Ryan
Hi, The default scoring in Lucene uses tf x idf^2 instead of tf x idf. Does anyone have insight into why it does not use tf x idf? Here is the note on score calculation: https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/Similarity.html
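
For reference, the practical scoring function on the linked Similarity page can be written roughly as below. This is a transcription in LaTeX notation with shortened symbol names (boost, norm), not a new derivation. As far as that page explains it, idf(t) enters once through the query-side weight and once through the document-side weight, which is where the square comes from.

```latex
\[
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
\sum_{t \in q}\Bigl(\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Bigr)
\]
% idf(t)^2 = idf(t) [query-side term weight] x idf(t) [document-side term weight]
```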

Re: query for non-existence of fields

2011-10-05 Thread Ian Lea
MatchAllDocsQuery. It's in the FAQ. http://wiki.apache.org/lucene-java/LuceneFAQ#How_does_one_determine_which_documents_do_not_have_a_certain_term.3F -- Ian. On Wed, Oct 5, 2011 at 8:11 PM, Sam Jiang wrote: > Hi > > I'm struggling to figure out a way to query for the non-existence of some > fi

query for non-existence of fields

2011-10-05 Thread Sam Jiang
Hi, I'm struggling to figure out a way to query for the non-existence of some fields, e.g. match all documents that don't contain field X. I tried: BooleanQuery q = new BooleanQuery(); q.add(new BooleanClause( new TermQuery(new Term("fieldName", "*")),

Help Wanted: Lucene CJK Consultant

2011-10-05 Thread Jonathan Kamens
We are seeking a consultant to engage immediately to help us implement a proof-of-concept integration of CJK indexing+search into our Lucene search infrastructure running within JBoss. The integration would involve modifying our server-side Java infrastructure to support indexing CJK documents a

Re: How is Number of Boolean Clauses calculated - Minimum Should Match?

2011-10-05 Thread Chris Hostetter
: > Presumably this query would fail, since you've only got three clauses. : > Easy to verify. : : Seems like different behaviour compared to Solr. Probably Solr is : intelligent enough to reduce the parameter to the maximum value if it is : too large. correct, the dismax parser in solr is smar
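
The reply above is cut off, so the exact rule Solr applies is not shown here. Assuming the reduction is a simple clamp of the requested minimum-should-match value to the number of optional clauses actually present, it could be sketched as follows. The class and method names are hypothetical and are not Solr or Lucene API.

```java
// Hypothetical helper, not part of Solr or Lucene.
final class MinShouldMatchSketch {

    // Clamp the requested value into the range [0, optionalClauses], so that
    // asking for more matches than there are clauses cannot rule out every document.
    static int effectiveMinShouldMatch(int requested, int optionalClauses) {
        return Math.max(0, Math.min(requested, optionalClauses));
    }
}
```

With three optional clauses and a requested value of 4, this sketch yields 3, so all three clauses would have to match.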

Re: TaxWriter leakage?

2011-10-05 Thread Mihai Caraman
2011/10/4 Doron Cohen > LUCENE-3484 is resolved. > Mihai, could you give it a try and see if this solves the NPE problem in > your setup? > As Jim Carrey would say: Like a glove!

Re: How is Number of Boolean Clauses calculated - Minimum Should Match?

2011-10-05 Thread Em
Hi Ian, thanks for the fast feedback. >> If the MM was set to 4 (too many), then this means all queries have to >> match? > > Presumably this query would fail, since you've only got three clauses. > Easy to verify. Seems like different behaviour compared to Solr. Probably Solr is intelligent eno

Re: How is Number of Boolean Clauses calculated - Minimum Should Match?

2011-10-05 Thread Ian Lea
Sorry - you did say StopFilter or SynonymFilter but I started talking about oal.search.Filter instead. > So if an Analyzer contains a StopFilter and the parser uses this > Analyzer, then the following will happen: > > Original: > "To be or not to be said Shakespeare" > > Stopwords: To, be, or > >

Re: How is Number of Boolean Clauses calculated - Minimum Should Match?

2011-10-05 Thread Em
Hi, thank you Uwe and Ian! So if an Analyzer contains a StopFilter and the parser uses this Analyzer, then the following will happen: Original: "To be or not to be said Shakespeare" Stopwords: To, be, or Resulting BooleanClauses: - not - said - Shakespeare Is this right? If the MM was set to
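
The stop-word example above can be reproduced with a small plain-Java sketch. It only mimics the effect of a stop filter on whitespace-separated tokens and does not use Lucene's Analyzer or QueryParser; the class and method names are hypothetical, and a real analyzer would typically also lowercase the surviving terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: drops stop words from a query
// string, leaving the terms that would each become one clause.
final class StopwordClauseSketch {

    // stopwords is assumed to hold lowercase entries.
    static List<String> remainingTerms(String query, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query splits into a single empty token
            }
            // Compare case-insensitively, but keep the original token text.
            if (!stopwords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

For the query "To be or not to be said Shakespeare" with the stop words to, be, or, the sketch keeps not, said, Shakespeare, so three clauses would remain for any clause limit or minimum-should-match check.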

RE: How is Number of Boolean Clauses calculated - Minimum Should Match?

2011-10-05 Thread Uwe Schindler
Hi, The TooManyClausesException is thrown by BooleanQuery.add(Clause). Because of this, it can only count clauses actually added to the BooleanQuery - terms thrown away by QueryParser before are not counted as they will not be in the final query. If a token in the query parser expands to multiple

Re: How is Number of Boolean Clauses calculated - Minimum Should Match?

2011-10-05 Thread Ian Lea
It will work on the query, whether produced by a query parser or constructed in code. I don't see that the number of clauses will change if you are applying filters. Filters are not query clauses, although it can get confusing if you start using stuff like FilteredQuery or QueryWrapperFilter. -

How is Number of Boolean Clauses calculated - Minimum Should Match?

2011-10-05 Thread Em
Hello list, in what way does BooleanQuery calculate the number of its clauses? Is this number based on the analyzed query or based on the raw query-string? Imagine you have a StopFilter or a SynonymFilter applied to a BooleanQuery during analysis - the number of clauses could shrink or increase.