Proposal for changing Lucene's backwards-compatibility policy

2009-10-15 Thread Michael Busch
Hello Lucene users: In the past we have discussed our backwards-compatibility policy frequently on the Lucene developer mailing list and we are thinking about making some significant changes. In this mail I'd like to outline the proposed changes to get some feedback from the user community. Our c

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Ah! I thought that the ConstantScoreQuery would also be rewritten into a BooleanQuery, resulting in the same exception. If that's the case, then this should work. I'll give that a try when I get into the office this morning. On Fri, Oct 16, 2009 at 6:46 AM, Michael McCandless < luc...@mikemcca

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
Well, you could wrap the C | D filter as a Query (using ConstantScoreQuery), and then add that as a SHOULD clause on your top-level BooleanQuery? Mike On Thu, Oct 15, 2009 at 5:42 PM, Shaun Senecal wrote: > At first I thought so, yes, but then I realised that the query I wanted to > execute was A

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
At first I thought so, yes, but then I realised that the query I wanted to execute was A | B | C | D and in reality I was executing (A | B) & (C | D). I guess my unit tests were missing some cases and don't currently catch this. On Thu, Oct 15, 2009 at 11:59 PM, Michael McCandless < luc...@mikem
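The gap between the intended A | B | C | D and the executed (A | B) & (C | D) can be illustrated with plain sets of document ids. The following is only a minimal sketch of the boolean semantics, not Lucene code: the class name, the helper methods, and the document ids are all hypothetical, and it assumes (as the message describes) that a filter intersects with the query's matches, while SHOULD clauses behave like a union.

```java
import java.util.BitSet;

// Hypothetical model of the semantics discussed above; this is NOT the Lucene API.
public class FilterSemanticsSketch {

    /** Builds a set of matching document ids (illustrative helper). */
    static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    /** Union of several match sets, analogous to OR-ing clauses together. */
    static BitSet union(BitSet... sets) {
        BitSet result = new BitSet();
        for (BitSet s : sets) {
            result.or(s);
        }
        return result;
    }

    /** query: A | B, filter: C | D. The filter intersects: (A | B) & (C | D). */
    static BitSet queryWithFilter(BitSet a, BitSet b, BitSet c, BitSet d) {
        BitSet result = union(a, b);
        result.and(union(c, d));
        return result;
    }

    /** All four parts as optional clauses: A | B | C | D. */
    static BitSet allShould(BitSet a, BitSet b, BitSet c, BitSet d) {
        return union(a, b, c, d);
    }

    public static void main(String[] args) {
        // Made-up document ids for each sub-query.
        BitSet a = docs(1, 2);
        BitSet b = docs(2, 3);
        BitSet c = docs(3, 4);
        BitSet d = docs(5);

        System.out.println("query + filter: " + queryWithFilter(a, b, c, d));
        System.out.println("all SHOULD:     " + allShould(a, b, c, d));
    }
}
```

With these made-up ids, the filtered form keeps only documents matched by both halves, whereas the union keeps every document matched by any of the four parts. A unit test that only uses documents matching both halves would not notice the difference.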

OpenRelevance

2009-10-15 Thread Omar Alonso
Hi folks, I would like to know if people are interested in the OpenRelevance project (http://wiki.apache.org/lucene-java/OpenRelevance). I've done quite a few experiments on Amazon Mechanical Turk using TREC and INEX data sets, so one approach would be to use crowdsourcing for such a task. Rega

Re: Invitation: Free Webinar - Apache Lucene 2.9: Technical Overview of New Features (Sep 24 02:00 PM EDT)

2009-10-15 Thread Simon Willnauer
You can download the slides and watch the webinar here: http://www.lucidimagination.com/How-We-Can-Help/webinar-Lucene-29 simon On Thu, Oct 15, 2009 at 6:32 PM, Eran Sevi wrote: > Is there a recording of the Webinars for anyone who's missed it? > > On Sat, Sep 19, 2009 at 12:03 AM, wrote: > >>

Re: NPE in NearSpansUnordered

2009-10-15 Thread Peter Keegan
The query is: +payloadNear([spanNear([contents:insurance, contents:agent], 1, false), spanNear([contents:winston, contents:salem], 1, false)], 10, false) It's using the default payload function scorer (average value). It doesn't happen on all queries of this type, only a handful. This is pr

Re: NPE in NearSpansUnordered

2009-10-15 Thread Yonik Seeley
Are you using any custom query types? Anything to help us reproduce (like the actual query this happened on) would be greatly appreciated. -Yonik http://www.lucidimagination.com On Thu, Oct 15, 2009 at 1:17 PM, Peter Keegan wrote: > I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnor

NPE in NearSpansUnordered

2009-10-15 Thread Peter Keegan
I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnordered: java.lang.NullPointerException at org.apache.lucene.search.spans.NearSpansUnordered.start(NearSpansUnordered.java:219) at org.apache.lucene.search.payloads.PayloadNearQuery$PayloadNearSpanScorer.processPayloads(PayloadNearQuery.j

Re: How to sort and get document scores afterwards

2009-10-15 Thread Michael McCandless
Yeah this was a change in 2.9... but you can get the scores back, if you do this: TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields, true /* trackDocScores */,

RE: How to sort and get document scores afterwards

2009-10-15 Thread Uwe Schindler
The default API searcher.search works like this now. If you want to control the retrieval of scores, create a TopFieldCollector directly: http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/TopFieldCollector.html The static create methods have many possibilities to control the be

How to sort and get document scores afterwards

2009-10-15 Thread Christian Reuschling
Hi, our application enables sorting the result lists according to field values, currently all represented as Strings (we plan to also migrate to the new numeric type capabilities of Lucene 2.9 at a later time). For this, the documents will be sorted e.g. according to the author, which works fine w

Re: Invitation: Free Webinar - Apache Lucene 2.9: Technical Overview of New Features (Sep 24 02:00 PM EDT)

2009-10-15 Thread Eran Sevi
Is there a recording of the Webinars for anyone who's missed it? On Sat, Sep 19, 2009 at 12:03 AM, wrote: > *Description* > > > > __ > > Free Webinar: Apache Lucene 2.9: Discover the Powerful New Features > --- > > J

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
You should be able to do exactly what you were doing on 2.4, right? (By setting the rewrite method). Mike On Thu, Oct 15, 2009 at 8:30 AM, Shaun Senecal wrote: > Thanks for the explanation Mike.  It looks like I have no choice but to move > any queries which throw TooManyClauses to be Filters. S

Re: How to set boost for a certain term in a query

2009-10-15 Thread Ian Lea
http://lucene.apache.org/java/2_9_0/queryparsersyntax.html#Boosting%20a%20Term -- Ian. On Thu, Oct 15, 2009 at 3:33 PM, Chuan wrote: > > For example, I want the term 'sport' to have more impact on the final rank. > Thanks in advance. > > Chuan --

How to set boost for a certain term in a query

2009-10-15 Thread Chuan
For example, I want the term 'sport' to have more impact on the final rank. Thanks in advance. Chuan -- View this message in context: http://www.nabble.com/How-to-set-boost-for-a-certain-term-in-a-query-tp25909737p25909737.html Sent from the Lucene - Java Users mailing list archive at Nabble.

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-15 Thread Thomas D'Silva
Grant, I have an index with documents that have a text field containing document text, and a tag field containing tags associated with the document. I am trying to calculate the probability that a document contains a particular word and is tagged with a particular tag. This is related to a MoreLik

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-15 Thread Karl Wettin
On 14 Oct 2009, at 15:15, Grant Ingersoll wrote: On Oct 12, 2009, at 10:46 PM, Thomas D'Silva wrote: I am trying to compute the counts of terms of the documents returned by running a query using a TermVectorMapper. I was wondering if anyone knew if there was a faster way to do this rather than

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Thanks for the explanation Mike. It looks like I have no choice but to move any queries which throw TooManyClauses to be Filters. Sadly, this means a max query time of 6s under load unless I can find a way to rewrite the query to span a Query and a Filter. Thanks again On Thu, Oct 15, 2009 at

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal wrote: > Up to Lucene 2.4, this has been working out for us. However, in > Lucene 2.9 this breaks since rewrite() now returns a > ConstantScoreQuery. You can get back to the 2.4 behavior by calling prefixQuery.setRewriteMethod(prefixQuery.SCORING_B

Re: querying multi-value fields

2009-10-15 Thread Renaud Delbru
Hi, there is also the SIREn plugin [1], which allows indexing multi-valued fields, with values of variable length, and querying them individually. [1] http://siren.sindice.com -- Renaud Delbru On 12/10/09 21:31, Angel, Eric wrote: I need to analyze these values since I also want the benefits p

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Sorry for the double post, but I think I can clarify the problem a little more. We want to execute: query: A | B | C | D filter: null However, C and D cause TooManyClauses, so instead we execute: query: A | B filter: C | D My understanding is that Lucene will apply the Filter (C

PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
I know this has been discussed at great length, but I still have not found a satisfactory solution and I am hoping someone on the list has some ideas... We have a large index (4M+ Documents) with a handful of Fields. We need to perform PrefixQueries on multiple fields. The problem is that when t