Re: Maven 4.1-SNAPSHOTS not up-to-date

2012-12-11 Thread Steve Rowe
I just committed a change to the Ant task that Jenkins runs once per day (on both trunk and branch_4x), so that snapshots get deployed before tests are run under Maven. On Dec 11, 2012, at 4:11 PM, Neil Ireson wrote: > Despite the pooh-poohing of Maven I would be keen for the nigh

RE: Maven 4.1-SNAPSHOTS not up-to-date

2012-12-11 Thread Neil Ireson
Despite the pooh-poohing of Maven, I would be keen for the nightly builds to be released to the Apache repository. This will have the advantage that those lazy Maven users who cannot be bothered to do their own repository maintenance (shame on me) will quickly be able to test the latest code and rep

RE: Maven 4.1-SNAPSHOTS not up-to-date

2012-12-11 Thread Uwe Schindler
I think you can do this by reordering the depends clauses of the nightly Maven task in the top-level build.xml. No need to change the Jenkins job at all. Maven has been executed from inside Ant (via extra-targets.xml) for a few months now. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www

Re: Maven 4.1-SNAPSHOTS not up-to-date

2012-12-11 Thread Steve Rowe
I'm thinking of moving the artifact upload to happen prior to running the tests, so that people can have access to snapshots through the standard Maven channel regardless of whether the Solr tests succeed under Maven on FreeBSD. On Dec 11, 2012, at 10:07 AM, Uwe Schindler wrote: > Hi, > > Th

Re: Long query optimisation: using some terms for scoring only

2012-12-11 Thread Matthew Willson
Hi lukai, That sounds like a nice optimisation, perhaps more sophisticated than the "AND_MAYBE" support I was looking for, but a similar idea. Is the code available anywhere? Cheers -Matt On 11/12/12 17:45, lukai wrote: I had implemented WAND in solr for our own project. It can improve the p

Re: Long query optimisation: using some terms for scoring only

2012-12-11 Thread lukai
I had implemented WAND in Solr for our own project. It can improve the performance a lot. For your reference: http://dl.acm.org/citation.cfm?id=956944 But it requires changing the index a little bit. Thanks, On Tue, Dec 11, 2012 at 6:19 AM, Matthew Willson wrote: > Hi all > > I'm currently benchmark
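The WAND idea from the paper lukai cites can be sketched without any Lucene internals: precompute a per-term upper bound on its score contribution (the extra index statistic that makes "changing the index a little bit" necessary), then keep only documents whose best-case total can reach a threshold. A toy, self-contained sketch with hypothetical names; real WAND advances posting-list iterators around a pivot document rather than materializing a map:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ToyWand {
    // postings[t] = sorted doc IDs containing term t.
    // maxScore[t] = precomputed upper bound on term t's score contribution
    // (this is the statistic WAND stores alongside the index).
    static List<Integer> candidates(int[][] postings, double[] maxScore, double theta) {
        // Best-case score per doc: sum of upper bounds of the terms it contains.
        Map<Integer, Double> bound = new TreeMap<>();
        for (int t = 0; t < postings.length; t++)
            for (int d : postings[t])
                bound.merge(d, maxScore[t], Double::sum);
        // Keep only docs whose best-case score can reach the threshold;
        // everything else is skipped without full scoring.
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : bound.entrySet())
            if (e.getValue() >= theta) out.add(e.getKey());
        return out;
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 3, 5}, {3, 5, 7}, {5, 9}};
        double[] maxScore = {1.0, 0.5, 2.0};
        // Only doc 5 (bound 1.0 + 0.5 + 2.0 = 3.5) can reach theta = 3.0.
        System.out.println(candidates(postings, maxScore, 3.0)); // prints [5]
    }
}
```

The payoff in a real engine is that low-bound documents are never fully scored at all, which is why it helps long disjunctive queries so much.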

Re: Opposite of SpanFirstQuery - Searching for documents by last term in a field

2012-12-11 Thread Ian Lea
The javadoc for SpanFirstQuery says it is a special case of SpanPositionRangeQuery, so maybe you can use the latter directly, although you might need to know the position of the last term, which might be a problem. Alternatives might include reversing the terms and using SpanFirst, or adding a specia
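Ian's suggestion, and the catch that you need to know the last position, can be illustrated in plain Java outside Lucene. The names here are hypothetical; the real SpanPositionRangeQuery wraps a SpanQuery and takes start/end positions:

```java
public class LastTermMatch {
    // Emulates the SpanPositionRangeQuery idea: a term matches a document
    // if it occurs at some position in [start, end).
    static boolean matchesRange(String[] tokens, String term, int start, int end) {
        for (int pos = Math.max(start, 0); pos < Math.min(end, tokens.length); pos++)
            if (tokens[pos].equals(term)) return true;
        return false;
    }

    // A "SpanLastQuery" is matchesRange with start = tokens.length - 1,
    // i.e. you must know the field length per document, which is exactly
    // the problematic part.
    static boolean matchesLast(String[] tokens, String term) {
        return tokens.length > 0 && tokens[tokens.length - 1].equals(term);
    }

    public static void main(String[] args) {
        String[] doc = {"quick", "brown", "fox"};
        System.out.println(matchesLast(doc, "fox"));        // prints true
        System.out.println(matchesRange(doc, "fox", 0, 1)); // SpanFirst-style: prints false
    }
}
```

The term-reversal alternative sidesteps the length problem entirely: index tokens in reverse order (or with a reversing filter) and the last term becomes position 0, where SpanFirstQuery works as-is.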

Opposite of SpanFirstQuery - Searching for documents by last term in a field

2012-12-11 Thread Hasenberger, Josef
Hi, I wonder if there is a way to use a SpanQuery to find documents with fields that end with a certain term. Kind of the opposite of SpanFirstQuery, i.e. a "SpanLastQuery", if you want. What I would like to do: find terms that are at the end of a field. Example: Assume the following field conte

RE: Maven 4.1-SNAPSHOTS not up-to-date

2012-12-11 Thread Uwe Schindler
Hi, The problem is: the Maven build does not succeed at the moment because Solr tests are likely to fail, see: https://builds.apache.org/computer/lucene/ But you can download the standard Lucene and Solr artifacts from https://builds.apache.org/job/Lucene-Artifacts-4.x/ and https://builds.apac

Long query optimisation: using some terms for scoring only

2012-12-11 Thread Matthew Willson
Hi all, I'm currently benchmarking Lucene to get an understanding of what optimisations are available for long queries, and wanted to check what the recommended approach is. Unsurprisingly, a naive approach to long queries (just keep adding SHOULD clauses to a big BooleanQuery) scales close to

Re: Separating the document dataset and the index dataset

2012-12-11 Thread Ramprakash Ramamoorthy
On Tue, Dec 11, 2012 at 4:10 PM, Uwe Schindler wrote: > In Lucene 4.1 the compressing codec is no longer a separate codec, the > main Codec ("Lucene41") compresses by default. Just reindex your data or > use IndexUpgrader. > Thanks Uwe. This one helped. My index size came down from 816 MB to 198

Maven 4.1-SNAPSHOTS not up-to-date

2012-12-11 Thread Neil Ireson
Hi all, I wanted to use the 4.1 version to access some of the latest improvements. I was hoping to just connect to the Maven snapshot repository, but it seems the snapshots are not being updated: they are from October 24/25. Is this a deliberate policy or a "bug", and any chance of a fi
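For Maven users following along, consuming such snapshots would take a repository entry like the following in pom.xml. This is a sketch: it assumes the snapshots are deployed to repository.apache.org, which is exactly what this thread is questioning, and the artifact coordinates shown are illustrative.

```xml
<!-- Hypothetical pom.xml fragment: enable the Apache snapshot repository
     so that 4.1-SNAPSHOT artifacts resolve (assuming they are deployed). -->
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/snapshots</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>4.1-SNAPSHOT</version>
  </dependency>
</dependencies>
```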

RE: Separating the document dataset and the index dataset

2012-12-11 Thread Uwe Schindler
In Lucene 4.1 the compressing codec is no longer a separate codec, the main Codec ("Lucene41") compresses by default. Just reindex your data or use IndexUpgrader. Uwe - UWE SCHINDLER Webserver/Middleware Development PANGAEA - Data Publisher for Earth & Environmental Science MARUM (Cognium b
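A minimal sketch of what "compresses by default" means in practice. This uses Lucene 4.x-era API names as assumptions and is not compiled here; it needs lucene-core on the classpath:

```java
// Sketch only (Lucene 4.x APIs assumed; requires lucene-core, not standalone).
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class CompressedByDefault {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_41, new StandardAnalyzer(Version.LUCENE_41));
        // On a 4.1 snapshot nothing more is needed: the default "Lucene41"
        // codec already compresses stored fields. On 4.0 you would instead
        // opt in via cfg.setCodec(...) with a compressing codec from the
        // separate codecs jar, as discussed elsewhere in this thread.
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File(args[0])), cfg);
        // ... add documents ...
        writer.close();
    }
}
```

Existing old-format segments can be rewritten with the command-line tool Uwe mentions, along the lines of `java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader /path/to/index`, or simply by reindexing.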

RE: Stemming and Wildcard - or fire and water

2012-12-11 Thread Lars-Erik Aabech
A possible workaround could be to modify search terms that carry wildcard tokens by stemming them manually and creating a new search string. A search for hersen* would be modified to hers* and return what you expect. The con is of course that you search for more than you specified. Lars-Erik > -Origin
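Lars-Erik's workaround, stemming the prefix before reattaching the wildcard, can be sketched in plain Java. The suffix list here is a made-up stand-in; a real implementation would run the same stemmer the index-time analyzer uses:

```java
public class WildcardStem {
    // Toy stemmer standing in for the analyzer's real stemmer: strips a
    // few sample suffixes (hypothetical list, for illustration only).
    static String stem(String term) {
        for (String suf : new String[]{"ene", "en", "er", "e"})
            if (term.length() > suf.length() + 2 && term.endsWith(suf))
                return term.substring(0, term.length() - suf.length());
        return term;
    }

    // Rewrite a trailing-wildcard query term by stemming its prefix,
    // so the query matches the stemmed terms actually in the index.
    static String rewrite(String queryTerm) {
        if (queryTerm.endsWith("*"))
            return stem(queryTerm.substring(0, queryTerm.length() - 1)) + "*";
        return queryTerm;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("hersen*")); // prints hers*
        System.out.println(rewrite("fox"));     // prints fox (no wildcard, untouched)
    }
}
```

As the mail notes, hers* matches a superset of what hersen* would: that broadening is the price of making stemmed index terms reachable at all.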

Re: Separating the document dataset and the index dataset

2012-12-11 Thread Ramprakash Ramamoorthy
On Tue, Dec 11, 2012 at 3:14 PM, Uwe Schindler wrote: > You can use Lucene 4.1 nightly builds from http://goo.gl/jZ6YD - it is > not yet released, but upgrading from Lucene 4.0 is easy. If you are not yet > on Lucene 4.0, there is more work to do, in that case a solution to your > problem would b

RE: Stemming and Wildcard - or fire and water

2012-12-11 Thread Uwe Schindler
This is a well-known problem: wildcards cannot be analyzed by the query parser, because the analysis would destroy the wildcard characters; stemming of partial terms will also never work. For Solr there is a workaround (the MultiTermAware component), but it is also very limited and only works when

Stemming and Wildcard - or fire and water

2012-12-11 Thread Bayer Dennis
Hello there, my colleague and I ran into an example which didn't return the result size we were expecting. We discovered that there is a mismatch in how terms are handled during indexing and searching. This issue has already been discussed several times on the internet, as we found out later on, but in o

RE: Separating the document dataset and the index dataset

2012-12-11 Thread Jain Rahul
Hi Ram, You need to have lucene-codec.jar on the classpath, which contains CompressingCodec and other related classes. If you build your own stuff on top of Lucene, you can set it by calling setCodec(Codec codec) on IndexWriterConfig. But if you are using Solr, then since I couldn't figure ou

RE: Separating the document dataset and the index dataset

2012-12-11 Thread Uwe Schindler
You can use Lucene 4.1 nightly builds from http://goo.gl/jZ6YD - it is not yet released, but upgrading from Lucene 4.0 is easy. If you are not yet on Lucene 4.0, there is more work to do, in that case a solution to your problem would be to save the stored fields in a separate database/whatever a

Re: Separating the document dataset and the index dataset

2012-12-11 Thread Ramprakash Ramamoorthy
On Fri, Dec 7, 2012 at 1:11 PM, Jain Rahul wrote: > If you are using Lucene 4.0 and can afford to compress your document dataset > while indexing, it will be a huge saving in terms of disk space and also > in IO (resulting in better indexing throughput). > > In our case, it has helped us a lot as compresse