Highlighter and Phrase Queries

2008-11-10 Thread Sertic Mirko, Bedag
[EMAIL PROTECTED] I am searching for a solution to make the Highlighter run properly in combination with phrase queries. I want to highlight text with a phrase query like "windows printserver", the following highlighted: "windows printservers" are good blah blah "windows" manages "print

Re: Highlighter and Phrase Queries

2008-11-10 Thread Mark Miller
Check out the SpanScorer. - Mark On Nov 10, 2008, at 8:25 AM, "Sertic Mirko, Bedag" <[EMAIL PROTECTED] > wrote: [EMAIL PROTECTED] I am searching for a solution to make the Highlighter run properly in combination with phrase queries. I want to highlight text with a phrase query like "w

Re: Boosting results

2008-11-10 Thread Mark Miller
Michael McCandless wrote: But: it's slow to load a field for the first time. LUCENE-1231 (column-stride fields) aims to greatly speed up the load time. Test it out though. In some recent testing I was doing it was *way* faster than I thought it would be based on what I had been reading. Of c

AW: Highlighter and Phrase Queries

2008-11-10 Thread Sertic Mirko, Bedag
Hi, Thank you for your response. Are there examples available? Regards, Mirko -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Monday, 10 November 2008 14:45 To: java-user@lucene.apache.org Subject: Re: Highlighter and Phrase Queries Check out the SpanScore

Re: AW: Highlighter and Phrase Queries

2008-11-10 Thread Mark Miller
Check out the unit tests for the highlighter; there are a bunch of examples. It's pretty much the same as using the standard scorer, except that it requires a cached token filter so that the token stream can be read more than once. Once you pass in the SpanScorer to the Highlighter though,

AW: AW: Highlighter and Phrase Queries

2008-11-10 Thread Sertic Mirko, Bedag
OK, I will do that. I guess it will also work with BooleanQueries and combined Term/Wildcard/Phrase Queries? -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Monday, 10 November 2008 15:38 To: java-user@lucene.apache.org Subject: Re: AW: Highlighter and Phrase

Re: possible score value

2008-11-10 Thread Chris Hostetter
: Did you come across : : scoreNorm = 1.0f / topDocs.getMaxScore(); : or something of this sort in Hits? : As per my knowledge, the initial score is more than 1 but finally the scores : get divided by the maxScore of the matched doc set. i.e. Setting an upper : limit of 1 (for the max scorer

Re: AW: AW: Highlighter and Phrase Queries

2008-11-10 Thread Mark Miller
Right, it will work the same as the standard Highlighter except that it highlights spans and phrase queries based on position. Sertic Mirko, Bedag wrote: OK, I will do that. I guess it will also work with BooleanQueries and combined Term/Wildcard/Phrase Queries? -Original Message-
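
Editor's sketch: the thread describes highlighting phrase queries "based on position", so that a term is marked only where the whole phrase occurs. The plain-Java example below illustrates that idea only. It is not the Lucene Highlighter or SpanScorer API; the class name PhraseHighlightSketch, the whitespace tokenization, and the <B> markers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of position-based phrase highlighting; not Lucene code. */
final class PhraseHighlightSketch {

    /** Marks tokens with <B>..</B> only where the full phrase occurs at consecutive positions. */
    static String highlightPhrase(String text, String phrase) {
        String[] tokens = text.split("\\s+");
        String[] wanted = phrase.toLowerCase().split("\\s+");
        boolean[] hit = new boolean[tokens.length];

        // Slide over every start position and test for a complete phrase match.
        for (int start = 0; start + wanted.length <= tokens.length; start++) {
            boolean match = true;
            for (int i = 0; i < wanted.length; i++) {
                if (!tokens[start + i].toLowerCase().equals(wanted[i])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                for (int i = 0; i < wanted.length; i++) {
                    hit[start + i] = true;
                }
            }
        }

        // Rebuild the text, wrapping only the tokens that were part of a phrase match.
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            out.add(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return String.join(" ", out);
    }
}
```

With the phrase "windows printserver", a lone "windows" elsewhere in the text is left unmarked, which is the behaviour the original poster asked for. A term-only scorer would mark every occurrence of either word.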

[Announce] Call For Papers opens for ApacheCon US 2009

2008-11-10 Thread Grant Ingersoll
If you have only 30 seconds to read this; Join us in celebrating the ASF's 10th Anniversary at ApacheCon! The Call for Papers is now open for ApacheCon US 2009, taking place 2-6 November in Oakland, California. Proposals are being accepted at http://us.apacheco

Re: Boosting results

2008-11-10 Thread Michael McCandless
Well .. the FieldCache API is documented here (for 2.4.0): http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/search/FieldCache.html EG you can load ints (for example) like this: FieldCache.DEFAULT.getInts(reader, "myfield"); This returns an array mapping docID --> int va
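
Editor's sketch: the message above describes a loaded field as an array mapping docID to an int value. The plain-Java example below imitates only that data layout, to show why lookups and sorting are cheap once the array is in memory. FieldColumnSketch and its methods are hypothetical names; nothing here calls Lucene.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a per-field array indexed by docID; not the FieldCache API. */
final class FieldColumnSketch {
    private final int[] values; // values[docId] is the field value of that document

    FieldColumnSketch(int[] values) {
        this.values = values.clone();
    }

    /** Constant-time lookup of a document's field value. */
    int valueOf(int docId) {
        return values[docId];
    }

    /** Returns the given doc IDs ordered by ascending field value, ties broken by docID. */
    int[] sortByValue(int[] docIds) {
        Integer[] boxed = new Integer[docIds.length];
        for (int i = 0; i < docIds.length; i++) {
            boxed[i] = docIds[i];
        }
        Arrays.sort(boxed, (a, b) -> values[a] != values[b]
                ? Integer.compare(values[a], values[b])
                : Integer.compare(a, b));
        int[] sorted = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            sorted[i] = boxed[i];
        }
        return sorted;
    }
}
```

The memory cost is one int per document, whether or not a given query needs it. That matches the trade-off discussed in this thread: a slow first load and RAM use that grows with the index.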

Re: Boosting results

2008-11-10 Thread Stefan Trcek
On Friday 07 November 2008 18:46:17 Michael McCandless wrote: > > Sorting populates the field cache (internal to Lucene) for that > field,   meaning it loads all values for all docs and holds them in > memory. This makes the first query slow, and, consumes RAM, in > proportion to how large your ind

Re: Order the index by timestamp field and Get n documents

2008-11-10 Thread Cool The Breezer
I was able to do that using a range query: String end = "25337325126";//i.e. 11/30/, assume that this is max end date Term endTerm = new Term("timestamp",end); RangeQuery rangeQuery = new RangeQuery(null,endTerm,true); Sort sort = new Sort("timestamp",true); Filter dupFilte

Re: Term numbering and range filtering

2008-11-10 Thread Tim Sturge
Yes, that is a significant issue. What I'm coming to realize is that either I will end up with something like class MultiFilter { String field; private int[] termInDoc; Map termToInt; ... } which can be entirely built on the current lucene APIs but has significantly more overhead (the

Re: Term numbering and range filtering

2008-11-10 Thread Paul Elschot
Tim, I didn't follow all the details, so this may be somewhat off, but did you consider using TermVectors? Regards, Paul Elschot On Monday 10 November 2008 19:18:38 Tim Sturge wrote: > Yes, that is a significant issue. What I'm coming to realize is that > either I will end up with something l

incremental update of index

2008-11-10 Thread ChadDavis
In the FAQ's it says that you have to do a manual incremental update: How do I update a document or a set of documents that are already indexed? > > There is no direct update procedure in Lucene. To update an index > incrementally you must first *delete* the documents that were updated, and > *the

Re: Boosting results

2008-11-10 Thread Stefan Trcek
On Monday 10 November 2008 13:55:31 Michael McCandless wrote: > > Finally, you might want to instead look at Solr, which provides facet > counting out of the box, rather than roll your own... Doooh - new API, but its facet counting sounds good. Any starting points for moving from plain Lucene to

Re: incremental update of index

2008-11-10 Thread Erick Erickson
You have to have indexed something that uniquely identifies the document in order to know what the old one is. Really, this is the same question as updating, isn't it? If you could update a document in place, you'd have to know what document that was. If you know that information, you know which do
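
Editor's sketch: the FAQ text and the reply above describe updating as "delete the old document, then add the new one", which only works if each document carries a unique identifier. The plain-Java example below models that pattern with an in-memory list. ReplaceByIdSketch, Doc, and the field names are hypothetical and do not represent the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of delete-then-add updates keyed by a unique id. */
final class ReplaceByIdSketch {

    static final class Doc {
        final String id;   // the uniquely identifying value the thread says must be indexed
        final String body;

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(Doc doc) {
        docs.add(doc);
    }

    /** Removes every document with the given id and returns how many were removed. */
    int deleteById(String id) {
        int before = docs.size();
        docs.removeIf(d -> d.id.equals(id));
        return before - docs.size();
    }

    /** "Update" is a delete of the old version followed by an add of the new one. */
    void update(Doc doc) {
        deleteById(doc.id);
        add(doc);
    }

    int size() {
        return docs.size();
    }

    /** Returns the body stored under the id, or null if no such document exists. */
    String bodyOf(String id) {
        for (Doc d : docs) {
            if (d.id.equals(id)) {
                return d.body;
            }
        }
        return null;
    }
}
```

Without the id there is nothing to delete by, so a plain add would leave the old and new versions side by side.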

performance/scalability issues re filtering of protected search results

2008-11-10 Thread Michael Wechner
Hi, We have about 1 million documents (and growing) in a hierarchical order (3 to 20 levels deep) and about 3000 people accessing these nodes; some people have access to certain branches, other people to other branches, and some branches are shared. The access control of these nodes is changi

Re: incremental update of index

2008-11-10 Thread Donna L Gresh
ChadDavis <[EMAIL PROTECTED]> wrote on 11/10/2008 02:22:45 PM: > In the FAQ's it says that you have to do a manual incremental update: > > How do I update a document or a set of documents that are already indexed? > > > > There is no direct update procedure in Lucene. To update an index > > incr

Re: performance/scalability issues re filtering of protected search results

2008-11-10 Thread Erick Erickson
This has been discussed more than a few times, I suggest you take a look at the searchable archive for things like privileges, access privileges, etc. You'll find lots of information faster that way... Best Erick On Mon, Nov 10, 2008 at 2:52 PM, Michael Wechner <[EMAIL PROTECTED]>wrote: > Hi > >

autoCommit

2008-11-10 Thread ChadDavis
The FAQ's have this index performance tip: Use autoCommit=false when you open your IndexWriter > > In Lucene 2.3 there are substantial optimizations for Documents that use > stored fields and term vectors, to save merging of these very large index > files. You should see the best gains by using au

Re: incremental update of index

2008-11-10 Thread ChadDavis
That's what I thought. So, that leads me to . . . is it necessarily all that much faster to index in an incremental update fashion, rather than just clobbering the old index? On Mon, Nov 10, 2008 at 12:52 PM, Erick Erickson <[EMAIL PROTECTED]>wrote: > You have to have indexed something that un

Re: autoCommit

2008-11-10 Thread Michael McCandless
Actually, all non-deprecated ctors of IndexWriter set autoCommit to false. Ie, in 3.0 autoCommit false will become the only option. Mike ChadDavis wrote: The FAQ's have this index performance tip: Use autoCommit=false when you open your IndexWriter In Lucene 2.3 there are substantial o

Re: autoCommit

2008-11-10 Thread ChadDavis
That's easy. Thanks. On Mon, Nov 10, 2008 at 1:12 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > Actually, all non-deprecated ctors of IndexWriter set autoCommit to false. > Ie, in 3.0 autoCommit false will become the only option. > > Mike > > > ChadDavis wrote: > > The FAQ's have this

Re: Term numbering and range filtering

2008-11-10 Thread Tim Sturge
Hmmm -- I hadn't thought about that so I took a quick look at the term vector support. What I'm really looking for is a compact but performant representation of a set of filters on the same (one term field). Using term vectors would mean an algorithm similar to: String myfield; String myterm; Te

Re: incremental update of index

2008-11-10 Thread Erick Erickson
It all depends on how many updates you're doing, which you haven't told us. If a large majority of your index is being updated, there's no particular reason to update incrementally; I'd build a new one. Best Erick On Mon, Nov 10, 2008 at 3:09 PM, ChadDavis <[EMAIL PROTECTED]> wrote: > That's what I thought.

Re: Term numbering and range filtering

2008-11-10 Thread Paul Elschot
On Monday 10 November 2008 22:21:20 Tim Sturge wrote: > Hmmm -- I hadn't thought about that so I took a quick look at the > term vector support. > > What I'm really looking for is a compact but performant > representation of a set of filters on the same (one term field). > Using term vectors woul

Re: Term numbering and range filtering

2008-11-10 Thread Tim Sturge
I think we've gone around in a loop here. It's exactly due to the inadequacy of cached filters that I'm considering what I'm doing. Here's the section from my first email that is most illuminating: " The reason I have this question is that I am writing a multi-filter for single term fields. My ind

How to make Lucene search for parts of terms?

2008-11-10 Thread Artur Tomusiak
Hello, We are using a MultiFieldQueryParser and we have problems with making Lucene find parts of words. So that for example searching for "a" will find all the results that contain "a" in it, not only as a separate token, but even inside of the tokens (like word "make"). We tried putting wi

Re: How to make Lucene search for parts of terms?

2008-11-10 Thread Patrick Turcotte
Take a look at the ngram classes (probably in contrib, don't remember for sure right now). Patrick
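
Editor's sketch: the reply points at n-gram classes as a way to match parts of words. The plain-Java example below shows what character n-grams of a token look like, which is why a search for "a" can then match inside "make". NGramSketch and ngrams are hypothetical names; this is not the contrib tokenizer itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of character n-gram generation; not the Lucene contrib classes. */
final class NGramSketch {

    /** Returns all substrings of token with length between minGram and maxGram, shortest first. */
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= token.length(); start++) {
                grams.add(token.substring(start, start + n));
            }
        }
        return grams;
    }
}
```

For the token "make" with sizes 1 to 2 this yields m, a, k, e, ma, ak, ke. Indexing every gram as its own term multiplies the number of terms per token, so the gram sizes are usually kept small.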

Feasibility question

2008-11-10 Thread Jeff Capone
Has anyone deployed Lucene to index log files? I have seen some articles about how RackSpace used Lucene and Hadoop for log processing, but I have not seen any details on the implementation. To get my required analytics, I think I would need to treat each line of the Apache log files as a do

Re: Boosting results

2008-11-10 Thread Erik Hatcher
On Nov 10, 2008, at 2:42 PM, Stefan Trcek wrote: On Monday 10 November 2008 13:55:31 Michael McCandless wrote: Finally, you might want to instead look at Solr, which provides facet counting out of the box, rather than roll your own... Doooh - new API, but its facet counting sounds good. An

Re: Term numbering and range filtering

2008-11-10 Thread Tim Sturge
Reading this I realize how unclear it is, so let me give a concrete example: I want to do a search restricting users by age range. So someone can ask for the users 18-35, 40-60 etc. Here are the options I considered: 1) construct a RangeQuery. This is a 20-40 clause boolean subquery in an otherw
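
Editor's sketch: the age-range example above, combined with the earlier idea in this thread of keeping one int per document for a single-term field, can be modelled in plain Java as follows. AgeRangeFilterSketch and ageOfDoc are hypothetical names, and the BitSet result merely stands in for whatever a real filter would return; this is not a Lucene Filter implementation.

```java
import java.util.BitSet;

/** Hypothetical per-document int array used to answer arbitrary range filters. */
final class AgeRangeFilterSketch {
    private final int[] ageOfDoc; // ageOfDoc[docId] is the single age value of that document

    AgeRangeFilterSketch(int[] ageOfDoc) {
        this.ageOfDoc = ageOfDoc.clone();
    }

    /** Returns the set of doc IDs whose age lies in [minAge, maxAge], both ends inclusive. */
    BitSet docsInRange(int minAge, int maxAge) {
        BitSet bits = new BitSet(ageOfDoc.length);
        for (int doc = 0; doc < ageOfDoc.length; doc++) {
            int age = ageOfDoc[doc];
            if (age >= minAge && age <= maxAge) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

One array serves every range (18-35, 40-60, and so on), so nothing needs to be cached per range. The cost is a linear scan over all documents for each query.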

deprecated method Token class constructor

2008-11-10 Thread 장용석
Hi :) First, I'm sorry for my bad English. I have a question. In Lucene 2.4.0, the Token class constructor public Token(String text, int start, int end, int flags) is deprecated. I want to know why, and which constructor replaces this deprecated one. May I use it like this? T

Re: performance/scalability issues re filtering of protected search results

2008-11-10 Thread Michael Wechner
Erick Erickson wrote: This has been discussed more than a few times, I suggest you take a look at the searchable archive for things like privileges, access privileges, etc. You'll find lots of information faster that way... You mean Erik Hatcher's answer re SecurityFilter http://archives.de