[Lucene2.0]How to not highlight keywords in some fields?

2006-09-25 Thread zhu jiang
Hi all, For example, if I have a document with two fields, text and num, like this: text:foo bar 1 num:1. When users query "foo", I change the query to "text:foo AND num:1", and both "foo" and "1" in the text field are highlighted. I don't want the word "1" in

does anyone know of a 'smart' categorizing text pattern finder?

2006-09-25 Thread Vladimir Olenin
Hi, I wonder if anyone here knows if there is a 'smart' text pattern finder, ideally written in Java. The library I'm looking for should be able to 'guess' the category of the particular text on the page, most probably by finding similarities between the bulk of the pages and a set of template

Re: highlighting

2006-09-25 Thread Doron Cohen
See below... "Stelios Eliakis" <[EMAIL PROTECTED]> wrote on 25/09/2006 15:48:10: > You are right! > 1) As far as Example 1 is concerned, I don't want these 2 fragments to have > the same score. Do you know how I could do this? This behavior is not configurable, as far as I can understand, at least

Caused by: java.io.IOException: The handle is invalid

2006-09-25 Thread Van Nguyen
I’m getting an error while trying to build my index:   Caused by: java.io.IOException: The handle is invalid   at java.io.RandomAccessFile.close0(Native Method)   at java.io.RandomAccessFile.close(RandomAccessFile.java:532)   at org.apache.lucene.store.FSIndexOu

Re[2]: how to enhance speed of sorted search

2006-09-25 Thread Chris Hostetter
: but searcher.explain returns that idf's and scores were still counted. : correct me if i am wrong. 1) searcher.explain doesn't know anything about any sorting, so you can't trust it to tell you whether or not scores are computed when doing a custom sort. 2) Lucene does in fact "score" the matches

Re: Status: Sorting on tokenized fields

2006-09-25 Thread Chris Hostetter
: In our application we need to do this for all 20 fields. That means : we have to create twenty redundant fields just for sorting. : That's really an overhead in size and indexing-time. I guess it just depends on the size of your index and how fast is "fast enough" when indexing ... most people

RE: wildcards in quoted phrases?

2006-09-25 Thread Lee_Gary
I encountered this issue before and was led to use SpanQueries for wildcards within phrases. Take a look at the SpanQuery family of classes. SpanQueries can give you the ability to specify a wildcarded term within a phrase since you can nest different SpanQueries within a SpanQuery. One of these is

Re: highlighting

2006-09-25 Thread Stelios Eliakis
You are right! 1) As far as Example 1 is concerned, I don't want these 2 fragments to have the same score. Do you know how I could do this? 2) Furthermore, if I try to get the fragment score: Scorer fragmentScore = highlighter.getFragmentScorer(); float fragmentScoreFloat = fragmentScore.getFragmentScore(

Re: highlighting

2006-09-25 Thread Doron Cohen
"Stelios Eliakis" <[EMAIL PROTECTED]> wrote on 23/09/2006 02:39:27: > I want to extract the Best Fragment (passage) from a text file. > When I use the following code I take the first fragment that contains my > query. Nevertheless, the JavaDoc says that the function getBestFragment > returns the b

Re: Advice on Custom Sorting

2006-09-25 Thread Erick Erickson
You were probably right. See below On 9/25/06, Paul Lynch <[EMAIL PROTECTED]> wrote: Thanks for the quick response Erick. "index the documents in your preferred list with a field and index your non-preferred docs with a field subid?" I considered this approach and dismissed it due to the

Re: wildcards in quoted phrases?

2006-09-25 Thread Daniel Naber
On Monday 25 September 2006 22:20, Dan Armbrust wrote: > My hunch is that its not real easy, otherwise > it would already have been done... I think it shouldn't be difficult, but to expand the PrefixQuery, your QueryParser would need an IndexReader. Currently IndexReader/QueryParser don't depen

Re: Advice on Custom Sorting

2006-09-25 Thread Paul Lynch
Thanks for the quick response Erick. "index the documents in your preferred list with a field and index your non-preferred docs with a field subid?" I considered this approach and dismissed it due to the actual list of preferred ids changing so frequently (every 10 mins...ish) but maybe I was a

wildcards in quoted phrases?

2006-09-25 Thread Dan Armbrust
I have someone wanting to do a query like this - "top sta*", but from what I have been able to gather, Lucene doesn't have any built-in support for wildcards inside of phrases? Well, at least not complete support. I was led to the MultiPhraseQuery class - but looking at that leaves me wonderi

Re: Advice on Custom Sorting

2006-09-25 Thread Erick Erickson
OK, a really "off the top of my head" response, but what the heck I'm not sure you need to worry about filters. Would it work for you to index the documents in your preferred list with a field (called, at the limit of my creativity, preferredsubid ) and index your non-preferred docs with a f

Advice on Custom Sorting

2006-09-25 Thread Paul Lynch
Hi All, I have an index containing documents which all have a field called SubId which holds the ID of the Subscriber that submitted the data. This field is STORED and UN_TOKENIZED. When I am querying the index, the user can choose a number of different ways to sort the Hits. The problem is that I

Re: "Greater than" equivalent?

2006-09-25 Thread Erick Erickson
I think you could use a range (either a RangeQuery or RangeFilter). So your range (from your example) would be between 125688 and 10 or some such... Be careful of a RangeQuery throwing TooManyClauses if your range contains more than 1024 (default) distinct entries, which means I'd recomm
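As a rough illustration of the range idea (not code from this thread): a "greater than" test can be expressed as a range whose lower bound is exclusive and whose upper bound is the largest possible term, provided the ids are indexed as fixed-width, zero-padded strings so that string order matches numeric order. The class name `ItemIdRange`, the width of 10 digits, and the helper names are all assumptions made for this sketch; it uses plain string comparison rather than any Lucene API.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: emulates "itemid > lastSeen" as a lexicographic range
// over zero-padded terms. WIDTH = 10 is an assumed field width.
class ItemIdRange {
    static final int WIDTH = 10;

    // Left-pad a non-negative id with zeroes to WIDTH digits.
    static String pad(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        String s = String.format(Locale.ROOT, "%0" + WIDTH + "d", id);
        if (s.length() > WIDTH) {
            throw new IllegalArgumentException("id does not fit in " + WIDTH + " digits");
        }
        return s;
    }

    // Largest term of the assumed width: WIDTH nines.
    static String maxTerm() {
        char[] nines = new char[WIDTH];
        Arrays.fill(nines, '9');
        return new String(nines);
    }

    // True when the padded term lies in the range (pad(lastSeen), maxTerm()].
    static boolean greaterThan(String paddedTerm, long lastSeen) {
        return paddedTerm.compareTo(pad(lastSeen)) > 0
                && paddedTerm.compareTo(maxTerm()) <= 0;
    }
}
```

The same two bounds, `pad(lastSeen)` (exclusive) and `maxTerm()` (inclusive), are what one would hand to a range query or range filter on the padded field.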

"Greater than" equivalent?

2006-09-25 Thread Michael J. Prichard
I have a filtering process that checks my index for various things. I have an "itemid" field in this index and I keep track of the last itemid I search up to. I was wondering if there was an equivalent to doing a search with a "greater than" clause? Sort of like: to:[EMAIL PROTECTED] AND su

Re[2]: how to enhance speed of sorted search

2006-09-25 Thread Yura Smolsky
Hello, Aviran. but searcher.explain returns that idf's and scores were still counted. correct me if i am wrong. MAENN> AFAIK when you sort Lucene does not calculate the relevance score. MAENN> Aviran MAENN> http://www.aviransplace.com MAENN> -Original Message- MAENN> From: Yura Smolsky

RE: how to enhance speed of sorted search

2006-09-25 Thread Mordo, Aviran (EXP N-NANNATEK)
AFAIK when you sort Lucene does not calculate the relevance score. Aviran http://www.aviransplace.com -Original Message- From: Yura Smolsky [mailto:[EMAIL PROTECTED] Sent: Monday, September 25, 2006 4:39 AM To: java-user@lucene.apache.org Subject: how to enhance speed of sorted search

Re: Highlight specific occurence

2006-09-25 Thread Erik Hatcher
One option is to use a SpanQuery (instead of a PhraseQuery if that is what you're currently using) and toy with the getSpans() for getting occurrences of matches. Erik On Sep 25, 2006, at 8:55 AM, Virlouvet Olivier wrote: Hi All, I use proximity operator in my queries (for e

Highlight specific occurence

2006-09-25 Thread Virlouvet Olivier
Hi All, I use a proximity operator in my queries (for example "w1 w2 WITHIN 4" to find all documents where the distance between w1 and w2 is less than or equal to 4). For all documents matching the query, I need to know the corresponding occurrences (w1, w2) to highlight them (and so ex

Re: searching in social networks

2006-09-25 Thread mark harwood
Finding the connected elements which make up the neighbourhood is just straightforward lookups of connected IDs on the graph. This can be done using either a database or Lucene - your choice, although I suspect the database is the better choice given the structured nature of the data and any pot
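The "lookups of connected IDs" step can be sketched as a breadth-first walk that stops at a fixed number of hops. This is only an illustration under assumptions: the graph is held in memory as a map from an ID to its directly connected IDs, and the class and method names (`Neighbourhood`, `within`) are invented here; in practice each lookup could be a database or index query instead.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical helper: collects every ID reachable from `start`
// in at most `maxDegree` hops, using breadth-first search.
class Neighbourhood {
    static Set<String> within(Map<String, List<String>> graph, String start, int maxDegree) {
        Map<String, Integer> depth = new HashMap<>();   // ID -> hops from start
        Queue<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int d = depth.get(current);
            if (d == maxDegree) {
                continue; // do not expand beyond the hop limit
            }
            for (String next : graph.getOrDefault(current, Collections.emptyList())) {
                if (!depth.containsKey(next)) {          // first visit is the shortest path
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        Set<String> result = new HashSet<>(depth.keySet());
        result.remove(start);                            // the start node is not its own neighbour
        return result;
    }
}
```

The resulting set of IDs could then be used to restrict a text search to the neighbourhood, for example as a filter on an ID field.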

Very high fieldNorm for a field resulting in bad results

2006-09-25 Thread Mek
Hi, I was getting very bad results for some queries & after a little research & a lot of searching on the mailing list, I have the following information. Could someone please help me figure out what's wrong. Exact phrase matches in fields with higher boost were ranking lower than some non exact m

searching in social networks

2006-09-25 Thread Sharad Agarwal
I am using Lucene for simple flat searches. Now I have a requirement to do searches based on an object's connectivity with other objects, the way searches are done in "social networks". Let's say I want to search for a query in only those objects which are within 3 degrees of connectivity t

how to enhance speed of sorted search

2006-09-25 Thread Yura Smolsky
Hello, java-user. I have a set of documents with two fields: 1. "summary" which is tokenized, stored. it contains some text 2. "date", which is untokenized, stored. it contains seconds from epoch aligned to the right and padded with zeroes on the left I perform searches which are sorted by "date"
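To make the "date" encoding described above concrete, here is a minimal sketch of right-aligning epoch seconds and padding with zeroes on the left. The class name `PaddedEpoch` and the width of 10 digits are assumptions for illustration, not details from the post. The point is that with a fixed width, plain string ordering agrees with numeric ordering, which is what sorting on an untokenized string field relies on.

```java
import java.util.Locale;

// Hypothetical helper: encodes seconds since the epoch as a fixed-width,
// zero-padded string so that string order equals numeric order.
class PaddedEpoch {
    static final int WIDTH = 10; // assumed width

    static String pad(long epochSeconds) {
        if (epochSeconds < 0) {
            throw new IllegalArgumentException("epochSeconds must be non-negative");
        }
        String s = String.format(Locale.ROOT, "%0" + WIDTH + "d", epochSeconds);
        if (s.length() > WIDTH) {
            throw new IllegalArgumentException("value does not fit in " + WIDTH + " digits");
        }
        return s;
    }

    // Compares two timestamps through their padded string forms.
    static int compareAsTerms(long a, long b) {
        return pad(a).compareTo(pad(b));
    }
}
```

Without padding the comparison goes wrong as soon as the values differ in length: as plain strings, "999" sorts after "1000".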

Re: Status: Sorting on tokenized fields

2006-09-25 Thread lude
Hi Chris, sure, you can create an additional field for every field that should support sorting. In our application we need to do this for all 20 fields. That means we have to create twenty redundant fields just for sorting. That's really an overhead in size and indexing-time. :: using the stored