Hi all,
For example, if I have a document with two fields text and num like
this:
text:foo bar 1
num:1
When users query "foo", I change the query to "text:foo AND num:1", and then
both "foo" and "1" in the text field are highlighted. I don't want the word
"1" in
Hi,
I wonder if anyone here knows if there is a 'smart' text pattern finder,
ideally written in Java. The library I'm looking for should be able to 'guess'
the category of the particular text on the page, most probably by finding
similarities between the bulk of the pages and a set of template
See below...
"Stelios Eliakis" <[EMAIL PROTECTED]> wrote on 25/09/2006 15:48:10:
> You are right!
> 1) As far as Example 1 is concerned, I don't want these 2 fragments to have
> the same score. Do you know how I could do this?
This behavior is not configurable, as far as I can understand, at least
I’m getting an error while trying to build my index:
Caused by: java.io.IOException: The handle is invalid
    at java.io.RandomAccessFile.close0(Native Method)
    at java.io.RandomAccessFile.close(RandomAccessFile.java:532)
    at org.apache.lucene.store.FSIndexOu
: but searcher.explain shows that idf's and scores were still computed.
: correct me if I am wrong.
1) searcher.explain doesn't know anything about any sorting, so you can't
trust it to tell you whether or not scores are computed when doing a custom
sort.
2) Lucene does in fact "score" the matches
: In our application we need to do this for all 20 fields. That means
: we have to create twenty redundant fields just for sorting.
: That's really an overhead in size and indexing-time.
I guess it just depends on the size of your index and how fast is "fast
enough" when indexing ... most people
I encountered this issue before and was led to use SpanQueries for
wildcards within phrases. Take a look at the SpanQuery family of
classes. SpanQueries can give you the ability to specify a wildcarded
term within a phrase since you can nest different SpanQueries within a
SpanQuery. One of these is
You are right!
1) As far as Example 1 is concerned, I don't want these 2 fragments to have
the same score. Do you know how I could do this?
2) Furthermore, if I try to get the fragment score:
Scorer fragmentScore= highlighter.getFragmentScorer();
float fragmentScoreFloat=fragmentScore.getFragmentScore(
"Stelios Eliakis" <[EMAIL PROTECTED]> wrote on 23/09/2006 02:39:27:
> I want to extract the Best Fragment (passage) from a text file.
> When I use the following code I get the first fragment that contains my
> query. Nevertheless, the JavaDoc says that the function getBestFragment
> returns the b
You were probably right. See below
On 9/25/06, Paul Lynch <[EMAIL PROTECTED]> wrote:
Thanks for the quick response Erick.
"index the documents in your preferred list with a
field and index your non-preferred docs with a field
subid?"
I considered this approach and dismissed it due to the
On Monday 25 September 2006 22:20, Dan Armbrust wrote:
> My hunch is that it's not really easy, otherwise
> it would already have been done...
I think it shouldn't be difficult, but to expand the PrefixQuery, your
QueryParser would need an IndexReader. Currently IndexReader/QueryParser
don't depen
Thanks for the quick response Erick.
"index the documents in your preferred list with a
field and index your non-preferred docs with a field
subid?"
I considered this approach and dismissed it due to the
actual list of preferred ids changing so frequently
(every 10 mins...ish) but maybe I was a
I have someone wanting to do a query like this - "top sta*", but from
what I have been able to gather, Lucene doesn't have any built-in
support for wildcards inside of phrases.
Well, at least not complete support. I was led to the MultiPhraseQuery
class - but looking at that leaves me wonderi
OK, a really "off the top of my head" response, but what the heck
I'm not sure you need to worry about filters. Would it work for you to index
the documents in your preferred list with a field (called, at the limit of
my creativity, preferredsubid ) and index your non-preferred docs with a
f
Hi All,
I have an index containing documents which all have a
field called SubId which holds the ID of the
Subscriber that submitted the data. This field is
STORED and UN_TOKENIZED
When I am querying the index, the user can choose a
number of different ways to sort the Hits. The problem
is that I
I think you could use a range (either a RangeQuery or RangeFilter). So your
range (from your example) would be between 125688 and 10 or some
such...
Be careful of a RangeQuery throwing TooManyClauses if your range contains
more than 1024 (default) distinct entries, which means I'd recomm
I have a filtering process that checks my index for various things. I
have an "itemid" field in this index and I keep track of the last itemid
I search up to. I was wondering if there was an equivalent to doing a
search with a "greater than" clause? Sort of like:
to:[EMAIL PROTECTED] AND su
Hello, Aviran.
but searcher.explain shows that idf's and scores were still computed.
correct me if I am wrong.
MAENN> AFAIK when you sort Lucene does not calculate the relevance score.
MAENN> Aviran
MAENN> http://www.aviransplace.com
MAENN> -Original Message-
MAENN> From: Yura Smolsky
AFAIK when you sort Lucene does not calculate the relevance score.
Aviran
http://www.aviransplace.com
-Original Message-
From: Yura Smolsky [mailto:[EMAIL PROTECTED]
Sent: Monday, September 25, 2006 4:39 AM
To: java-user@lucene.apache.org
Subject: how to enhance speed of sorted search
One option is to use a SpanQuery (instead of a PhraseQuery if that is
what you're currently using) and toy with the getSpans() for getting
occurrences of matches.
Erik
On Sep 25, 2006, at 8:55 AM, Virlouvet Olivier wrote:
Hi All,
I use the proximity operator in my queries (for e
Hi All,
I use the proximity operator in my queries (for example "w1 w2 WITHIN 4" to
find all documents where the distance between w1 and w2 is less than or equal
to 4)
For all documents matching the query, I need to know the corresponding
occurrences (w1, w2) to highlight them (and so ex
Finding the connected elements which make up the neighbourhood is just
straightforward lookups of connected IDs on the graph. This can be done using
either a database or Lucene - your choice, although I suspect the database is
the better choice given the structured nature of the data and any pot
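The neighbourhood lookup mentioned above could be sketched as a breadth-first walk bounded by a number of degrees. This is plain Java over an in-memory adjacency map; the class name, the method name, and the map representation are assumptions for illustration, and a real system would fetch connected IDs from the database or the index instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: collects the IDs reachable within maxDegrees hops.
final class Neighbourhood {

    // graph maps an object ID to the IDs it is directly connected to.
    static Set<String> within(Map<String, List<String>> graph,
                              String start, int maxDegrees) {
        Set<String> seen = new HashSet<>();
        seen.add(start);
        List<String> frontier = Collections.singletonList(start);

        // Expand one degree of connectivity per iteration.
        for (int depth = 0; depth < maxDegrees && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String id : frontier) {
                for (String nb : graph.getOrDefault(id, Collections.<String>emptyList())) {
                    if (seen.add(nb)) {   // true only the first time we meet nb
                        next.add(nb);
                    }
                }
            }
            frontier = next;
        }
        seen.remove(start);               // the start object is not its own neighbour
        return seen;
    }

    public static void main(String[] args) {
        // Sample chain a - b - c - d - e (assumed data).
        Map<String, List<String>> g = new HashMap<>();
        g.put("a", Arrays.asList("b"));
        g.put("b", Arrays.asList("a", "c"));
        g.put("c", Arrays.asList("b", "d"));
        g.put("d", Arrays.asList("c", "e"));
        g.put("e", Arrays.asList("d"));
        System.out.println(within(g, "a", 3));
    }
}
```

The resulting ID set could then be used to restrict a text search to those objects, whichever store holds the graph.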
Hi,
I was getting very bad results for some queries & after a little research &
a lot of searching on the mailing list, I have the following information.
Could someone please help me figure out what's wrong?
Exact phrase matches in fields with higher boost were ranking lower than
some non exact m
I am using Lucene for simple flat searches. Now I have a requirement to
do searches based on an object's connectivity with other objects, the
way searches are done in "social networks". Let's say I want to
search for a query in only those objects which are within 3 degrees of
connectivity t
Hello, java-user.
I have a set of documents with two fields:
1. "summary", which is tokenized and stored. It contains some text.
2. "date", which is untokenized and stored. It contains seconds from
epoch, right-aligned and padded with zeroes on the left.
I perform searches which are sorted by "date"
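A minimal sketch of the zero-padded "date" encoding described above, in plain Java with no Lucene calls. The class name, the 10-digit width, and the sample values are assumptions for illustration; the post does not say which width it uses. It shows why the padding matters: fixed-width strings sort in the same order as the numbers they encode, while unpadded ones do not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for building sortable string keys from epoch seconds.
final class PaddedDateKey {

    // Assumed width: 10 digits holds epoch seconds up to 9999999999.
    static final int WIDTH = 10;

    static String pad(long epochSeconds) {
        if (epochSeconds < 0) {
            throw new IllegalArgumentException("negative value: " + epochSeconds);
        }
        if (Long.toString(epochSeconds).length() > WIDTH) {
            throw new IllegalArgumentException("value too wide: " + epochSeconds);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", epochSeconds);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(pad(1159170000L));
        keys.add(pad(999L));
        keys.add(pad(1000L));
        Collections.sort(keys);   // plain string sort, as a term index would do
        System.out.println(keys);
    }
}
```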
Hi Chris,
sure, you can create an additional field for every field that should
support sorting.
In our application we need to do this for all 20 fields. That means
we have to create twenty redundant fields just for sorting.
That's really an overhead in size and indexing time.
:: using the stored