Thanks for the detailed explanation. But as I understand it, this query will
still match only documents that contain both terms (either in the same
field or in different ones). What if there's a document that contains only
"hello"? This query will not find it, am I right? But what we want to
achieve is thi
On Thu, Apr 26, 2012 at 5:13 AM, Yang wrote:
>
> I read the paper by Doug "Space optimizations for total ranking",
>
> since it was written a long time ago, I wonder what algorithms Lucene uses
> (regarding postings list traversal and score calculation, ranking)
>
>
> particularly the total rankin
+(title:hello title:world desc:hello desc:world)
(+title:hello +title:world)^100
(+desc:hello +desc:world)^50
(+title:hello +desc:world)^10
(+desc:hello +title:world)^10
The boost values (100, 50, 10, 10) should be adjusted carefully.
If the tf of a document is very large, 10 may not be enough.
you can mo
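
For reference, here is a minimal sketch in plain Java (no Lucene dependency) of how the query string above could be assembled for any two terms. The class and method names are hypothetical, the field names and boosts are the ones from the example, and the terms are assumed to need no escaping. Only the first clause is required, and it is a disjunction, so a document containing just "hello" should still match; the boosted clauses only raise the score of documents that contain both terms.

```java
import java.util.ArrayList;
import java.util.List;

public class BoostedQuerySketch {

    // Hypothetical helper: an optional clause requiring both field:term pairs,
    // with a boost applied to the whole group.
    static String both(String f1, String t1, String f2, String t2, int boost) {
        return "(+" + f1 + ":" + t1 + " +" + f2 + ":" + t2 + ")^" + boost;
    }

    // Builds the query string for two terms over the "title" and "desc" fields.
    static String build(String a, String b) {
        List<String> parts = new ArrayList<>();
        // Required clause: at least one of the four field:term pairs must match.
        parts.add("+(title:" + a + " title:" + b + " desc:" + a + " desc:" + b + ")");
        // Optional clauses: reward documents containing both terms.
        parts.add(both("title", a, "title", b, 100));
        parts.add(both("desc", a, "desc", b, 50));
        parts.add(both("title", a, "desc", b, 10));
        parts.add(both("desc", a, "title", b, 10));
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(build("hello", "world"));
    }
}
```

The resulting string would then be handed to whatever query parser you already use.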
Hi,
I'm storing a field twice, once analyzed and once non-analyzed, in
order to be able to query both for terms and for the exact keyword:
// Analyzed version
d.add(new Field(key, value, Store.NO, Index.ANALYZED,
T
Hello,
I am relatively new to Lucene, this might be a noob question, if so please
redirect me. I'd like some guidance on how to use Lucene to address a problem.
I have a set of a few hundred (and growing) user-defined keywords such as
"spain" and "volkswagen", each of which is associated with
Hmmm, putting analyzed and unanalyzed values in
the same field seems like it'd be difficult to get right. In
the Solr world, two separate fields are usually used.
Sorting is right out, the results are unpredictable. What does
it mean to sort on a field with multiple tokens? For a doc
with "aardva
Why don't you store the keyword-related data in a keywords field, which can be
analyzed, and leave the other field as it is now?
That is, move every field that needs keyword matching into the keywords section.
-v
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, April 27,
I cannot do that, I need to query for specific fields, both for the
whole value in a term (keyword) and for fuzzy/phrase...
For the sorting I will probably take Erick Erickson's suggestion:
use a separate non-analyzed field for sorting. Makes sense.
The other problem (querying both by whole key
> This appears to be somewhat the reverse of the typical
> Lucene use case -- rather than having a set of, say, 1000
> articles which are indexed, then issuing a query using a few
> keywords to search on those articles, I have a set of say
> 1000 keywords, and a single article, and I want to deter
Hi guys,
I have a field that is analyzed, with Store.NO.
Suppose one Document has the value "Hello" in this field,
and another one has "Hello world , one, two, three, four".
Since the field is analyzed (with norms), the "one two three four" will
definitely affect the resulting score if we search for "Hello wor
You can override org.apache.lucene.search.Similarity/DefaultSimilarity
to tweak quite a lot of stuff.
computeNorm() may be the method you are interested in. Called at
indexing time so be sure to use the same implementation at index and
query time, using IndexWriterConfig.setSimilarity() and
Index
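
To make the effect of the norm concrete, here is a small sketch in plain Java (no Lucene dependency). It assumes the default length normalization is boost * 1/sqrt(numTerms), which is my understanding of what DefaultSimilarity.computeNorm() does in the 3.x line; please check it against your version. The class and method names are hypothetical, and the token count of 6 for the longer value is an assumption about the analyzer. Real norms are also encoded into a single byte, so the stored values are coarser than shown here.

```java
public class LengthNormSketch {

    // Assumed default behaviour: longer fields get a smaller norm.
    static float lengthNorm(int numTerms, float boost) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // What an overridden similarity might return to ignore field length.
    static float flatNorm(int numTerms, float boost) {
        return boost;
    }

    public static void main(String[] args) {
        // "Hello" is assumed to be 1 token; the longer value is assumed to be 6.
        System.out.println("default, 1 token:  " + lengthNorm(1, 1.0f));
        System.out.println("default, 6 tokens: " + lengthNorm(6, 1.0f));
        System.out.println("flat, 6 tokens:    " + flatNorm(6, 1.0f));
    }
}
```

With a flat norm both documents would be weighted the same for field length, so the extra tokens no longer pull the longer document down.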
Thanks Ralf.
Basically you are talking about the selectivity of columns in a JOIN, right?
But in my example above, "yellow dog", both terms are very common, and both
have long postings lists.
Yang
On Thu, Apr 26, 2012 at 12:17 AM, Ralf Heyde wrote:
> Hi,
>
> I don't know the correct implementati
Yes, that's why many search engines will not allow users to visit a page
> number greater than a threshold. For most applications, users usually
> only visit the top results. That's why the ranking algorithm is important. If
> you find your users always turning to the next page, I think you should
> consider your appli
This is my program to calculate the TF-IDF value for a document in a collection
of documents. It works fine, but takes a lot of time when calculating
the "IDF" values (finding the number of documents that contain a particular
term).
Is there a more efficient way of finding the number of documents which c
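
One common way to avoid rescanning the collection for every term is to count document frequencies for all terms in a single pass. Below is a minimal sketch in plain Java (no Lucene dependency); the class and method names are hypothetical, documents are assumed to be already tokenized, and the IDF variant log(N / df) is just one of several in use. If the documents are already in a Lucene index, IndexReader.docFreq(Term) should, as far as I know, give the same count straight from the index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class DocFreqSketch {

    // One pass over the collection: each document contributes at most 1 per term.
    static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // De-duplicate so repeated terms in one document count once.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Assumed IDF variant: natural log of (number of documents / document frequency).
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("yellow", "dog", "dog"),
                Arrays.asList("yellow", "cat"),
                Arrays.asList("dog"));
        Map<String, Integer> df = docFreqs(docs);
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            System.out.println(e.getKey() + " df=" + e.getValue()
                    + " idf=" + idf(e.getValue(), docs.size()));
        }
    }
}
```

After the single pass, each IDF lookup is a constant-time map access.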
I'm using Lucene's term frequency vectors to calculate cosine similarity between
documents. Say my documents have these 3 terms: "owe", "owed", "owing". Lucene
takes these as 3 separate terms, but all 3 of them mean the same thing, "owe". Is there
any functionality in Lucene that can be used to index by semantics? so that
stemmer
"Semantic" is a "large" word; use it with care.
On Sat, Apr 28, 2012 at 11:02 AM, Kasun Perera wrote:
> I'm using Lucene's Term Freq vector to calculate cosine similarity between
> documents. Say my documents have these 3 terms: "owe", "owed", "owing". Lucene
> takes this as 3 separate terms, but
16 matches