Hi,
Does anyone have a modified scoring (Similarity) function they would
care to share?
I'm searching web page documents and find that the default Similarity seems
to assign too much weight to documents with frequent occurrences of a
single term from the query and not enough weight to documents that
co
spanquery/ for
> good info.
>
> http://lucene.apache.org/java/3_3_0/queryparsersyntax.html tells you
> how to use boosting if you are using the query parser.
>
>
> --
> Ian.
>
> On Tue, Sep 13, 2011 at 2:26 PM, Joel Halbert wrote:
> > Hi Folks,
> >
>
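A hedged sketch for the modified-Similarity question at the top of this
thread (not code from the thread): one common approach is to extend
DefaultSimilarity (Lucene 3.x) and dampen tf() so that repeated occurrences
of a single query term add very little. The class name below is made up.

import org.apache.lucene.search.DefaultSimilarity;

public class DampedTfSimilarity extends DefaultSimilarity {
    // Default tf() is sqrt(freq); 1 + log(freq) grows far more slowly,
    // so a page that repeats one term many times gains little extra score.
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f + (float) Math.log(freq) : 0.0f;
    }
}

Install it on the searcher with searcher.setSimilarity(new DampedTfSimilarity()).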
Hi Folks,
What is the simplest method of constructing a multi-term query such that
the highest-scoring documents are always those that contain all the terms
in the query adjacent to each other?
i.e. if I search for "federal reserve" I would prefer documents that
contain "Ben Bernake is the chairman
Joel
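A hedged sketch of one way to do this (the field name "content" and the boost
value are assumptions, not from the thread): require the individual terms, and
add an optional zero-slop PhraseQuery with a large boost so documents that
contain the terms adjacent to each other rise to the top.

BooleanQuery q = new BooleanQuery();
q.add(new TermQuery(new Term("content", "federal")), BooleanClause.Occur.MUST);
q.add(new TermQuery(new Term("content", "reserve")), BooleanClause.Occur.MUST);

PhraseQuery phrase = new PhraseQuery();    // default slop 0 = terms must be adjacent
phrase.add(new Term("content", "federal"));
phrase.add(new Term("content", "reserve"));
phrase.setBoost(10.0f);                    // arbitrary; tune until adjacency dominates
q.add(phrase, BooleanClause.Occur.SHOULD); // optional clause, only affects scoring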
On Fri, 2011-05-27 at 13:56 +0200, Pierre GOSSE wrote:
> Hi,
>
> Maybe is it related to :
> https://issues.apache.org/jira/browse/LUCENE-3087
>
> Pierre
>
> -----Original Message-----
> From: Joel Halbert [mailto:j...@su3analytics.com]
> Sent: Fri
Hi,
I'm using Lucene 3.0.3. I'm extracting snippets using
FastVectorHighlighter; for some snippets (I think always when searching
for exact, quoted matches) the fragment is null.
Code looks like:
query = QueryParser.escape(query);
if (exact) {
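The code snippet is cut off by the archive. As a hedged illustration of one
way to cope with the null fragments (not the poster's actual code), assuming
an IndexSearcher named searcher, a parsed Query named parsedQuery and a stored
field named "content": fall back to the start of the stored text whenever
FastVectorHighlighter returns null.

FastVectorHighlighter fvh = new FastVectorHighlighter();
FieldQuery fieldQuery = fvh.getFieldQuery(parsedQuery);
String fragment = fvh.getBestFragment(fieldQuery, searcher.getIndexReader(),
                                      docId, "content", 100);
if (fragment == null) {
    // Fallback: show the beginning of the stored field instead of nothing.
    String stored = searcher.doc(docId).get("content");
    fragment = stored == null ? "" : stored.substring(0, Math.min(100, stored.length()));
}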
Thanks Koji, I didn't think it was possible as it stands.
On Mon, 2011-03-07 at 21:38 +0900, Koji Sekiguchi wrote:
> (11/03/07 1:16), Joel Halbert wrote:
> > Hi,
> >
> > I'm using FastVectorHighlighter for highlighting, 3.0.3.
> >
> > At the moment t
Hi,
I'm using FastVectorHighlighter for highlighting, 3.0.3.
At the moment this is highlighting a field which is stored, but not
compressed. It all works perfectly.
I'd like to compress the field that is being highlighted, but it seems
like the new way to compress a stored field is to apply it a
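The message is cut off here, but as a hedged sketch of the mechanics being
referred to (field name and variables are assumptions): with Field.Store.COMPRESS
removed in 3.0, you compress the stored value yourself with CompressionTools and
store it as a binary field. Note that, as the exchange with Koji above indicates,
FastVectorHighlighter reads the plain stored text to cut fragments from, so this
only covers the compression side.

Document doc = new Document();
doc.add(new Field("content_zip", CompressionTools.compressString(text),
                  Field.Store.YES));               // compressed stored copy
writer.addDocument(doc);

// Later, when displaying (decompressString throws DataFormatException):
byte[] bytes = searcher.doc(docId).getBinaryValue("content_zip");
String restored = CompressionTools.decompressString(bytes);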
-----Original Message-----
From: Joel Halbert
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: scoring adjacent terms without proximity search
Date: Sat, 31 Oct 2009 08:38:29 +
Thank you all for your suggestions, I shall have a little think about
the best way forward, and report back if I do anything interesting that
works well.
In answer to Grant's question (why not use PhraseQuery): we do not want
to have an artificial upper limit on the slop, i.e. we do want to
includ
Hi,
Without using a proximity search, i.e. "cheese sandwich"~5,
what's the best way of up-scoring results in which the search terms are
closer to each other?
E.g. so if I search for:
content:cheese content:sandwich
How do you ensure that a document with content:
"Toasted Cheese Sandwich"
scores
I suppose this could be summarised as:
"how do i set the score of each document result to be the score of that
of the field that best matches the search terms"?
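A hedged sketch of one way to get "score = best-matching field" (field names
"title" and "content" are assumptions): wrap one sub-query per field in a
DisjunctionMaxQuery, which scores a document by its best clause rather than
by the sum of all clauses.

BooleanQuery titleQ = new BooleanQuery();
titleQ.add(new TermQuery(new Term("title", "cheese")),   BooleanClause.Occur.SHOULD);
titleQ.add(new TermQuery(new Term("title", "sandwich")), BooleanClause.Occur.SHOULD);

BooleanQuery bodyQ = new BooleanQuery();
bodyQ.add(new TermQuery(new Term("content", "cheese")),   BooleanClause.Occur.SHOULD);
bodyQ.add(new TermQuery(new Term("content", "sandwich")), BooleanClause.Occur.SHOULD);

// Tie-breaker 0.0f means only the best field counts; a small value such
// as 0.1f lets the other fields break ties.
DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.0f);
dmq.add(titleQ);
dmq.add(bodyQ);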
-Original Message-----
From: Joel Halbert
Reply-To: java-user@lucene.apache.org
To: Lucene Users
Subject: similarit
Hi,
Given a query with multiple terms, e.g. fish oil, and searching across
multiple fields e.g.
query= fieldA:fish fieldA:oil fieldB:fish fieldB:oil etc...
I don't want to give any more weight to documents that match the same
word multiple times (either in the same, or different fields). I am
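The message is cut off, but a hedged sketch of one lever for the "no extra
weight for repeated matches of the same word" part (the class name is made
up): override tf() so any non-zero frequency counts as one occurrence.
Matches spread across different fields would still add up; collapsing those
needs something like DisjunctionMaxQuery instead.

public class BinaryTfSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        // Any number of occurrences of a term in a field scores like one.
        return freq > 0 ? 1.0f : 0.0f;
    }
}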
Hi Peng - they are both within the contrib dir in your Lucene package download
e.g
lucene-2.4.0/contrib/highlighter/*.jar
lucene-2.4.0/contrib/analyzers/*.jar
- Original Message -
From: "Peng Yu"
To: java-user@lucene.apache.org
Sent: Saturday, 26 September, 2009 12:11:02 GMT +00:00 GMT B
http://wiki.apache.org/lucene-java/PoweredBy

Best
Erick
On Thu, Sep 24, 2009 at 11:17 AM, Joel Halbert wrote:
> Hi,
>
> Does anyone know of any recent metrics & stats on building out an index
> of ~100mm documents (each doc approx 5k). I'm looking for approx stats
> on time to build, time to
Hi,
Does anyone know of any recent metrics & stats on building out an index
of ~100mm documents (each doc approx 5k). I'm looking for approx stats
on time to build, time to query and infrastructure requirements (number
of machines & spec) to reasonably support an index of such a size.
Thanks,
J
Hi,
When using Lucene I always consider two approaches to displaying search
result data to users:
1. Store any fields that we index and display to users in the Lucene
Documents themselves. When we perform a search, simply retrieve the data
to be displayed directly from those Lucene documents.
or
me both the servers have up-to-date indexes. I was thinking what
> could be the best architecture/design strategy to do so, given that
> either of the 2 application servers could be serving search requests depending
> upon its availability.
>
> Any inputs please?
>
> Thanks for
Hi Rich - from what time?
-Original Message-
From: Richard Marr
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: London Open Source Search meetup - Mon 15th June
Date: Fri, 12 Jun 2009 12:54:30 +0100
Hi all,
Just a quick reminder that this is happening
ave a good idea to get the distributions less than some
reasonable time?
On 2009. 05. 26, at 8:15 PM, Joel Halbert wrote:
> Yes, something like this might work, although rather than having a
> cutoff determined by the difference between two successive document
> scores (Doc(n) and D
One thing to check is that the scores are indeed sorted in descending
> order to begin with. For example, I don't think the hits in
> TopDocCollector and its brethren are strictly ordered this way (no?).
>
> -Babak
>
> On Mon, May 18, 2009 at 6:52 AM, Joel Halbert wrote:
&
TrieRangeQuery - thanks for the tip.
-Original Message-
From: Michael McCandless
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Does Lucene fail fast on boolean queries?
Date: Thu, 21 May 2009 11:39:23 -0400
On Thu, May 21, 2009 at 10:58 AM, Joel
Try http://piccolo.sourceforge.net/
It's small and fast.
-Original Message-
From: Michael Barbarelli
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Parsing large xml files
Date: Thu, 21 May 2009 15:52:00 +0100
Why not use an XML pull parser? I recommen
"doc=5" can be asked for by Lucene.
Also note that this is an internal implementation detail -- Lucene
could easily change to do batch processing of AND'd queries in which
case docs 5,10 could easily be iterated on. So I wouldn't "rely" on
this in your app.
Mike
On Thu, M
: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Does Lucene fail fast on boolean queries?
Date: Thu, 21 May 2009 10:29:57 -0400
Yes.
As soon as Lucene sees that the Name docID iteration has ended, the
search will break.
Mike
On Thu, May 21, 2009 at 8:44 AM, Joel Halbert
handles non-English and English in
equally good ways? Or any other ideas on the same?
Thanks,
KK.
On Thu, May 21, 2009 at 6:18 PM, Joel Halbert wrote:
> The highlighter should be language independent. So long as you are
> consistent with your use of Analyzer between
> indexing/query/highlighting
The highlighter should be language independent. So long as you are
consistent with your use of Analyzer between
indexing/query/highlighting.
As for the most appropriate Analyzer to use for your local language,
this is a separate question - especially if you are using stop-word and
stemming filters
Hi,
When Lucene performs a Boolean query, say:
Field Name = Male
AND
Field Age = 30
assuming the resultant docs for each portion of the query were:
Matching docs for: Name = 1,2
Matching docs for: Age = 1,2,5,10
Will Lucene stop searching for documents matching the Age term once it
has found
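The question is cut off, but for reference this is the kind of query being
described (field names and values are assumptions). With both clauses
required, Lucene advances the two posting lists together and, as the replies
above describe, can stop as soon as either list is exhausted.

BooleanQuery q = new BooleanQuery();
q.add(new TermQuery(new Term("Name", "Male")), BooleanClause.Occur.MUST);
q.add(new TermQuery(new Term("Age", "30")),    BooleanClause.Occur.MUST);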
Hi,
Looking at the docs for the 2.4 codebase, for RangeQuery
http://lucene.apache.org/java/2_4_0/api/index.html?org/apache/lucene/search/RangeQuery.html
there is a comment that a TooManyClauses exception is no longer thrown.
Does this mean that it is now safe to use RangeQuery without worrying
a
"but in some cases the search returns too many results"
do you *really* mean you get "too many results"? or do you actually mean
you get a "too many terms" exception due to the query expansion?
-Original Message-
From: Huntsman84
Reply-To: java-user@lucene.apache.org
To: java-user@lucen
ce function for scores
Date: Mon, 18 May 2009 09:50:10 -0400
In that case, I'll have to defer to folks who actually know something about
that part of the code.
Erick
On Mon, May 18, 2009 at 9:25 AM, Joel Halbert wrote:
> Hi Erick,
>
> Thanks for the pointer. Sorry if the q
You can examine the scores and put them in buckets any
way you want, all you're doing is spinning through a small data
structure performing some calculations.
HTH
Erick
On Mon, May 18, 2009 at 8:52 AM, Joel Halbert wrote:
> Hi,
>
> I'd like to apply a score filter. I realise
Hi,
I'd like to apply a score filter. I realise that filtering by absolute
(i.e. anything less than x) scores is pretty meaningless.
In my case I want to filter based on relative score - or on some
function of score which looks for clustering of documents around certain
score values.
Context: I
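The message is cut off here. As a hedged sketch of the simplest relative-score
variant (the 0.5 cutoff and the post-filtering loop are illustrative only, not
from the thread): pull a generous number of hits and keep those scoring within
some fraction of the best hit. A clustering-based cutoff would replace the
single threshold with a scan for a large gap between successive scores, as
suggested earlier in the thread.

TopDocs top = searcher.search(query, null, 100);
List<ScoreDoc> kept = new ArrayList<ScoreDoc>();
if (top.scoreDocs.length > 0) {
    float best = top.scoreDocs[0].score;   // relevance hits come back highest first
    for (ScoreDoc sd : top.scoreDocs) {
        if (sd.score >= 0.5f * best) {
            kept.add(sd);
        }
    }
}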
You can use your Analyzer to get a token stream from any text you give
it, just like Lucene does.
Something like:
String text = "your list of words to analyze and tokenize";
TokenStream ts = YOUR_ANALYZER.tokenStream(null, new StringReader(text));
Token token = new Token();
while ((token = ts.next(token)) != null) {
    // token.term() holds the analyzed term text
}
/IndexWriter.MaxFieldLength.html
And the corresponding IndexWriter ctors.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Joel Halbert [mailto:j...@su3analytics.com]
> Sent: Wednesday, May 13, 200
Is there a limit to the size of a field which Lucene will index?
i.e. for very large field values are only the first n tokens or n
characters indexed?
If so, is there a way of raising or removing this limit?
Rgs,
Joel
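A hedged sketch of the knob the reply above points at (Lucene 3.x style; the
path and the Version constant are placeholders): the default is
MaxFieldLength.LIMITED, which only indexes the first 10,000 terms of each
field; pass UNLIMITED (or call setMaxFieldLength) to lift the cap.

Directory dir = FSDirectory.open(new File("/path/to/index"));
IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_30),
        IndexWriter.MaxFieldLength.UNLIMITED);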
Hi,
By way of clarification, when a filter is used with a search query, is
the filter applied only to documents that matched the search query or is
it applied to all documents in the index before the query is executed?
Rgs,
Joel
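A hedged sketch of filter usage (the "lang" field is an assumption): in effect
the filter just defines the candidate set of documents; only documents that
pass the filter are scored against the query, and the filter itself
contributes nothing to the score.

Filter onlyEnglish = new QueryWrapperFilter(new TermQuery(new Term("lang", "en")));
TopDocs hits = searcher.search(query, onlyEnglish, 10);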
Out of interest, if the index is entirely in memory (using a RAMDir) is
there any significant difference in performance between options (a) and
(b) as outlined below?
Rgs,
Joel
-Original Message-
From: Ganesh
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org, rolaren..
Hi,
I have a RAMDirectory based index. The document source for the index is
a database table, where content to be indexed is stored alongside a
status (pending_index, indexed, pending_delete, deleted). Each time the
application is started, and periodically thereafter, all documents from
the databa
Presumably there is no score ordering to the hit ids Lucene delivers to
a HitCollector? i.e. they are delivered in the order they are found and
the score is neither ascending nor descending, i.e. the next score could be
higher or lower than the previous one?
-Original Message-
From: Mark Miller
When constructing a query, using a series of terms e.g.
Term1=X, Term2=Y etc...
does it make sense, like in SQL, to place the most restrictive term query
first?
i.e. if I know that the query will be mainly constrained by the value of
Term1, does having this as the first in the query make the exec
Hi,
Is there any practical limit on the number of fields that can be
maintained on an index?
My index looks something like this, 1 million documents. For each group
of 1000 documents I might have 10 indexed fields. This would mean in
total about 10,000 fields. Am I going to run into any issues here?
Hi,
I'm looking for an optimal solution for extracting unique field values.
The rub is that I want to be able to perform this for a unique subset of
documents...as per the example:
I have an index with Field1 and Field2.
I want "all unique values of Field1 where Field2=X".
Other than actually p
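The message is cut off here. A hedged brute-force sketch (assumes Field1 is
stored and the matching set is modest; the 10000 hit cap is arbitrary): run
the Field2 constraint as a query and collect the distinct stored Field1
values. For large result sets, a custom Collector backed by
FieldCache.getStrings would avoid loading stored documents.

Set<String> unique = new HashSet<String>();
TopDocs hits = searcher.search(new TermQuery(new Term("Field2", "X")), null, 10000);
for (ScoreDoc sd : hits.scoreDocs) {
    Document d = searcher.doc(sd.doc);
    String value = d.get("Field1");
    if (value != null) {
        unique.add(value);
    }
}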