I'm not sure I understand what your field arrangement would be when you say
"[T]he items I'm pulling in from the web contain large bodies of text
(descriptions) whereas the products in my catalog consist of shorter fields
such as product name, manufacturer, product code, etc. So using the smaller
After reading all about the renaming of optimize() and updating my Lucene
libraries to 3.4, I was surprised and confused by what I found.
I have a 1-segment index (all files are named _1*.*) that had been created
with 3.0.1 code and optimized many times (all 3.0.1 code). The
first
Thanks for the reply,
> > The first time my code used the 3.4 libraries with the version level set
> > to 3.4 and it tried to optimize() (still using this now-deprecated old
> > call), the new code went wild! It took up more memory than the heap
> > limit allowed, so I believe it is taking
> >
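For anyone hitting the same thing: optimize() still exists in 3.4 and was
formally deprecated in 3.5 in favor of forceMerge(). A minimal sketch of the
3.x call site, assuming an FSDirectory-based index (the path is illustrative):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

IndexWriterConfig conf = new IndexWriterConfig(
    Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
IndexWriter writer = new IndexWriter(
    FSDirectory.open(new File("/path/to/index")), conf);
writer.optimize();  // deprecated name; forceMerge(1) is the 3.5+ replacement
writer.close();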
In Lucene 3.4, I recently implemented "Translating PhraseQuery to
SpanNearQuery" (see Lucene in Action, page 220) because I wanted _order_ to
matter.
Here is my exact code, called from getFieldsQuery once I know I'm looking at a
PhraseQuery, but I think it is exactly from the book.
static Q
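The code above is cut off; for reference, a minimal sketch of that
conversion, assuming the Lucene 3.x span APIs (the method name is mine, not
necessarily the book's):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

static SpanNearQuery convertToSpanNear(PhraseQuery pq) {
    Term[] terms = pq.getTerms();
    SpanQuery[] clauses = new SpanQuery[terms.length];
    for (int i = 0; i < terms.length; i++) {
        clauses[i] = new SpanTermQuery(terms[i]);
    }
    // inOrder = true is the whole point: terms must appear in order
    return new SpanNearQuery(clauses, pq.getSlop(), true);
}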
> -Original Message-
> short of it: I want "queen bohemian rhapsody" to return the song named
> "Bohemian Rhapsody" by the artist named "Queen", rather than songs with
> titles like "Bohemian Rhapsody (Queen Cover)".
Have you looked at MultiFieldQueryParser and its use of extra boosts?
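Something along these lines, assuming the Lucene 3.x MultiFieldQueryParser
constructor that takes per-field boosts (field names and boost values are
illustrative):

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Boost artist matches above title matches, so "queen" as an artist
// outranks "(Queen Cover)" appearing in a title.
Map<String, Float> boosts = new HashMap<String, Float>();
boosts.put("artist", 10.0f);
boosts.put("title", 2.0f);
MultiFieldQueryParser parser = new MultiFieldQueryParser(
    Version.LUCENE_34,
    new String[] { "artist", "title" },
    new StandardAnalyzer(Version.LUCENE_34),
    boosts);
Query q = parser.parse("queen bohemian rhapsody");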
Thanks for the discussion, I really appreciate you pointing out that the
> Code here ignores PhraseQuery (PQ)'s positions:
And by "here" you mean my original code, not your suggestion.
> To accommodate this, the overall extra gap can be added to the slop:
> int gap = (pp[pp.length] -
> Doron wrote:
> > int gap = (pp[pp.length] - pp[0]) - (pp.length - 1);
int gap = (pp[pp.length-1] - pp[0]) - (pp.length - 1);
Don't want to cause an IndexOutOfBoundsException.
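In context, the corrected computation, assuming pp comes from
PhraseQuery.getPositions() and clauses comes from the conversion sketched
earlier (variable names are illustrative):

// A phrase with position holes (e.g. removed stop words) needs extra
// slop so the SpanNearQuery still matches across the gaps.
int[] pp = pq.getPositions();
int gap = (pp[pp.length - 1] - pp[0]) - (pp.length - 1);
SpanNearQuery snq = new SpanNearQuery(clauses, pq.getSlop() + gap, true);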
-Paul
My index does NOT have a simple UID; it uses the PATH to the file as the
unique key.
I was implementing a CustomScoreQuery which not only tweaked the score but
also wanted to record which documents had passed through this part of the
overall rebuilt query, so that I could further mess with that
> you needed for the subsequent messing around.
>
> --
> Ian.
>
> On Sat, Feb 4, 2012 at 12:09 AM, Paul Allan Hill wrote:
> > My index does NOT have a simple UID; it uses the PATH to the file as
> > the unique key.
> > I was implemen
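A minimal sketch of that idea, assuming the Lucene 3.x function package; the
"path" field name and the collection mechanics are illustrative, not the
original code:

import java.io.IOException;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.function.CustomScoreProvider;
import org.apache.lucene.search.function.CustomScoreQuery;

public class RecordingScoreQuery extends CustomScoreQuery {
    // Collects the unique key (file PATH) of every doc this part scores.
    private final Set<String> paths =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    public RecordingScoreQuery(Query subQuery) {
        super(subQuery);
    }

    @Override
    protected CustomScoreProvider getCustomScoreProvider(final IndexReader reader)
            throws IOException {
        return new CustomScoreProvider(reader) {
            @Override
            public float customScore(int doc, float subQueryScore, float valSrcScore)
                    throws IOException {
                // No simple UID, so look up the stored PATH field instead.
                paths.add(reader.document(doc).get("path"));
                return subQueryScore * valSrcScore; // tweak the score as needed
            }
        };
    }

    public Set<String> getPaths() { return paths; }
}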
What the heck is the JavaDoc for DisjunctionMaxQuery saying:
"A query that generates the union of documents produced by its subqueries, and
that scores each document with the maximum score for that document as produced
by any subquery, plus a tie breaking increment for any additional matching
subqueries."
> -Original Message-
> From: Paul Allan Hill [mailto:p...@metajure.com]
> Sent: Wednesday, February 08, 2012 2:42 PM
> To: java-user@lucene.apache.org
> Subject: Please explain DisjunctionMaxQuery JavaDoc.
>
> What the heck is the JavaDoc for Disju
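As best I can tell it means: a document's score is the single highest score
among the matching subqueries, plus tieBreakerMultiplier times the scores of
any other subqueries that also matched. A minimal sketch, assuming Lucene 3.x
(field names and the 0.1f tie-breaker are illustrative):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.TermQuery;

// Take the best single-field match instead of summing across fields,
// so a term matching both title and artist is not double-counted.
DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.1f); // tieBreakerMultiplier
dmq.add(new TermQuery(new Term("artist", "queen")));
dmq.add(new TermQuery(new Term("title", "queen")));
// score(doc) = max(matching clause scores)
//            + 0.1f * (sum of the other matching clause scores)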
I was looking into the possibility that _some_ subqueries might discount
(actually remove) field norms. I'm trying out the view that, in general, norm
values seem appropriate when looking for terms, but when searching for phrases
that my custom query parsing has added to the query, the document bo
I'd love to hear what you find out. I have been working with this also.
I only changed the sweet spot to a slightly larger range than the one in the
original paper (but kept the same steepness), and I tweaked the sloppy freq to
not score multiple occurrences of a phrase as strongly as they are i
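For anyone following along, a sketch of that kind of tuning, assuming the
Lucene 3.x contrib SweetSpotSimilarity (the range and steepness values are
illustrative, not the ones I actually used):

import org.apache.lucene.misc.SweetSpotSimilarity;

SweetSpotSimilarity sim = new SweetSpotSimilarity();
// Field lengths inside [min, max] all receive the best (flat) length
// norm; steepness controls how quickly the norm falls off outside it.
sim.setLengthNormFactors(50, 500, 0.5f, true); // min, max, steepness, discountOverlaps
// Install it on both the IndexWriter config and the IndexSearcher so
// index-time norms and query-time scoring agree.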
> -Original Message-
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> As for what hyperbolicTf is trying to do ... it creates a hyperbolic function
> letting you specify a hard max
> no matter how many terms there are.
A picture -- or more precisely a graph -- would be worth a
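Until someone graphs it, a rough sketch of wiring it up, assuming the same
contrib SweetSpotSimilarity (values illustrative): tf starts near min, rises
fastest around xoffset, and approaches max asymptotically, which is the hard
max described above.

import org.apache.lucene.misc.SweetSpotSimilarity;

SweetSpotSimilarity sim = new SweetSpotSimilarity();
// tf(freq) is bounded above by max no matter how often a term repeats.
sim.setHyperbolicTfFactors(1.0f, 1.5f, Math.E, 10.0f); // min, max, base, xoffset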
As it says in the title, we are moving from 3.0.2 to 3.4. I am interested
in issues about whether we need to build a new index or can just keep changing
the current one. My company has been busy building software and has not
upgraded the Lucene and Tika libraries since last year, but I'm trying to
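One thing worth checking, assuming the IndexUpgrader added in Lucene 3.2: it
rewrites all segments of an existing index into the current format in place,
which may spare you a full rebuild (the path is illustrative):

import java.io.File;
import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Upgrades every segment of the old 3.0.2 index to the 3.4 format.
new IndexUpgrader(
    FSDirectory.open(new File("/path/to/index")), Version.LUCENE_34).upgrade();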
> What is the best format/markup/ebook standard/document standard/other to use
> for easiest and best text search support?
The helpful Tika libraries can parse any number of formats so the text can
then be indexed into Lucene, so I'm thinking the real question is which format
is better when you want to d
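Whatever the source format, a minimal sketch of the parse-then-index step,
assuming the Tika facade class and the Lucene 3.x Field API (paths and field
names are illustrative):

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.tika.Tika;

// Tika detects the format and extracts plain text from it.
Tika tika = new Tika();
String text = tika.parseToString(new File("manual.pdf"));

// Lucene then indexes the extracted text.
Document doc = new Document();
doc.add(new Field("path", "manual.pdf", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("contents", text, Field.Store.NO, Field.Index.ANALYZED));
writer.addDocument(doc); // writer: an open IndexWriter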