Hi,
I am trying to create a linear function to influence the similarity
computation. For example -
if tf = 4, f(tf) = 150 * 1 + 150 * 0.3 = 195
The first occurrence is multiplied by 150. The next three occurrences are each
multiplied by 150 and divided by 10 (3/10 = 0.3).
However, w
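One way to read the example above is as a damped linear curve: the first occurrence gets the full base weight and each further occurrence gets a tenth of it. Here is a minimal sketch of that arithmetic in plain Java; the class and constant names are illustrative, and in Lucene this formula would go into an override of Similarity's tf(float freq) method, which is not shown here.

```java
// Sketch of the linear tf weighting described above: the first
// occurrence contributes the full base weight (150 in the example),
// and each additional occurrence contributes one tenth of it.
// Names and constants are illustrative, not from the thread.
public class LinearTf {
    static final float BASE = 150f;  // weight of the first occurrence
    static final float DAMP = 0.1f;  // fraction kept for later occurrences

    public static float tf(int freq) {
        if (freq <= 0) {
            return 0f;
        }
        // 1 full-weight occurrence + (freq - 1) damped occurrences
        return BASE * 1f + BASE * DAMP * (freq - 1);
    }

    public static void main(String[] args) {
        // tf = 4  ->  150 * 1 + 150 * 0.3 = 195
        System.out.println(tf(4));
    }
}
```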
On 1/2/07, sdeck <[EMAIL PROTECTED]> wrote:
Thanks in advance for any insight on this one.
I have a fairly large query to run, and it takes roughly 20-40 seconds to
complete the way that I have it.
here is the best example I can give.
I have a set of roughly 25K documents indexed
I have que
Sorry, one more bit of info.
In the index, the contents of the article are stored/indexed. These are
just the guts though, and it is around 1-3K worth of character data.
The current index, as it stands with 33K documents, is about 109 megs.
Again, it seems like I am just missing something some
Mucho thanks. I will look into these.
For more info, I have roughly 3 documents now, and about 350,000 terms
When I do my queries I use the StandardAnalyzer with a whole slew of stop
words.
So, I'm not sure if that might still be messing me up or not.
In the end, I may have to go with the prebuilt
Hi Scott,
sdeck wrote:
> I guess, any ideas why I would run out of heap memory by combining all of
> those boolean queries together and then running the query? What is happening
> in the background that would make that occur? Is it storing something in
> memory, like all of the common terms or som
This probably isn't a Lucene error - but I'm hoping that maybe somebody
here has seen it before, and can shed some light.
I'm trying to add a simple document to an empty index. The toString on
the document looks like this:
Document
stored/uncompressed,indexed
stored/uncompressed,indexed
i
Looks like interesting stuff Mark, but why did you make everything so
configurable (syntax-wise)? IMO, there is a lot of value to
standards, and doing things like changing the precedence of operators
isn't necessarily a good thing :-)
I made it so configurable because I needed to implement a
Yes, indeed. I have tried each of those, hence my frustration.
So, the max clause one did not seem to work; I ran out of heap memory for
some reason. I have my heap ceiling set to -Xmx1024m,
so that should be enough.
Tried the query filter (that one also caused a heap memory error)
yeah, I have th
Hi Scott,
sdeck wrote:
> I can't combine each of the movie queries together into one, because I get a
> memory error because of how many clauses there are (setting the clause limit
> higher did not help)
Have you tried increasing the memory available to the JVM? Sun's JVM
takes an option "-Xmx" to cha
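The two knobs being discussed in this thread can be sketched as follows; the class name and classpath below are placeholders, not from the original mails. This is a config fragment, so no output is claimed.

```shell
# Raise the JVM's maximum heap to 1 GB on the command line.
# "SearchApp" and the classpath are placeholders for your own setup.
java -Xmx1024m -cp lucene-core.jar:. SearchApp

# The companion knob in code is Lucene's clause limit, e.g.:
#   BooleanQuery.setMaxClauseCount(100000);
# Raising it trades the TooManyClauses guard for more heap pressure,
# which is why a bigger -Xmx is often needed alongside it.
```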
On 12/5/06, Mark Miller <[EMAIL PROTECTED]> wrote:
I have finally delved back into the Lucene Query parser that I started a
few months back.
Looks like interesting stuff Mark, but why did you make everything so
configurable (syntax-wise)? IMO, there is a lot of value to
standards, and doing th
Hi,
We have 500k xml documents in a file. We ran the digest/lucene and got the
following error messages. We had run a smaller file with 20k xml
documents without any problems. Can anyone help us to resolve this problem?
Thank you very much.
Regards,
Mark
Jan 3, 2007 10:17:37 AM org.apach
Sure.
Yes, this is a metaphor for what I am actually doing, but movies are a great
example.
So, you go out to each of the news sites and you pull in their entertainment
articles. These could be generic news, generic entertainment, whatever, so
right off the bat you have no way of saying that th
While we're talking movies etc - did anyone else have a stab at the Netflix
prize using Lucene? ( http://www.netflixprize.com/ )
I did get onto the leaderboard (briefly) using a Lucene-based solution which
involved loading all 100 million movie reviews into a single RAMDirectory for
fast proc
Hi Sdeck,
sdeck wrote:
> The query for collecting a specific actor is around 200-300 milliseconds,
> and the movie one, that actually queries each actor, takes roughly 500-700
> milliseconds. Yet, for a genre, where you may have 50-100 movies, it takes
> 500 milliseconds*# of movies
I'm having tr
Term vector allows you to store frequency, position, offset info
(last two are optional) on a per document basis so that you can
retrieve this info for any given document. I have a powerpoint and
some examples of usage on it from my ApacheCon talk in 2005 at
http://www.cnlp.org/apachecon20
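The idea above can be illustrated schematically: per document, a term vector is essentially a map from each term to its frequency and (optionally) the token positions where it occurs. The following is plain Java for illustration only, not the Lucene API; tokenization here is a naive whitespace split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Schematic of what a term vector stores for one document: for each
// term, the list of positions where it occurs (frequency is the list
// size). Illustration only; Lucene's real term vectors also support
// character offsets and are built by the analyzer, not by split().
public class TermVectorSketch {
    public static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> vector = new LinkedHashMap<>();
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            vector.computeIfAbsent(tokens[pos], t -> new ArrayList<>())
                  .add(pos);
        }
        return vector;
    }

    public static void main(String[] args) {
        // "to" occurs twice, at positions 0 and 4
        System.out.println(positions("to be or not to be").get("to"));
    }
}
```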
This _may_ help: http://lucene.apache.org/java/docs/scoring.html
It has links into the javadocs for creating Custom Query/Scorers, etc.
-Grant
On Jan 2, 2007, at 9:32 PM, escher2k wrote:
I am trying to build a scoring function which is additive across multiple
fields that are searched.
Fo
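An additive cross-field score like the one described can be sketched as a weighted sum of per-field scores. The field names and weights below are made up for illustration; in Lucene this effect is typically approximated with a BooleanQuery of per-field clauses plus boosts, or a custom Scorer as the scoring page linked above describes.

```java
import java.util.Map;

// Sketch of an additive multi-field score: the document's final score
// is the weighted sum of its per-field scores. Field names and weights
// are illustrative, not from the thread.
public class AdditiveScore {
    public static float score(Map<String, Float> fieldScores,
                              Map<String, Float> fieldWeights) {
        float total = 0f;
        for (Map.Entry<String, Float> e : fieldScores.entrySet()) {
            // A field with no explicit weight defaults to 1.0
            // (an assumption made for this sketch).
            total += e.getValue()
                   * fieldWeights.getOrDefault(e.getKey(), 1f);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Float> scores = Map.of("title", 2.0f, "body", 0.5f);
        Map<String, Float> weights = Map.of("title", 3.0f);
        // title contributes 2.0 * 3.0, body contributes 0.5 * 1.0
        System.out.println(score(scores, weights));
    }
}
```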
Hey Laurent,
I am actually pretty much ready for a beta/preview release right about
now. All of the features are in and I am pretty happy with most of the
work. Over the past month I have been squashing bugs and could certainly
use as much help as I can get making sure this thing is as perfect
Hi,
I've just started with the implementation of Lucene in my Shale-Hibernate
application. From the demo I understand most of the Field constructor:
Field(String name, String value, Field.Store store, Field.Index index,
Field.TermVector termVector)
Except what the Field.TermVector termVector doe
Hi Mark
As I said in a previous mail, I'm very interested in your Parser and I'm
happy to hear you made progress and implemented
Paragraph/Sentence proximity search functionality. :)
This is the killer feature for me!
and if the execution of the resulting query (a mix containing
SpanQuery's