Similarity modification...

2007-01-03 Thread escher2k
Hi, I am trying to create a linear function to influence the similarity computation. For example, if tf = 4, f(tf) = 150 * 1 + 150 * 0.3 = 195. The first occurrence is multiplied by 150. The next three occurrences are multiplied by 150 and divided by 10 (3/10). However, w
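
The arithmetic in this post can be sketched as a small standalone function. This is only an illustration of the formula as described, not the poster's code and not a Lucene API: the class name `LinearTf`, the method `score`, and the constants `BASE` and `DAMPING` are hypothetical names chosen for the example. Wiring such a function into scoring would presumably go through a custom Similarity, which is not shown here.

```java
// Hypothetical sketch of the linear tf function described in the post.
// All names here are illustrative assumptions, not part of Lucene.
final class LinearTf {
    // Weight applied to the first occurrence of a term (from the post).
    static final double BASE = 150.0;
    // Every later occurrence is weighted BASE / DAMPING (150 / 10 in the post).
    static final double DAMPING = 10.0;

    private LinearTf() {}

    // f(tf) = BASE * 1 + BASE * (tf - 1) / DAMPING for tf >= 1, and 0 otherwise.
    static double score(int tf) {
        if (tf <= 0) {
            return 0.0;
        }
        double first = BASE;                     // first occurrence
        double rest = BASE * (tf - 1) / DAMPING; // remaining occurrences
        return first + rest;
    }

    public static void main(String[] args) {
        // Print the value for a few term frequencies, including tf = 4 from the post.
        for (int tf = 1; tf <= 4; tf++) {
            System.out.println("tf=" + tf + " -> " + score(tf));
        }
    }
}
```

For tf = 4 this evaluates 150 + 150 * 3 / 10, matching the 195 given in the post.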

Re: Speed of grouped queries

2007-01-03 Thread Find Me
On 1/2/07, sdeck <[EMAIL PROTECTED]> wrote: Thanks in advance for any insight on this one. I have a fairly large query to run, and it takes roughly 20-40 seconds to complete the way that I have it. Here is the best example I can give. I have a set of roughly 25K documents indexed. I have que

Re: Speed of grouped queries

2007-01-03 Thread sdeck
Sorry, one more bit of info. In the index, the contents of the article are stored/indexed. These are just the guts though, and it is around 1-3K worth of character data. The current index, as it stands with 33K of documents is about 109 megs. Again, it seems like I am just missing something some

Re: Speed of grouped queries

2007-01-03 Thread sdeck
Mucho thanks. I will look into these. For more info, I have roughly 3 documents now, and about 350,000 terms. When I do my queries I use the StandardAnalyzer with a whole slew of stop words. So, not sure if that might still be messing me up or not. In the end, I may have to go with the prebuilt

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
Hi Scott, sdeck wrote: > I guess, any ideas why I would run out of heap memory by combining all of > those boolean queries together and then running the query? What is happening > in the background that would make that occur? Is it storing something in > memory, like all of the common terms or som

obscure error...

2007-01-03 Thread Dan Armbrust
This probably isn't a lucene error - but I'm hoping that maybe somebody here has seen it before, and can shed some light. I'm trying to add a simple document to an empty index. The toString on the document looks like this: Document stored/uncompressed,indexed stored/uncompressed,indexed i

Re: New Lucene QueryParser

2007-01-03 Thread Mark Miller
Looks like interesting stuff Mark, but why did you make everything so configurable (syntax-wise)? IMO, there is a lot of value to standards, and doing things like changing the precedence of operators isn't necessarily a good thing :-) I made it so configurable because I needed to implement a

Re: Speed of grouped queries

2007-01-03 Thread sdeck
Yes, indeed. I have tried each of those, hence my frustration. So, the max clause one did not seem to work, I ran out of heap memory for some reason. I have my heap top set to -Xmx1024m so, that should be enough. Tried the query filter (that one also caused a heap memory error) yeah, I have th

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
Hi Scott, sdeck wrote: > I can't combine each of the movie queries together into one, because I get a > memory error because of how many clauses there are (setting the clause > higher did not help) Have you tried increasing the memory available to the JVM? Sun's JVM takes an option "-Xmx" to cha
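
When checking whether an "-Xmx" setting like the one discussed in this thread is actually reaching the JVM, a small check along these lines may help. It is a sketch only: the class name `HeapCheck` and the method `maxHeapMegabytes` are hypothetical, and the only library call used is the standard `Runtime.maxMemory()`.

```java
// Hypothetical helper that reports the maximum heap the running JVM will try to use.
// Class and method names are illustrative assumptions.
final class HeapCheck {
    private HeapCheck() {}

    // Runtime.maxMemory() returns the limit in bytes; convert it to megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it with and without the option (for example `java -Xmx1024m HeapCheck`) should show whether the setting is applied; the reported figure is often somewhat below the requested value.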

Re: New Lucene QueryParser

2007-01-03 Thread Yonik Seeley
On 12/5/06, Mark Miller <[EMAIL PROTECTED]> wrote: I have finally delved back into the Lucene Query parser that I started a few months back. Looks like interesting stuff Mark, but why did you make everything so configurable (syntax-wise)? IMO, there is a lot of value to standards, and doing th

digester/lucene runtime problems

2007-01-03 Thread Mark Mei
Hi, We have 500k XML documents in a file. We ran digester/lucene and got the following error messages. We had run a smaller file with 20k XML documents without any problems. Can anyone help us resolve this problem? Thank you very much. Regards, Mark Jan 3, 2007 10:17:37 AM org.apach

Re: Speed of grouped queries

2007-01-03 Thread sdeck
Sure. Yes, this is a metaphor for what I am actually doing, but movies are a great example. So, you go out to each of the news sites and you pull in their entertainment articles. These could be generic news, generic entertainment, whatever, so right off the bat you have no way of saying that th

Re: Speed of grouped queries

2007-01-03 Thread mark harwood
While we're talking movies etc - did anyone else have a stab at the Netflix prize using Lucene? ( http://www.netflixprize.com/ ) I did get onto the leaderboard (briefly) using a Lucene-based solution which involved loading all 100 million movie reviews into a single RAMDirectory for fast proc

Re: Speed of grouped queries

2007-01-03 Thread Steven Rowe
Hi Sdeck, sdeck wrote: > The query for collecting a specific actor is around 200-300 milliseconds, > and the movie one, that actually queries each actor, takes roughly 500-700 > milliseconds. Yet, for a genre, where you may have 50-100 movies, it takes > 500 milliseconds*# of movies I'm having tr

Re: Field.TermVector usage

2007-01-03 Thread Grant Ingersoll
Term vector allows you to store frequency, position, offset info (last two are optional) on a per document basis so that you can retrieve this info for any given document. I have a PowerPoint and some examples of usage on it from my ApacheCon talk in 2005 at http://www.cnlp.org/apachecon20

Re: Customize scoring for additive effect...

2007-01-03 Thread Grant Ingersoll
This _may_ help: http://lucene.apache.org/java/docs/scoring.html It has links into the javadocs for creating Custom Query/Scorers, etc. -Grant On Jan 2, 2007, at 9:32 PM, escher2k wrote: I am trying to build a scoring function which is additive across multiple fields that are searched. Fo

Re: New Lucene QueryParser

2007-01-03 Thread Mark Miller
Hey Laurent, I am actually pretty much ready for a beta/preview release right about now. All of the features are in and I am pretty happy with most of the work. Over the past month I have been squashing bugs and could certainly use as much help as I can get making sure this thing is as perfect

Field.TermVector usage

2007-01-03 Thread Joost Schouten
Hi, I've just started with the implementation of Lucene in my Shale-Hibernate application. From the demo I understand most of the Field constructor: Field(String name, String value, Field.Store store, Field.Index index, Field.TermVector termVector) Except what the Field.TermVector termVector doe

Re: New Lucene QueryParser

2007-01-03 Thread Laurent Hoss
Hi Mark, As said in a previous mail, I'm very interested in your Parser and I'm happy to hear you made progress, and implemented Paragraph/Sentence proximity search functionality. :) This is the killer feature for me! and if the execution of the resulting query ( a mix containing SpanQuery's