Re: Speed of grouped queries

sdeck Thu, 11 Jan 2007 12:17:28 -0800

So, to reply to myself with info learned (since I like reading this forum for
what people have done with lucene. kudos to cnet by the way)

In tracking down some of the speed, I was able to manage some speed
improvements

Again, the movie examples are just metaphors for what I am really working on

1) I created a very simple in memory LRU cache for all of the movies bitset
collectors.  The movies will boolean query each of their actors together and
the run a overall query. This runs in about 20-30 ms, the actor queries them
selves run in 20-30ms.  So, I now have a simple and separate cache for both
movies and actors and store their bitset collectors.

2) using the cached bitset collectors, I then do a merge of all movies
within a genre and store that bitset collector into it's own cache. This one
would really only contain 5-10 elements.

So far, memory is handling things pretty well.

The final item, one in which seems to sometime affect things and other times
not is the highlighter process after the bitsets have been collected. This,
I found, threw off timing many times. It seems as though the highlighter
will run quickly on some documents, but then it ran slowly on others.  Will
probably need to dig into this, but since I have put in my paging system,
and the results them selves will be cached, speed is now all synched up.

I do have a prelim process that will go through each actor, movie and genre
and get a count of number of results that would be returned for their
searches and store that in a file. That way, I can at least show the user
how many results are available in side of the interface without actually
having to query for it.

Hopefully in a month or so I will be able to give a link to the public
website I am working on.
Fun stuff.

Scott

sdeck wrote:
> 
> Thanks for advanced on any insight on this one.
> 
> I have a fairly large query to run, and it takes roughly 20-40 seconds to
> complete the way that i have it.
> here is the best example I can give.
> 
> I have a set of roughly 25K documents indexed
> 
> I have queries that get documents matching a particular actor.
> 
> Then, I have a movie query that takes all of the documents found for each
> actor query and combines them all together to say, here are all documents
> that are relevant for this movie.
> 
> Then, and here is the time hog, I have a genre query that says, take all
> movies and get their results and combine them together into this genre
> result set.
> 
> 
> The problem is, at indexing time, I do not have a way to say if a document
> is a particular genre, or a particular actor, or movie etc.  If I try and
> say for the genre query, get all documents and then filter for the queries
> for movies and actors, I get heap space memory issues.
> 
> The query for collecting a specific actor is around 200-300 milliseconds,
> and the movie one, that actually queries each actor, takes roughly 500-700
> milliseconds. Yet, for a genre, where you may have 50-100 movies, it takes
> 500 milliseconds*# of movies
> 
> Any ideas on how I could run these queries differently? For a given actor
> query, there is about 5-7 boolean query clauses. Just to give some
> insight.
> 
> I currently just create 1 HitSetCollector (I rolled my own
> bitsetcollector) and just run searches with it.  I just get crapped on
> when it does that genre search. I wish there was an easier way to
> aggregate all of those documents together from all of those searches. 
> After it is done, I cache the results, but the initial hit is bad.
> 
> Any help would be much appreciated.
> Sdeck
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Speed-of-grouped-queries-tf2910499.html#a8285283
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Speed of grouped queries

Reply via email to