On Tue, Feb 3, 2009 at 1:14 AM, mark harwood <markharw...@yahoo.co.uk> wrote:
> >>My documents are quite big sometimes up to 300k tokens.
>
> You could look at indexing them as separate documents using overlapping
> sections of text. Erik used this for one of his projects.
>

Can you describe this in a little more detail; I'm not exactly sure what you mean.

> Cheers
> Mark
>
>
>
> ----- Original Message ----
> From: Michael Stoppelman <stop...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Tuesday, 3 February, 2009 7:24:06
> Subject: Poor QPS with highlighting
>
> Hi all,
>
> My search backends are only able to eke out 13-15 qps even with the entire
> index in memory (this makes it very expensive to scale). According to my
> YourKit profiler 80% of the program's time ends up in highlighting. With
> highlighting disabled my backend gets about 45-50 qps (cheaper scaling)!
> We're using Mark's TokenSources contrib. to make reconstructing of the
> document quicker. I was contemplating patching the index to store offsets
> for every term (instead of just the ordinal positions) so that I could make
> the highlighting faster (since you would know where you hit in the document
> on the search pass). I saw this thread from 2004:
> http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg04743.html -
> which asks about adding offsets to the index but it was decided against
> because it would make the index too large. I can totally understand this;
> but as machines get more beefy it would probably be nice to make this
> optional since having 15 qps vs 50 qps is quite a trade-off right now. Are
> other folks seeing this? My documents are quite big, sometimes up to 300k
> tokens. Also my document fields are compressed, which is also a time sink
> for the CPU.
>
> Please let me know if you need more details, happy to share.
>
> Sincerely,
> M
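
For readers following the thread, here is one possible reading of the "overlapping sections" suggestion quoted above, as a minimal sketch rather than what Erik's project actually did. It only shows the splitting step: a token list is cut into fixed-size windows that share some tokens with their neighbours, so a phrase that straddles a boundary still appears whole in at least one window. The class name OverlappingChunks, the method chunk, and the size/overlap parameters are all hypothetical, and nothing here is Lucene API. The assumption is that each window would then be indexed as its own document carrying an id of the parent, so highlighting only has to re-analyze a small section instead of 300k tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits a token stream into
// overlapping windows that could each be indexed as a separate document.
final class OverlappingChunks {

    // Returns windows of at most `size` tokens; consecutive windows share
    // `overlap` tokens. The last window may be shorter than `size`.
    static List<List<String>> chunk(List<String> tokens, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need size > 0 and 0 <= overlap < size");
        }
        List<List<String>> chunks = new ArrayList<>();
        int step = size - overlap; // how far each window advances
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + size, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // the tail is already covered by this window
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c", "d", "e");
        // Window size 3 with 1 shared token between neighbours.
        for (List<String> window : chunk(tokens, 3, 1)) {
            System.out.println(window);
        }
    }
}
```

The overlap should be at least as long as the longest phrase or highlight fragment you care about; the cost is a larger index and the need to collapse several section hits back into one parent document at search time.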