On Thu, Feb 5, 2009 at 9:05 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> Google uses dedicated highlighting servers. Maybe this architecture would
> work for you.

What's your reference? I used to work at Google.

> On Mon, Feb 2, 2009 at 11:24 PM, Michael Stoppelman <stop...@gmail.com> wrote:
>
> > Hi all,
> >
> > My search backends are only able to eke out 13-15 qps even with the
> > entire index in memory (this makes it very expensive to scale).
> > According to my YourKit profiler, 80% of the program's time ends up in
> > highlighting. With highlighting disabled my backend gets about 45-50 qps
> > (cheaper scaling)! We're using Mark's TokenSources contrib to make
> > reconstructing the document quicker.
> >
> > I was contemplating patching the index to store offsets for every term
> > (instead of just the ordinal positions) so that I could make the
> > highlighting faster, since you would know where you hit in the document
> > on the search pass. I saw this thread from 2004:
> >
> > http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg04743.html
> >
> > which asks about adding offsets to the index, but it was decided against
> > because it would make the index too large. I can totally understand
> > this, but as machines get more beefy it would probably be nice to make
> > this optional, since 15 qps vs 50 qps is quite a trade-off right now.
> >
> > Are other folks seeing this? My documents are quite big, sometimes up to
> > 300k tokens. Also, my document fields are compressed, which is another
> > time sink for the CPU.
> >
> > Please let me know if you need more details, happy to share.
> >
> > Sincerely,
> > M
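
To make the offsets idea concrete, here is a rough sketch in plain Python (not Lucene code). It assumes a toy in-memory map from each term to its character offsets, built once at index time, so that highlighting becomes a slice of the stored text rather than a re-tokenization of the whole document. The names index_offsets and highlight, the regex tokenizer, and the <b> markers are all made up for illustration.

```python
import re
from collections import defaultdict


def index_offsets(text):
    """Index-time step (hypothetical): record (start, end) character
    offsets for every occurrence of every lowercased term."""
    offsets = defaultdict(list)
    for match in re.finditer(r"\w+", text):
        offsets[match.group().lower()].append((match.start(), match.end()))
    return offsets


def highlight(text, offsets, term, context=20, pre="<b>", post="</b>"):
    """Search-time step (hypothetical): build a snippet around the first
    hit of `term` using only the stored offsets, with no re-tokenizing."""
    hits = offsets.get(term.lower())
    if not hits:
        return None
    start, end = hits[0]
    left = max(0, start - context)
    right = min(len(text), end + context)
    return text[left:start] + pre + text[start:end] + post + text[end:right]


if __name__ == "__main__":
    doc = "Lucene stores term positions; offsets are optional."
    table = index_offsets(doc)
    print(highlight(doc, table, "offsets", context=10))
```

The trade-off is the one raised in the 2004 thread: the offset table grows with the number of term occurrences, so a 300k-token document pays for faster highlighting with a larger index.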