http://en.wikipedia.org/wiki/Google_platform
Document server summarization On Thu, Feb 5, 2009 at 12:57 PM, Michael Stoppelman <stop...@gmail.com>wrote: > On Thu, Feb 5, 2009 at 12:47 PM, Michael Stoppelman <stop...@gmail.com > >wrote: > > > > > > > On Thu, Feb 5, 2009 at 9:05 AM, Jason Rutherglen < > > jason.rutherg...@gmail.com> wrote: > > > >> Google uses dedicated highlighting servers. Maybe this architecture > would > >> work for you. > >> > > > > What's your reference? I used to work at Google. > > > > I think creating a separate index/service would be reasonable and it's what > I purposed in a previous email on this thread... > "One option to get around the changing scoring would be to to run a > completely separate index for highlighting (with the overlapping docs you > described)." > > Still do lucene developers think storing the offsets is a bad idea from an > index size prospective or some other reason? > > M > > > > > >> On Mon, Feb 2, 2009 at 11:24 PM, Michael Stoppelman <stop...@gmail.com > >> >wrote: > >> > >> > Hi all, > >> > > >> > My search backends are only able to eek out 13-15 qps even with the > >> entire > >> > index in memory (this makes it very expensive to scale). According to > my > >> > YourKit profiler 80% of the program's time ends up in highlighting. > With > >> > highlighting disabled my backend gets about 45-50 qps (cheaper > scaling)! > >> > We're using Mark's TokenSources contrib. to make reconstructing of the > >> > document quicker. I was contemplating patching the index to store > >> offsets > >> > for every term (instead of just the ordinal positions) so that I could > >> make > >> > the highlighting faster (since you would know where you hit in the > >> document > >> > on the search pass). I saw this thread from 2004: > >> > > http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg04743.html- > >> > which asks about adding offsets to the index but it was decided > against > >> > because it would make the index too large. I can totally understand > >> this; > >> > but as machines get more beefy it would probably be nice to make this > >> > optional since having 15 qps vs 50qps is quite a trade-off right now. > >> Are > >> > other folks seeing this? My documents are quite big sometimes up to > 300k > >> > tokens. Also my document fields are compressed which is also a time > sink > >> > for > >> > the cpu. > >> > > >> > Please let me know if you need more details, happy to share. > >> > > >> > Sincerely, > >> > M > >> > > >> > > > > >