On Tue, Mar 09, 2004 at 09:25:29AM +0100, Alfredo Braunstein wrote:
> On Friday 05 March 2004 19:45, Andre Poenitz wrote:
> > > On Thursday 04 March 2004 17:15, you wrote:
> > >> > Do you have an idea why is it slow?
> > >>
> > >> No(t yet).
> > >
> > > Looking at it a bit more closely, I do. I'm pretty sure it's the
> > > getPar business that makes things O(n^2).
> >
> > This might be a reason, but I'd rather expect a large constant
> > factor. DocumentIterator accesses every character, i.e. we have
> > already 170k iterations or so for the UserGuide. The O(n^2) for
> > the ParList access should not be visible for our (small) n.
>
> Just try with UserGuide, UserGuide2, and UserGuide3 and you get nice
> quadratic behaviour. Interpolating to UserGuide20 gives more than two
> hours. PosIterator still takes sub-second time in scanning UserGuide20.
> I've recompiled dociterator.C with -pg -O2 and attach the resulting
> profile (scanning through UserGuide2). getPar takes 97.9% of the time.
> I think the evidence is more than enough.
>
> Maybe we should bring this to the list for ideas?
The obvious idea is to use and/or cache a ParagraphList iterator instead
of the par offset... Maybe we have to use both. In fact, it's almost the
same as with the 'inset_' member: that is just a form of a cache itself.

So we'd remove the 'paroffset_type & CursorSlice::par()' member and
provide 'elementary' accessors setFirstPar(), setLastPar(),
incrementPar(), etc. Doesn't sound overly complicated...

Andre'