On Tue, Mar 09, 2004 at 09:25:29AM +0100, Alfredo Braunstein wrote:
> On Friday 05 March 2004 19:45, Andre Poenitz wrote:
> > > On Thursday 04 March 2004 17:15, you wrote:
> > >> > Do you have an idea why is it slow?
> > >>
> > >> No(t yet).
> > >
> > > Looking at it a bit closely, I do. I'm pretty sure it's the getPar
> > > business, that makes things O(n^2).
> >
> > This might be a reason, but I'd rather expect a large constant
> > factor. DocumentIterator accesses every character, i.e. we have
> > already 170k iterations or so for the UserGuide. The O(n^2) for
> > the ParList access should not be visible for our (small) n.
> 
> Just try with UserGuide, UserGuide2 and UserGuide3 and you get a nice quadratic 
> behaviour. Interpolating to UserGuide20 gives more than two hours.
> PosIterator still takes sub-second time in scanning UserGuide20.
> I've recompiled dociterator.C with -pg -O2 and attach the resulting profile 
> (scanning through UserGuide2). getPar takes 97.9% of the time. I think the 
> evidence is more than enough.
> 
> Maybe we should bring this to the list for ideas?

The obvious idea is to use and/or cache some ParagraphList iterator
instead of the par offset...
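To see why the offset is the problem: if ParagraphList is node-based, every getPar(offset) call has to walk from the front, so visiting all n paragraphs by offset costs O(n^2), while carrying the iterator along costs O(n). A minimal sketch of the difference, using std::list<std::string> as a hypothetical stand-in for ParagraphList (the real getPar differs):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for ParagraphList: node-based, so reaching
// paragraph number 'offset' costs O(offset).
using ParagraphList = std::list<std::string>;

// Offset-based access in the style of getPar: re-walk from the front
// on every call.
ParagraphList::iterator getPar(ParagraphList & pars, std::size_t offset)
{
    ParagraphList::iterator it = pars.begin();
    std::advance(it, offset);   // O(offset) per call
    return it;
}

// Scanning every paragraph through getPar: n lookups of cost O(i)
// each, i.e. O(n^2) in total.
std::size_t countViaOffsets(ParagraphList & pars)
{
    std::size_t chars = 0;
    for (std::size_t i = 0; i != pars.size(); ++i)
        chars += getPar(pars, i)->size();
    return chars;
}

// Same scan with a cached iterator: one O(1) step per paragraph,
// O(n) in total.
std::size_t countViaCachedIterator(ParagraphList & pars)
{
    std::size_t chars = 0;
    for (ParagraphList::iterator it = pars.begin(); it != pars.end(); ++it)
        chars += it->size();
    return chars;
}
```

Both return the same answer; only the cost differs, which would match the sub-second PosIterator vs. hours-for-DocumentIterator numbers above.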

Maybe we have to use both. In fact, it's almost the same as with the
'inset_' member: that is just a form of cache itself.

So we'd remove the 'paroffset_type & CursorSlice::par()' member and
provide 'elementary' accessors

  setFirstPar()
  setLastPar()
  incrementPar()

etc. Doesn't sound overly complicated...
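As a rough illustration of the idea (not the actual LyX classes; the members, the std::list stand-in for ParagraphList, and the accessor bodies are all my assumptions, only the accessor names come from the proposal above):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

using ParagraphList = std::list<std::string>;

// Hypothetical CursorSlice caching a ParagraphList iterator alongside
// the numeric offset, instead of exposing 'paroffset_type & par()'.
class CursorSlice {
public:
    explicit CursorSlice(ParagraphList & pars)
        : pars_(&pars), par_(pars.begin()), offset_(0) {}

    // 'Elementary' accessors; each keeps iterator and offset in sync.
    void setFirstPar() { par_ = pars_->begin(); offset_ = 0; }
    void setLastPar()  { par_ = --pars_->end(); offset_ = pars_->size() - 1; }
    void incrementPar() { ++par_; ++offset_; }  // O(1), no re-scan from front

    std::string const & paragraph() const { return *par_; }
    std::size_t parOffset() const { return offset_; }

private:
    ParagraphList * pars_;          // containing list
    ParagraphList::iterator par_;   // cached iterator...
    std::size_t offset_;            // ...kept consistent with the offset
};
```

Since callers can no longer assign an arbitrary offset, the cached iterator can never go stale relative to the offset, which is the point of making the accessors elementary.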

Andre'
