On 02.09.2010 at 18:22, Stephan Witt wrote:

> On 02.09.2010 at 15:42, Abdelrazak Younes wrote:
> 
>> On 09/02/2010 02:19 PM, Stephan Witt wrote:
>>> On 02.09.2010 at 13:42, Abdelrazak Younes wrote:
>>> 
>>> 
>>>> On 09/02/2010 12:13 PM, Stephan Witt wrote:
>>>> 
>>>>> I could go back to the row-based spell check and abandon the
>>>>> checker state of the paragraph. But I decided to move from row
>>>>> to paragraph checking when I saw the loop code in Buffer.cpp
>>>>> which implements the explicit spell check with F7. This code
>>>>> iterates word-wise without knowledge of screen rows (of course),
>>>>> so the performance problem pops up here again.
>>>>> 
>>>>> 
>>>> Quite frankly, I think the spell cache solution that I proposed a
>>>> while back would have resolved all performance problems. Pity that
>>>> I don't have any time to work on this...
>>>> 
>>> I don't think so. Remember that checking the whole visible part of
>>> the document word by word - even when it is done only once - is a
>>> problem with Apple's native spell checker.
>>> 
>> 
>> Well, if it is done once, the slowdown will occur once per word. If the
>> word is found again in the document (and the probability is quite high),
>> then the spell cache will come into play without any help from the slow
>> spell checker.
> 
> Would be handy to have some statistics of relative frequency of word usage... 
> I don't have them.

I changed the Paragraph class to do that. The result for the Tutorial is:

Paragraph.cpp(3296): paragraph statistics: 5 words, 5 distinct.
Paragraph.cpp(3320): register statistics: 5 words, 5 new words.
Paragraph.cpp(3296): paragraph statistics: 1 words, 1 distinct.
Paragraph.cpp(3320): register statistics: 1 words, 1 new words.
Paragraph.cpp(3296): paragraph statistics: 8 words, 8 distinct.
Paragraph.cpp(3320): register statistics: 8 words, 8 new words.
Paragraph.cpp(3296): paragraph statistics: 27 words, 22 distinct.
Paragraph.cpp(3320): register statistics: 22 words, 21 new words.
Paragraph.cpp(3296): paragraph statistics: 3 words, 3 distinct.
Paragraph.cpp(3320): register statistics: 3 words, 3 new words.
... cut output of ~200 paragraphs
Paragraph.cpp(3296): paragraph statistics: 12 words, 10 distinct.
Paragraph.cpp(3320): register statistics: 10 words, 0 new words.
Paragraph.cpp(3296): paragraph statistics: 1 words, 1 distinct.
Paragraph.cpp(3320): register statistics: 1 words, 1 new words.
Paragraph.cpp(3296): paragraph statistics: 18 words, 15 distinct.
Paragraph.cpp(3320): register statistics: 15 words, 6 new words.

This sums up to
2605 words with length >= 6 characters, 872 of them distinct.

This is a cache hit ratio of (2605-872)/2605 = 0.665 (about 66.5%)

Not very impressive...
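
The ratio above treats every distinct word as one cache miss and every repetition as a hit. A minimal sketch of that arithmetic follows; the function name hitRatio and the word-list overload are assumptions made for illustration, not code from Paragraph.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hit ratio of an ideal, never-evicting word cache: each distinct word
// misses exactly once, every further occurrence is a hit.
double hitRatio(std::size_t total_words, std::size_t distinct_words)
{
    assert(distinct_words <= total_words);
    if (total_words == 0)
        return 0.0;
    return static_cast<double>(total_words - distinct_words) / total_words;
}

// Same figure computed directly from a list of words (hypothetical helper).
double hitRatio(const std::vector<std::string> & words)
{
    std::unordered_set<std::string> distinct(words.begin(), words.end());
    return hitRatio(words.size(), distinct.size());
}
```

Calling hitRatio(2605, 872) reproduces the Tutorial figure. Note that this is an upper bound for a real cache: it assumes the cache is never cleared and never evicts an entry.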

The User's Guide looks better:
11921 words, 1738 misses: (11921-1738)/11921 = 0.85 (85%)

Better indeed, but I'm not sure the cache solution alone is responsive
enough for the user. Maybe one can combine both: the cache and
paragraph/sentence-based checking.
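
To make the cache idea concrete, here is a rough sketch of a memoizing wrapper around a slow checker. Everything in it is an assumption for illustration: the class name SpellCache, the std::function callback standing in for the native spell checker, and the hit/miss counters. It is not LyX's SpellChecker interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical memoizing wrapper around a slow spell checker.
class SpellCache {
public:
    // Callback returning true if the word is spelled correctly.
    using Checker = std::function<bool(const std::string &)>;

    explicit SpellCache(Checker slow_checker)
        : slow_checker_(std::move(slow_checker))
    {
        assert(slow_checker_ != nullptr);
    }

    // Ask the slow checker only the first time a word is seen.
    bool check(const std::string & word)
    {
        auto it = results_.find(word);
        if (it != results_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        bool const ok = slow_checker_(word);
        results_.emplace(word, ok);
        return ok;
    }

    // Cached verdicts become stale when the dictionary or language changes.
    void clear() { results_.clear(); }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    Checker slow_checker_;
    std::unordered_map<std::string, bool> results_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

A paragraph-based checker could call check() for each word of a paragraph it has to re-check, so the two approaches do not exclude each other. The sketch ignores per-language dictionaries; a real cache would need the language as part of the key.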

>> No, think Copy&Paste, change tracking, paragraph fusion/deletion, etc.
>> There are a lot of corner cases.
> 
> I don't get it. All of these operations are going through official interfaces 
> of the Paragraph class.
> Or is that not true? And the constructors of the Paragraph can invalidate the 
> copied caches.
> 
> I'd like to check the "paragraph fusion" case. Can you please point me to
> where it is in the code?

I've now checked Copy&Paste: it works here without changes. Change
tracking I had already tested; it works too.
Paragraph deletion is no problem for the spell checker. Paragraph fusion is
used by Paste (right?), so it should work.

All these operations end up calling Paragraph::insertChar() or
Paragraph::insertInset(), AFAICS.
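
The argument is that if every modification goes through a small set of entry points, those are the only places that need to invalidate the paragraph's checker state. A toy sketch of the pattern follows; ParagraphSketch, markChecked() and needsSpellCheck() are made-up names, and the real Paragraph class is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a paragraph; not LyX's Paragraph class.
class ParagraphSketch {
public:
    ParagraphSketch() = default;

    // A copy (e.g. for Copy&Paste) takes the text but not the cached state.
    ParagraphSketch(const ParagraphSketch & other)
        : text_(other.text_), spell_state_valid_(false) {}

    ParagraphSketch & operator=(const ParagraphSketch & other)
    {
        text_ = other.text_;
        spell_state_valid_ = false;
        return *this;
    }

    // Single entry point for text changes: invalidate the state here.
    void insertChar(std::size_t pos, char c)
    {
        assert(pos <= text_.size());
        text_.insert(pos, 1, c);
        spell_state_valid_ = false;
    }

    // Called by the checker once the whole paragraph has been checked.
    void markChecked() { spell_state_valid_ = true; }
    bool needsSpellCheck() const { return !spell_state_valid_; }
    const std::string & text() const { return text_; }

private:
    std::string text_;
    bool spell_state_valid_ = false;
};
```

With this shape, Paste and paragraph fusion need no special handling as long as they really do insert through insertChar(); any code path that edits the text directly would break the assumption.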

Stephan
