On Wed, Dec 15, 2010 at 11:50 AM, Johan Corveleyn <jcor...@gmail.com> wrote:
> On Wed, Dec 15, 2010 at 2:30 AM, Stefan Fuhrmann <eq...@web.de> wrote:
>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>>
>>> Hi all,
>>
>> Hi Johan ;)
>
> Hi Stefan, thanks for the input :). I suspected that you might have
> some ideas about this ...
>
>>> On the diff-optimizations-bytes branch, in diff_file.c, there are two
>>> functions which are called for every byte of the identical prefix and
>>> suffix: increment_pointers and decrement_pointers. These functions are
>>> actually equivalents of curp++ or curp--, reading the next/previous
>>> byte, but taking into account the chunked-ness of the data (file data
>>> is read into memory in chunks).
>>>
>>> As an experiment I changed these functions into macros, eliminating
>>> the function calls. This makes the diff algorithm another 10% - 15%
>>> faster (granted, this was measured with my "extreme" test case of a
>>> 1.5 MB file (60,000 lines), of which most lines are identical
>>> prefix/suffix). However, having an entire function like that
>>> implemented as a macro isn't very pretty (see below for an example).
>>
>> I'm kind of surprised that the calling overhead is actually noticeable
>> for larger files, i.e. for larger values of file_len it should loop
>> many times.
>
> file_len is the size of the array of files, not the length of the
> files themselves. In the typical case (the only case that's currently
> supported) file_len is 2. That's because we're diffing just 2 files:
> the original one and the modified one. The implementation is made
> with an array and a file_len (and "for" loops) because I wanted it to
> be generically useful also for diff3 (with 3 files: original, modified
> and latest) and diff4 (4 files: original, modified, latest and
> ancestor).
>
> Also, I did my measurements with a blame operation on this very large
> file, which has ~2500 revisions. So that's 2500 diffs of a ~1.5 MB
> file, with, say, on average 1 MB of identical prefix/suffix every time.
> That's some 2,500,000,000 function calls.
That should be: 5,000,000,000 function calls, because for every diff we
advance prefix and suffix pointers in two files: the original one and
the modified one.

Johan

> For measurement, I counted the total time of the blame operation, as
> well as the cumulative time for the svn_diff_diff calls (with
> apr_time_now() before and after).
>
>>> Some considerations:
>>> - Maybe I can use APR_INLINE, with similar results?
>>>
>>> - Maybe I can put just the "critical section" into a macro (the curp++
>>>   / curp-- part), and call a function when a chunk boundary is
>>>   encountered (~ once every 131072 iterations (chunks are 128 Kb
>>>   large)), to read in the new chunk, advancing variables, ...
>>
>> Prefer inlining (over macros) because the compiler is free to ignore
>> it. Depending on the target architecture and the compiler, it may be
>> beneficial to "narrow" the scope of optimization: in large functions,
>> the optimizer may guess wrongly about the hot spots.
>>
>>> - Maybe it's not worth it?
>>
>> Since inlining is free from the maintenance POV,
>> any gain should justify it.
>>
>>> Thoughts?
>>
>> Two things you might try:
>> * introduce a local variable for afile[i]
>> * replace the for() loop with two nested ones, keeping
>>   calls to other functions out of the hot spot:
>>
>> for (i = 0; i < file_len;)
>>   {
>>     /* hot spot: */
>>     for (; i < file_len; i++)
>>       {
>>         curFile = afile[i];
>>         if (curFile->chunk == -1)
>>           curFile->chunk = 0;
>>         else if (curFile->curp != curFile->endp - 1)
>>           curFile->curp++;
>>         else
>>           break;
>>       }
>>
>>     if (i < file_len)
>>       {
>>         /* the complex, rarely used stuff goes here */
>>       }
>>   }
>
> Ok, when I have some time, I will experiment a little bit with
> "inline", and see if the compiler "gets it" (I'll try your "nested for
> loop" example (the corrected one, which you just sent in the other
> mail) to help the compiler a little bit). I should be able to easily
> compare that with the macro version.
>
> --
> Johan
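
For illustration, here is a minimal, self-contained sketch of what an
inlined increment_pointers built on the nested-loop idea above might
look like. The struct layout, the read_next_chunk() helper and the C99
"static inline" are assumptions chosen to keep the example compilable
on its own; the real code on the diff-optimizations-bytes branch keeps
this state in diff_file.c, reads chunks via APR, and would presumably
use APR_INLINE and Subversion's error handling, so names and details
may differ.

#define CHUNK_SIZE (128 * 1024)  /* file data is read in 128 Kb chunks */

/* Minimal stand-in for the per-file state in diff_file.c; the real
   struct has more fields (buffer, file handle, sizes, ...). */
struct file_info
{
  const char *curp;   /* current byte within the chunk buffer */
  const char *endp;   /* one past the last byte of the chunk buffer */
  int chunk;          /* chunk currently in memory, -1 = before first chunk */
};

/* Slow path, deliberately kept out of the hot loop: refill the buffer
   with the next chunk and point curp at its first byte.  Stubbed here;
   the real code would seek/read via APR and handle EOF and errors. */
static void
read_next_chunk(struct file_info *file)
{
  file->chunk++;
  /* ... read CHUNK_SIZE bytes into a buffer, then:
     file->curp = buffer;
     file->endp = buffer + bytes_read; */
}

/* Advance the current-byte pointer of every file by one.  The inner
   loop is the hot spot and contains no function calls; it falls out to
   the slow path only when a file sits on its chunk boundary, i.e.
   roughly once every CHUNK_SIZE iterations. */
static inline void
increment_pointers(struct file_info file[], int file_len)
{
  int i;

  for (i = 0; i < file_len;)
    {
      /* hot spot: */
      for (; i < file_len; i++)
        {
          struct file_info *curFile = &file[i];

          if (curFile->chunk == -1)             /* before the first byte */
            curFile->chunk = 0;
          else if (curFile->curp != curFile->endp - 1)
            curFile->curp++;
          else
            break;                              /* chunk boundary reached */
        }

      if (i < file_len)
        {
          /* the complex, rarely used stuff goes here */
          read_next_chunk(&file[i]);
          i++;   /* the refill already positioned curp on the next byte */
        }
    }
}

The point of the split is that the inner loop the compiler sees contains
only comparisons and pointer increments, while the chunk refill lives in
a separate, non-inlined function, so the optimizer can keep the hot path
small. Note the i++ after the refill: because the refill leaves curp on
the first byte of the new chunk, re-entering the inner loop for the same
file would otherwise advance it a second time.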