Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-06-06 Thread Johan Corveleyn
On Mon, Jun 6, 2011 at 7:38 PM, Morten Kloster wrote: > On Mon, Jun 6, 2011 at 3:17 AM, Johan Corveleyn wrote: >> On Wed, Jun 1, 2011 at 5:56 PM, Morten Kloster wrote: > [] >> Hi Morten, >> >> Sorry, it took me a little while longer than expected, but I finally >> got around to it. >> >> I did s

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-06-06 Thread Morten Kloster
On Mon, Jun 6, 2011 at 3:17 AM, Johan Corveleyn wrote: > On Wed, Jun 1, 2011 at 5:56 PM, Morten Kloster wrote: [] > Hi Morten, > > Sorry, it took me a little while longer than expected, but I finally > got around to it. > > I did some more tests, and upon further investigation, I too can't > real

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-06-05 Thread Johan Corveleyn
On Wed, Jun 1, 2011 at 5:56 PM, Morten Kloster wrote: > On Wed, Jun 1, 2011 at 1:35 AM, Johan Corveleyn wrote: >> On Tue, May 31, 2011 at 12:44 PM, Johan Corveleyn wrote: >> ... > [] >> I'll get into some more testing and reviewing tomorrow or the day >> after (unless someone else beats me to it

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-06-01 Thread Morten Kloster
On Wed, Jun 1, 2011 at 1:35 AM, Johan Corveleyn wrote: > On Tue, May 31, 2011 at 12:44 PM, Johan Corveleyn wrote: > ... [] > I'll get into some more testing and reviewing tomorrow or the day > after (unless someone else beats me to it :-)). > > Cheers, > -- > Johan > I had trouble getting any re

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-31 Thread Johan Corveleyn
On Tue, May 31, 2011 at 12:44 PM, Johan Corveleyn wrote: ... > Maybe a new 'knob' should be added for this? To make it easier to > (stress) test the LCS ... sorry don't have time to do it myself now. Added the new 'knob' in r1129957 (with followup in r1129965). To disable the prefix/suffix scanni

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-31 Thread Johan Corveleyn
On Tue, May 31, 2011 at 12:03 PM, Daniel Shahaf wrote: > Johan Corveleyn wrote on Tue, May 31, 2011 at 02:53:47 +0200: >> - Take a closer look at measuring the overhead of the token counting. >> Maybe you can also provide some numbers here? I think a good test for >> measuring this in practice is:

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-31 Thread Daniel Shahaf
Johan Corveleyn wrote on Tue, May 31, 2011 at 02:53:47 +0200: > - Take a closer look at measuring the overhead of the token counting. > Maybe you can also provide some numbers here? I think a good test for > measuring this in practice is: > 1. take a very large file > 2. change a line in the be

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-30 Thread Johan Corveleyn
On Mon, May 30, 2011 at 8:50 PM, Morten Kloster wrote: > Johan, any progress on reviewing the code on your part? Things are > a bit simpler now with the idx patch in: Since that patch settled the > file (re)order issue, this patch now produces fully identical results to > HEAD. Thanks for the upd

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-30 Thread Morten Kloster
On Mon, May 30, 2011 at 12:26 PM, Stefan Sperling wrote: > On Mon, May 30, 2011 at 08:25:38AM +0200, Markus Schaber wrote: >> Hi, Morten, >> >> > From: Morten Kloster [mailto:mor...@gmail.com] >> >> > I haven't changed the index/count types yet. What's the right type to use >> > to get signed 32 bi

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-30 Thread Stefan Sperling
On Mon, May 30, 2011 at 08:25:38AM +0200, Markus Schaber wrote: > Hi, Morten, > > > From: Morten Kloster [mailto:mor...@gmail.com] > > > I haven't changed the index/count types yet. What's the right type to use > > to get signed 32 bit on 32-bit machines and signed 64 bit on 64-bit > > machines? >

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-30 Thread Branko Čibej
On 29.05.2011 20:02, Morten Kloster wrote:
> Bert informed me that intptr_t might not work on all systems, so I've
> made a new typedef svn_diff__token_index_t and set it to long int for now.
You could try apr_uintptr_t.
-- Brane
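
For readers following the type question: a minimal sketch of one standard-C alternative, assuming a platform where ptrdiff_t is as wide as a pointer (true on the common 32-bit and 64-bit targets, but not promised by the C standard). The names token_index_t and token_index_bits are hypothetical and are not taken from the patch. Two things worth keeping in mind: long int is only 32 bits wide on 64-bit Windows, and apr_uintptr_t is unsigned, whereas the question in this thread was about a signed type.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for svn_diff__token_index_t.  ptrdiff_t is a
   signed type; ASSUMPTION: it has pointer width on the target platform. */
typedef ptrdiff_t token_index_t;

/* Return the width of the index type in bits (sign bit included). */
int
token_index_bits(void)
{
  /* The index type must be signed for this sketch to make sense. */
  assert((token_index_t)-1 < 0);
  return (int)(sizeof(token_index_t) * CHAR_BIT);
}
```

On a typical 64-bit Linux build this reports a 64-bit index, and on a typical 32-bit build a 32-bit one.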

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-29 Thread Morten Kloster
Here's a version that resolves conflicts from r1128921.
Index: subversion/libsvn_diff/diff.c
===================================================================
--- subversion/libsvn_diff/diff.c (revision 1128966)
+++ subversion/libsvn_diff/diff.c (working copy)
@@ -33,7 +33,39 @@
 #i

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-29 Thread Morten Kloster
>> To: Julian Foad >>> Cc: Mark Phippard; dev@subversion.apache.org >>> Subject: Re: [PATCH] Speed-up of libsvn_diff using token counts >>> >>> On Fri, May 27, 2011 at 7:57 PM, Julian Foad >>> wrote: >>> > Morten Kloster wrote: >>> >> On

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-29 Thread Morten Kloster
On Sun, May 29, 2011 at 6:00 PM, Bert Huijben wrote: > > >> -----Original Message----- >> From: Morten Kloster [mailto:mor...@gmail.com] >> Sent: Sunday, May 29, 2011 17:35 >> To: Julian Foad >> Cc: Mark Phippard; dev@subversion.apache.org >> Subject: Re: [

RE: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-29 Thread Bert Huijben
> -----Original Message----- > From: Morten Kloster [mailto:mor...@gmail.com] > Sent: Sunday, May 29, 2011 17:35 > To: Julian Foad > Cc: Mark Phippard; dev@subversion.apache.org > Subject: Re: [PATCH] Speed-up of libsvn_diff using token counts > > On Fri, May 27, 2011

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-29 Thread Morten Kloster
On Fri, May 27, 2011 at 7:57 PM, Julian Foad wrote: > Morten Kloster wrote: >> On Fri, May 27, 2011 at 4:55 PM, Julian Foad >> wrote: >> > Morten Kloster wrote: >> >> I haven't changed the index/count types yet. What's the right type >> >> to use to get signed 32 bit on 32-bit machines and signe

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-27 Thread Mark Phippard
On Fri, May 27, 2011 at 10:43 AM, Morten Kloster wrote: > My bad; I had forgotten to wrap two of the counting loops in NULL > checks. This version fixes it. Thanks again for catching that bug, > Mark. > > I've also figured out how to run the test C programs. The new > version passes all libsvn_dif

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-27 Thread Julian Foad
Morten Kloster wrote: > On Fri, May 27, 2011 at 4:55 PM, Julian Foad wrote: > > Morten Kloster wrote: > >> I haven't changed the index/count types yet. What's the right type > >> to use to get signed 32 bit on 32-bit machines and signed 64 bit > >> on 64-bit machines? > > > > "int"? > > Is int gu

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-27 Thread Morten Kloster
On Fri, May 27, 2011 at 4:55 PM, Julian Foad wrote: > Morten Kloster wrote: >> I haven't changed the index/count types yet. What's the right type >> to use to get signed 32 bit on 32-bit machines and signed 64 bit >> on 64-bit machines? > > "int"? > > - Julian > > > Is int guaranteed to correspon

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-27 Thread Julian Foad
Morten Kloster wrote:
> I haven't changed the index/count types yet. What's the right type
> to use to get signed 32 bit on 32-bit machines and signed 64 bit
> on 64-bit machines?
"int"?
- Julian

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-27 Thread Morten Kloster
My bad; I had forgotten to wrap two of the counting loops in NULL checks. This version fixes it. Thanks again for catching that bug, Mark. I've also figured out how to run the test C programs. The new version passes all libsvn_diff tests. I haven't changed the index/count types yet. What's the ri

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-27 Thread Morten Kloster
Diff3 seemed to work fine here in preliminary testing. I'll run more tests, but just in case, which version of the patch did you use? If you somehow got the .patch file that I included with my original post, which didn't make it to the archive, that version had a bug (as noted in my 2nd post). It's

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-26 Thread Morten Kloster
On Thu, May 26, 2011 at 6:32 PM, Mark Phippard wrote: > On Thu, May 26, 2011 at 11:13 AM, Mark Phippard wrote: >> I applied your patch on OSX and figured I would at least run the tests >> for you. Builds go OK, but one of the tests fails and two other >> tests crash with your patch applied. The c

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-26 Thread Mark Phippard
On Thu, May 26, 2011 at 11:13 AM, Mark Phippard wrote: > I applied your patch on OSX and figured I would at least run the tests > for you. Builds go OK, but one of the tests fails and two other > tests crash with your patch applied. The crashes look the same in both > cases. > > Note, I have not r

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-26 Thread Mark Phippard
Hi, I applied your patch on OSX and figured I would at least run the tests for you. Builds go OK, but one of the tests fails and two other tests crash with your patch applied. The crashes look the same in both cases. Note, I have not run tests recently on this box, so I am not 100% certain the te

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-25 Thread Johan Corveleyn
On Thu, May 26, 2011 at 12:00 AM, Morten Kloster wrote: [ snip ] > I have, however, done some significant testing today: I downloaded > the first 100 revisions of merge.c in libsvn_client (merge.c is the > largest code file in the current HEAD revision, with almost 800 > revisions), and ran diff o

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-25 Thread Morten Kloster
On Wed, May 25, 2011 at 1:44 AM, Johan Corveleyn wrote: > On Tue, May 24, 2011 at 10:46 PM, Stefan Sperling wrote: >> On Tue, May 24, 2011 at 10:22:58PM +0200, Morten Kloster wrote: >>> Johan / Stefan - any progress on the reviewing part? >> >> I haven't had time to review this, sorry. >> It got

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-24 Thread Johan Corveleyn
On Tue, May 24, 2011 at 10:46 PM, Stefan Sperling wrote: > On Tue, May 24, 2011 at 10:22:58PM +0200, Morten Kloster wrote: >> Johan / Stefan - any progress on the reviewing part? > > I haven't had time to review this, sorry. > It got a bit lost within the recent flurry of activity during and > aft

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-24 Thread Stefan Sperling
On Tue, May 24, 2011 at 10:22:58PM +0200, Morten Kloster wrote: > Johan / Stefan - any progress on the reviewing part? I haven't had time to review this, sorry. It got a bit lost within the recent flurry of activity during and after the Berlin hackathon. I would also need some time to look at thi

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-24 Thread Morten Kloster
Johan / Stefan - any progress on the reviewing part? Only possible error I'm aware of is that I made the indices/counts 32 bit signed/unsigned integers, which means it'll run into trouble if there are more than ~2 billion different tokens (lines) or more than ~4 billion occurrences of the same tok
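
The limits mentioned in this message follow directly from the integer widths: a signed 32-bit index tops out at 2147483647 (about 2 billion) and an unsigned 32-bit count at 4294967295 (about 4 billion). A small sketch of what guarding against those limits could look like, assuming signed 32-bit indices and unsigned 32-bit counts as described. The names token_index_t, token_count_t, can_add_token and bump_count are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types mirroring the widths described in the message. */
typedef int32_t token_index_t;   /* index of a distinct token (line) */
typedef uint32_t token_count_t;  /* occurrences of one token */

/* Return 1 if one more distinct token can still be numbered, else 0. */
int
can_add_token(token_index_t num_tokens)
{
  return num_tokens < INT32_MAX;  /* INT32_MAX is 2147483647 */
}

/* Increment *COUNT and return 1, or return 0 without changing it if the
   increment would wrap around. */
int
bump_count(token_count_t *count)
{
  assert(count != NULL);
  if (*count == UINT32_MAX)       /* UINT32_MAX is 4294967295 */
    return 0;
  (*count)++;
  return 1;
}
```

Whether such checks are worth their cost in the inner loops, or whether a wider type is the better answer, is a separate question that the sketch does not settle.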

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-19 Thread Morten Kloster
Never mind, it looks like it's just my browser messing up things. On Thu, May 19, 2011 at 2:28 PM, Daniel Shahaf wrote: > Did they?  I have not changed mutt/vim's configuration recently and I've > sent many patches inline before. > > Morten Kloster wrote on Thu, May 19, 2011 at 14:14:03 +0200: >>

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-19 Thread Daniel Shahaf
Did they? I have not changed mutt/vim's configuration recently and I've sent many patches inline before. Morten Kloster wrote on Thu, May 19, 2011 at 14:14:03 +0200: > Yeah, it was the same problem that I had when I included the patch > directly in the email; the indentations got all messed up on

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-19 Thread Stefan Sperling
On Thu, May 19, 2011 at 01:44:56PM +0200, Morten Kloster wrote:
> Here is an attached copy of the patch.
This one looks good. Thank you!

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-19 Thread Morten Kloster
Yeah, it was the same problem that I had when I included the patch directly in the email; the indentations got all messed up on the archive. On Thu, May 19, 2011 at 2:07 PM, Daniel Shahaf wrote: > Morten Kloster wrote on Thu, May 19, 2011 at 13:38:56 +0200: >> Here is a version without the whites

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-19 Thread Daniel Shahaf
Morten Kloster wrote on Thu, May 19, 2011 at 13:38:56 +0200: > Here is a version without the whitespace changes in diff3.c. I have also kept > the original indentation level of the loop in lcs.c that I have wrapped in > another loop. > > The indentations in Daniel's version had little or nothing t

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-19 Thread Morten Kloster
For some reason, a lot of the indentation gets messed up when posted in the archive - it looks fine both in my sent email and in the copy I received myself. Here is an attached copy of the patch. Morten > On Thu, May 19, 2011 at 11:38 AM, Stefan Sperling wrote: >> On Wed, May 18, 2011 at 07:31

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-19 Thread Morten Kloster
Here is a version without the whitespace changes in diff3.c. I have also kept the original indentation level of the loop in lcs.c that I have wrapped in another loop. The indentations in Daniel's version had little or nothing to do with my patch. :) [[[ Index: subversion/libsvn_diff/diff.c ==

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-19 Thread Stefan Sperling
On Wed, May 18, 2011 at 07:31:54PM +0200, Morten Kloster wrote: > > > > I'm attaching a version of the patch re-generated with -x-pw. > > > > [[[ > > Index: subversion/libsvn_diff/diff.c > > === > > --- subversion/libsvn_diff/diff.c  

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-18 Thread Morten Kloster
On Wed, May 18, 2011 at 9:25 PM, Johan Corveleyn wrote: > On Wed, May 18, 2011 at 9:23 PM, Johan Corveleyn wrote: >> On Wed, May 18, 2011 at 7:31 PM, Morten Kloster wrote: >>> Thanks, Daniel. >>> >>> Johan: >>> I've found your notes trunk/notes/diff-optimizations.txt. As you may >>> have realize

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-18 Thread Johan Corveleyn
On Wed, May 18, 2011 at 9:23 PM, Johan Corveleyn wrote: > On Wed, May 18, 2011 at 7:31 PM, Morten Kloster wrote: >> Thanks, Daniel. >> >> Johan: >> I've found your notes trunk/notes/diff-optimizations.txt. As you may >> have realized, my patch is a slightly different approach to the >> problem yo

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-18 Thread Johan Corveleyn
On Wed, May 18, 2011 at 7:31 PM, Morten Kloster wrote: > Thanks, Daniel. > > Johan: > I've found your notes trunk/notes/diff-optimizations.txt. As you may > have realized, my patch is a slightly different approach to the > problem you discuss in section I.5, "Discarding non-matching > lines before

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-18 Thread Morten Kloster
Thanks, Daniel. Johan: I've found your notes trunk/notes/diff-optimizations.txt. As you may have realized, my patch is a slightly different approach to the problem you discuss in section I.5, "Discarding non-matching lines before running the LCS (Longest Common Subsequence) algorithm." - rather th

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-18 Thread Daniel Shahaf
Please don't mix whitespace changes with functional changes. I'm attaching a version of the patch re-generated with -x-pw.
[[[
Index: subversion/libsvn_diff/diff.c
===================================================================
--- subversion/libsvn_diff/diff.c (revision 1104340)
+++ su

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-18 Thread Morten Kloster
On Wed, May 18, 2011 at 8:33 AM, Johan Corveleyn wrote: > On Wed, May 18, 2011 at 1:56 AM, Morten Kloster wrote: >> Log message: >> >> Speed-up of libsvn_diff using token counts >> By using indices, not node pointers, to refer to tokens, and counting >> the number of each token, the longest commo

Re: [PATCH] Speed-up of libsvn_diff using token counts

2011-05-17 Thread Johan Corveleyn
On Wed, May 18, 2011 at 1:56 AM, Morten Kloster wrote: > Log message: > > Speed-up of libsvn_diff using token counts > By using indices, not node pointers, to refer to tokens, and counting > the number of each token, the longest common subsequence (lcs) > algorithm at the heart of libsvn_diff beco
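
The idea in the log message quoted above (refer to tokens by index and keep a count per token) can be sketched roughly as follows. This is not the code from the patch: token_table_t, intern_token, keep_shared_tokens and the fixed MAX_TOKENS capacity are hypothetical names made up for illustration, and a linear search stands in for whatever lookup structure libsvn_diff really uses. The sketch shows only one thing that counts make cheap: spotting tokens that never occur in the other file, which therefore cannot be part of any common subsequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 64  /* hypothetical fixed capacity for this sketch */

/* Hypothetical token table: each distinct line gets a small integer
   index, and count[f][i] is how often token i occurs in file f. */
typedef struct token_table_t
{
  const char *text[MAX_TOKENS];
  long count[2][MAX_TOKENS];
  long num_tokens;
} token_table_t;

/* Return the index of LINE, adding it to TABLE if it is new, and bump
   its occurrence count for FILE (0 or 1). */
long
intern_token(token_table_t *table, const char *line, int file)
{
  long i;

  assert(file == 0 || file == 1);

  /* Linear search: a stand-in for a real lookup structure. */
  for (i = 0; i < table->num_tokens; i++)
    if (strcmp(table->text[i], line) == 0)
      break;

  if (i == table->num_tokens)
    {
      assert(table->num_tokens < MAX_TOKENS);
      table->text[i] = line;
      table->count[0][i] = 0;
      table->count[1][i] = 0;
      table->num_tokens++;
    }

  table->count[file][i]++;
  return i;
}

/* Copy into OUT the indices from SEQ whose token also occurs in
   OTHER_FILE, preserving order.  Return how many were kept. */
size_t
keep_shared_tokens(const token_table_t *table, const long *seq,
                   size_t len, int other_file, long *out)
{
  size_t i, kept = 0;

  for (i = 0; i < len; i++)
    if (table->count[other_file][seq[i]] > 0)
      out[kept++] = seq[i];

  return kept;
}
```

With the lines x, y, z, y in one file and y, w, z in the other, the filtered sequences become y, z, y and y, z, so an LCS pass over them has less to compare. Comparing indices is also a plain integer comparison rather than a pointer or string comparison.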