diff(1) goes into cpu-hogging endless loop

2023-03-25 Thread Jamie Landeg-Jones
Hi, A "diff" of 2 files:

1  77,933,904 bytes
2  63,013,818 bytes

, goes into an endless loop, whilst "gdiff" completes the operation in
about 5 seconds.

I tested using the latest "diff" from current, and get the same result.

Splitting both files into 10Mb chunks, and diffing these was successful.

A ktrace of the "diff" actually stops producing any output after about
5 seconds, whilst the cpu looping continues.

Any ideas on what to do next? Does anyone else get the same result?

The files are just utf-8 freebsd git logs, and are available here if
anyone would like to test:

http://www.catflap.org/jamie/1.xz (13,282,864 bytes)
http://www.catflap.org/jamie/2.xz (12,221,164 bytes)

Cheers, Jamie



Re: diff(1) goes into cpu-hogging endless loop

2023-03-25 Thread Jamie Landeg-Jones
Just to add, that whilst the "diff" succeeded with the files
split into 10Mb chunks, the time taken to run was really high, up
to 10 times longer than gnu diff:

+ /usr/bin/time diff 1.aa 2.aa 16.74 real16.70 user 0.03 sys
+ /usr/bin/time diff 1.ab 2.ab 16.53 real16.45 user 0.07 sys
+ /usr/bin/time diff 1.ac 2.ac 21.58 real21.51 user 0.06 sys
+ /usr/bin/time diff 1.ad 2.ad 22.37 real22.25 user 0.11 sys
+ /usr/bin/time diff 1.ae 2.ae 25.93 real25.81 user 0.11 sys
+ /usr/bin/time diff 1.af 2.af 26.63 real26.53 user 0.09 sys
+ /usr/bin/time diff 1.ag 2.ag  0.98 real 0.96 user 0.02 sys

+ /usr/bin/time gdiff 1.aa 2.aa 2.44 real 2.37 user 0.06 sys
+ /usr/bin/time gdiff 1.ab 2.ab 4.09 real 4.06 user 0.03 sys
+ /usr/bin/time gdiff 1.ac 2.ac 2.24 real 2.22 user 0.01 sys
+ /usr/bin/time gdiff 1.ad 2.ad 1.99 real 1.98 user 0.00 sys
+ /usr/bin/time gdiff 1.ae 2.ae 2.63 real 2.60 user 0.02 sys
+ /usr/bin/time gdiff 1.af 2.af 2.62 real 2.59 user 0.03 sys
+ /usr/bin/time gdiff 1.ag 2.ag 0.12 real 0.11 user 0.00 sys




Re: diff(1) goes into cpu-hogging endless loop

2023-03-25 Thread Tom Jones
On Sat, Mar 25, 2023 at 09:55:14PM +, Jamie Landeg-Jones wrote:
> Hi, A "diff" of 2 files:
> 
> 1  77,933,904 bytes
> 2  63,013,818 bytes
> 
> , goes into an endless loop, whilst "gdiff" completes the operation in
> about 5 seconds.
> 
> I tested using the latest "diff" from current, and get the same result.
> 
> Splitting both files into 10Mb chunks, and diffing these was successful.
> 
> A ktrace of the "diff" actually stops producing any output after about
> 5 seconds, whilst the cpu looping continues.
> 
> Any ideas on what to do next? Does anyone else get the same result?
> 
> The files are just utf-8 freebsd git logs, and are available here if
> anyone would like to test:
> 
> http://www.catflap.org/jamie/1.xz (13,282,864 bytes)
> http://www.catflap.org/jamie/2.xz (12,221,164 bytes)
> 
> Cheers, Jamie

My guess is that you are hitting a worst case in the stone algorithm. I
have a WIP review to integrate the Myers algorithm from libdiff here:

https://reviews.freebsd.org/D36860

- Tom



Re: diff(1) goes into cpu-hogging endless loop

2023-03-25 Thread Jamie Landeg-Jones
Tom Jones  wrote:

> My guess is that you are hitting a worst case in the stone algorithm. I
> have a WIP review to integrate the Myers algorithm from libdiff here:
>
> https://reviews.freebsd.org/D36860

Ahh, thanks, Tom. I'm glad it's being addressed. I'll check out
the review.

Cheers, Jamie