diff(1) goes into cpu-hogging endless loop
Hi, A "diff" of 2 files: 1 77,933,904 bytes 2 63,013,818 bytes , goes into an endless loop, whilst "gdiff" completes the operation in about 5 seconds. I tested using the latest "diff" from current, and get the same result. Splitting both files into 10Mb chunks, and diffing these was successful. A ktrace of the "diff" actually stops producing any output after about 5 seconds, whilst the cpu looping continues. Any ideas on what to do next? Does anyone else get the same result? The files are just utf-8 freebsd git logs, and are available here if anyone would like to test: http://www.catflap.org/jamie/1.xz (13,282,864 bytes) http://www.catflap.org/jamie/2.xz (12,221,164 bytes) Cheers, Jamie
Re: diff(1) goes into cpu-hogging endless loop
Just to add, that whilst the "diff" succeeded with the files split into 10Mb chunks, the time taken to run was really high, up to 10 times longer than gnu diff: + /usr/bin/time diff 1.aa 2.aa 16.74 real16.70 user 0.03 sys + /usr/bin/time diff 1.ab 2.ab 16.53 real16.45 user 0.07 sys + /usr/bin/time diff 1.ac 2.ac 21.58 real21.51 user 0.06 sys + /usr/bin/time diff 1.ad 2.ad 22.37 real22.25 user 0.11 sys + /usr/bin/time diff 1.ae 2.ae 25.93 real25.81 user 0.11 sys + /usr/bin/time diff 1.af 2.af 26.63 real26.53 user 0.09 sys + /usr/bin/time diff 1.ag 2.ag 0.98 real 0.96 user 0.02 sys + /usr/bin/time gdiff 1.aa 2.aa 2.44 real 2.37 user 0.06 sys + /usr/bin/time gdiff 1.ab 2.ab 4.09 real 4.06 user 0.03 sys + /usr/bin/time gdiff 1.ac 2.ac 2.24 real 2.22 user 0.01 sys + /usr/bin/time gdiff 1.ad 2.ad 1.99 real 1.98 user 0.00 sys + /usr/bin/time gdiff 1.ae 2.ae 2.63 real 2.60 user 0.02 sys + /usr/bin/time gdiff 1.af 2.af 2.62 real 2.59 user 0.03 sys + /usr/bin/time gdiff 1.ag 2.ag 0.12 real 0.11 user 0.00 sys
Re: diff(1) goes into cpu-hogging endless loop
On Sat, Mar 25, 2023 at 09:55:14PM +, Jamie Landeg-Jones wrote: > Hi, A "diff" of 2 files: > > 1 77,933,904 bytes > 2 63,013,818 bytes > > , goes into an endless loop, whilst "gdiff" completes the operation in > about 5 seconds. > > I tested using the latest "diff" from current, and get the same result. > > Splitting both files into 10Mb chunks, and diffing these was successful. > > A ktrace of the "diff" actually stops producing any output after about > 5 seconds, whilst the cpu looping continues. > > Any ideas on what to do next? Does anyone else get the same result? > > The files are just utf-8 freebsd git logs, and are available here if > anyone would like to test: > > http://www.catflap.org/jamie/1.xz (13,282,864 bytes) > http://www.catflap.org/jamie/2.xz (12,221,164 bytes) > > Cheers, Jamie My guess is that you are hitting a worst case in the stone algorithm. I have a WIP review to integrate the Myers algorithm from libdiff here: https://reviews.freebsd.org/D36860 - Tom
Re: diff(1) goes into cpu-hogging endless loop
Tom Jones wrote: > My guess is that you are hitting a worst case in the stone algorithm. I > have a WIP review to integrate the Myers algorithm from libdiff here: > > https://reviews.freebsd.org/D36860 Ahh, thanks, Tom. I'm glad it's being addressed. I'll check out the review. Cheers, Jamie