On 2015-10-21, Jim Meyering wrote:
> On Wed, Oct 21, 2015 at 1:09 PM, Gary Johnson wrote:
> > On 2015-10-18, Jim Meyering wrote:
> >> > I built the snapshot on two systems, a fairly old one running Ubuntu
> >> > 10.04.4 and a newer one running an up-to-date Linux Mint 17.2.
> >> > 'make check' reported the same two failures on both:
> >> >
> >> > XFAIL: backref-alt
> >> > XFAIL: triple-backref
> >>
> >> Thanks for building and reporting.
> >> Each of those "XFAIL"s indicates an expected failure, so that is the
> >> expected test result, for now.
> >
> > OK, thanks.
> >
> > I also built the snapshot successfully on a Fedora 17 system that I
> > use for real work. I just ran a performance test, FWIW. I searched
> > recursively in our source hierarchy of 6044 regular files and 1102
> > directories for a simple string.
> >
> > time grep -Rin mystring src > /dev/null
> >
> > Here are the results, averaged over three trials each, not including
> > any slow times clearly due to updating caches.
> >
> >         2.12    2.21    2.21.78-7da30
> >         -----   -----   -----
> > real    18.0s   1.08s   2.36s
> > user    17.8s   0.96s   2.24s
> > sys     0.12s   0.11s   0.10s
> >
> > Version 2.12 was /bin/grep. The other two versions I built myself.
>
> Thank you for the timings. Next time, please include the following:
This is kind of long, so I'll summarize here. The relatively poor
performance I observed with grep-2.21.78 appears to have been due to my
having built it in an environment tainted with CFLAGS from the build of
another project. A clean build of grep-2.21.78 performed only slightly
worse than grep-2.21.

> - CPU type/speed

From lshw (probably more than you wanted):

  *-cpu:0
       description: CPU
       product: Quad-Core Xeon 5xxx
       vendor: Intel Corp.
       physical id: 5
       bus info: cpu@0
       version: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
       slot: CPU0 PROCESSOR
       size: 1596MHz
       capacity: 2128MHz
       width: 64 bits
       clock: 505MHz
       capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid cpufreq
       configuration: cores=4 enabledcores=4 threads=4
     *-cache:0
          description: L1 cache
          physical id: 7
          slot: L1 Cache
          size: 256KiB
          capacity: 256KiB
          capabilities: burst internal write-through unified
     *-cache:1
          description: L2 cache
          physical id: 8
          slot: L2 Cache
          size: 1MiB
          capacity: 1MiB
          capabilities: burst internal write-back unified
     *-cache:2
          description: L3 cache
          physical id: 9
          slot: L3 Cache
          size: 4MiB
          capacity: 4MiB
          capabilities: burst internal write-back unified
  *-cpu:1
       description: CPU
       product: Quad-Core Xeon 5xxx
       vendor: Intel Corp.
       physical id: 6
       bus info: cpu@1
       version: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
       slot: CPU1 PROCESSOR
       size: 1596MHz
       capacity: 2128MHz
       width: 64 bits
       clock: 505MHz
       capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid cpufreq
       configuration: cores=4 enabledcores=4 threads=4
     *-cache:0
          description: L1 cache
          physical id: a
          slot: L1 Cache
          size: 256KiB
          capacity: 256KiB
          capabilities: burst internal write-through unified
     *-cache:1
          description: L2 cache
          physical id: b
          slot: L2 Cache
          size: 1MiB
          capacity: 1MiB
          capabilities: burst internal write-back unified
     *-cache:2
          description: L3 cache
          physical id: c
          slot: L3 Cache
          size: 4MiB
          capacity: 4MiB
          capabilities: burst internal write-back unified

> - file system type (and SSD or spinning rust)

Type: ext4
Size: 1.1 TB
Spinning rust

The file system resides on an LVM logical volume composed of two
physical volumes. One physical volume is on a Seagate ST3250318AS and
the other is on a Western Digital WDC WD1002FAEX-0. I didn't build the
system, so I don't know much more about it than that.
> - OS version

Fedora 17
Kernel: 3.3.4-5.fc17.x86_64

> - options with which you configured/built grep

Version 2.21:

    ./configure --prefix=$HOME/src/grep-2.21
    make

Version 2.21.78-7da30:

    ./configure --prefix=$HOME/src/grep-2.21.78
    make

gcc is: gcc (GCC) 4.7.0 20120507 (Red Hat 4.7.0-5)

> - your current locale

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

> While you see a performance degradation going from 2.21 to the
> first 2.22 release candidate, I see the opposite trend, albeit barely
> measurable:
>
> Searching the following hierarchies, I see a consistent 1% improvement
> going from 2.21 to 2.22 on an Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz.
> The files I searched were on an ext4 file system residing on an SSD
> (OCZ-VERTEX3).
> This system is using fedora rawhide.
>
> $ find [a-g]* -type f|wc -l
> 335065
> $ find [a-g]* -type d|wc -l
> 9667
> $ du -shc [a-g]*
> 25M     autoconf
> 125M    automake
> 129M    bison
> 74M     cppi
> 437M    cu
> 103M    diffutils
> 732M    emacs
> 2.3G    gcc
> 345M    glibc
> 252M    gnulib
> 187M    grep
> 90M     gzip
> 4.7G    total
>
> Both grep binaries were compiled with gcc-6.0.something (built from git)
> using ./configure --enable-gcc-warnings && make
>
> Here are best-of-3 timings running this command:
>
> env LC_ALL=en_US.UTF-8 time grep -ri mystring [a-g]* > /dev/null
>
> grep-2.21: 8.05user 1.10system 0:09.17elapsed 99%CPU
> (0avgtext+0avgdata 32876maxresident)k
> 0inputs+0outputs (0major+9986minor)pagefaults 0swaps
>
> grep-2.22: 8.04user 1.04system 0:09.10elapsed 99%CPU
> (0avgtext+0avgdata 32940maxresident)k
> 0inputs+0outputs (0major+9988minor)pagefaults 0swaps
>
> It is critical to mention the locale you use.
> As you see above, I explicitly set LC_ALL=en_US.UTF-8.
> Note that when I switch to LC_ALL=C, it halves those times,
> although the ~1% win with 2.22 still remains
>
> Would you please compile 2.21 yourself, too? Otherwise, the timing may
> be biased by the fact that distribution-provided binaries are often
> better optimized than those one gets when building from sources with
> the default options. If we can identify a modern system for which
> there is anywhere near a 2x performance regression, I would be very
> interested to learn more.

Version 2.21 is one I compiled myself. The distribution-provided
version is 2.12.

Your comments encouraged me to pay more attention to what I was doing.
I compared the config.log files from the grep-2.21 and
grep-2.21.78-7da30 directories and noticed that the environments and
results were slightly different. In particular, CFLAGS had been set to
"-g -DFEAT_CONCEAL" for a Vim build and was still in effect when I
built grep-2.21.78. Also, I had built grep-2.21 back in February and
couldn't be sure that nothing relevant had changed on the system since
then.

So I opened a new xterm window, created two new build directories, and
untarred, configured and built both grep versions from scratch. The new
measurements showed no difference between the two 2.21 builds, but a
significant improvement in the 2.21.78 times.

Here are the new results. The times of successive runs were very close,
so I just chose a representative example of each. In short, 2.21.78
appears _slightly_ slower than 2.21, but not enough (for me) to worry
about.
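The clean rebuild described above can be sketched as a pair of shell functions. This is a minimal sketch, assuming a POSIX shell, GNU tar, and release tarballs named like grep-2.21.tar.xz that unpack into a same-named directory (the tarball names and the helper names clean_env and build_clean are hypothetical). It removes CFLAGS and related variables before running configure, so configure falls back to its own default flags, which for gcc are normally "-g -O2". An inherited CFLAGS such as "-g -DFEAT_CONCEAL" replaces that default entirely, so one plausible explanation for the slow build is that it was compiled without -O2.

```shell
#!/bin/sh
# Sketch only: tarball names, prefixes and function names are illustrative.

# Run a command with common compiler/linker variables removed from its
# environment. The subshell keeps the caller's environment unchanged.
clean_env() {
  (
    unset CC CFLAGS CPPFLAGS LDFLAGS LIBS
    "$@"
  )
}

# Unpack a release tarball into a fresh directory, then configure, build
# and install it with a clean environment. The whole body runs in a
# subshell so the "cd" does not leak into the caller. The final grep
# prints the CFLAGS value configure recorded in config.log.
build_clean() {
  (
    tarball=$1   # e.g. grep-2.21.tar.xz (assumed name)
    prefix=$2    # e.g. "$HOME/grep-2.21-new"
    dir=$(basename "$tarball" .tar.xz)
    rm -rf "$dir" &&
      tar xf "$tarball" &&
      cd "$dir" &&
      clean_env ./configure --prefix="$prefix" &&
      clean_env make &&
      clean_env make install &&
      grep "^CFLAGS=" config.log
  )
}

# Usage (hypothetical tarball names):
#   build_clean grep-2.21.tar.xz "$HOME/grep-2.21-new"
#   build_clean grep-2.21.78-7da30.tar.xz "$HOME/grep-2.21.78-new"
```

Running each step through clean_env rather than unsetting the variables in the interactive shell keeps the Vim build's settings available in that terminal while guaranteeing they never reach configure or make.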
====================================================================
$ time ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null

real    0m0.814s
user    0m0.725s
sys     0m0.081s

$ time LC_ALL=en_US.UTF-8 ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null

real    0m0.817s
user    0m0.720s
sys     0m0.090s

$ time LC_ALL=C ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null

real    0m0.350s
user    0m0.252s
sys     0m0.094s
====================================================================
$ time ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null

real    0m0.849s
user    0m0.756s
sys     0m0.086s

$ time LC_ALL=en_US.UTF-8 ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null

real    0m0.849s
user    0m0.751s
sys     0m0.090s

$ time LC_ALL=C ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null

real    0m0.354s
user    0m0.267s
sys     0m0.082s
====================================================================

I'm sorry for wasting your time on a wild goose chase. (But my new
grep works better now!)

Regards,
Gary
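To put "slightly slower" in numbers, the relative slowdown can be computed from the representative "real" times above. This is a minimal sketch assuming a POSIX shell with awk (the helper name slowdown is hypothetical); awk is used because POSIX shell arithmetic is integer-only. The last call uses the earlier averaged -Rin timings from the tainted build, which is roughly the 2x regression Jim asked about.

```shell
#!/bin/sh
# Percentage slowdown of a new timing relative to an old one.
# $1 = old time in seconds, $2 = new time in seconds.
slowdown() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f%%\n", (new - old) / old * 100 }'
}

# Clean builds, grep-2.21 vs grep-2.21.78 ("real" times from above).
slowdown 0.814 0.849   # default locale      -> 4.3%
slowdown 0.817 0.849   # LC_ALL=en_US.UTF-8  -> 3.9%
slowdown 0.350 0.354   # LC_ALL=C            -> 1.1%

# Earlier averaged timings, 2.21 vs the tainted 2.21.78 build.
slowdown 1.08 2.36     # -> 118.5%
```

A single representative run per case cannot separate a ~1% difference from run-to-run noise, so the C-locale figure in particular should be read as "about the same".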