On 2024-08-31 13:24, Takashi Yano via Cygwin wrote:
> On Sat, 31 Aug 2024 09:59:11 -0600, Jim Reisert AD1C wrote:
>> Something has changed in the last month or two.  I have a very large
>> file I am trying to grep (465 MB):
>>
>> -rwxrw----+ 1 jjrei jjrei 465092052 Aug 31 09:39 all_spots.txt
>>
>> If I grep for something near the end of the file, the results return right away:
>>
>> # time grep -n N0FUL all_spots.txt
>>
>> 17027336:N0FUL,20240615,20240615,1
>> 17027337:N0FUL,20240629,20240629,1
>>
>> real    0m0.190s
>> user    0m0.078s
>> sys     0m0.078s
>>
>> If I pipe the file through cat, grep takes much longer:
>>
>> # time cat all_spots.txt | grep -n N0FUL
>>
>> 17027336:N0FUL,20240615,20240615,1
>> 17027337:N0FUL,20240629,20240629,1
>>
>> real    1m4.934s
>> user    0m0.031s
>> sys     0m0.124s
>
> Thanks for the report. This seems to be a regression of cygwin 3.5.4.
> I'll submit a patch for this issue shortly.

Remember that many Unix-derived utilities use mmap-ed files when available: the paging system then handles the file I/O, so reads, writes, and searches become fast memory operations. It would be worth your while to compare the time taken to grep all the files directly against the time taken to cat them into one file and grep that.
In most cases, operating directly on the files will be faster.
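A minimal sketch for reproducing the file-versus-pipe comparison from the original report, assuming bash (whose `time` keyword times a whole pipeline) and GNU coreutils. The generated sample file, its size, and its contents are illustrative stand-ins for all_spots.txt, not the reporter's data:

```bash
#!/bin/bash
# Sketch: time grep reading a regular file directly versus reading the
# same bytes through a pipe from cat. The sample file is hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT          # clean up the scratch directory on exit
file="$tmp/spots.txt"
pattern=N0FUL

# Build 200,000 filler lines, then append two matching lines at the end.
seq 1 200000 | sed 's/^/CALL,20240101,20240101,/' > "$file"
printf '%s\n' 'N0FUL,20240615,20240615,1' 'N0FUL,20240629,20240629,1' >> "$file"

echo "grep on the file:"
time grep -n -e "$pattern" -- "$file" > "$tmp/direct.out"

echo "cat piped into grep:"
# In bash, `time` covers the whole pipeline (cat and grep together).
time cat -- "$file" | grep -n -e "$pattern" > "$tmp/piped.out"

# Both methods should report the same lines with the same line numbers.
cmp -- "$tmp/direct.out" "$tmp/piped.out" && echo "outputs identical"
```

On a system affected by the 3.5.4 pipe regression mentioned above, the piped run would be expected to stand out, although a file this small may not show a large gap.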

$ ls -1gloU /var/log/*.log | awk '{t+=$3};END{print int(NR/1024+0.5) "k files",int(t/1024/1024+0.5) "MB"}'
26k files 59MB

$ time grep -h -e cygwin -- /var/log/*.log > /tmp/grep.log

real    0m8.996s
user    0m1.015s
sys     0m7.983s

$ time cat -- /var/log/*.log > /tmp/var.log && grep -h -e cygwin -- /tmp/var.log > /tmp/cat-grep.log

real    0m9.557s
user    0m0.953s
sys     0m8.609s

$ wc -lc -- /tmp/var.log /tmp/*grep.log
  708552 61905630 /tmp/var.log
   35481  5652354 /tmp/cat-grep.log
   35481  5652354 /tmp/grep.log
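One caveat on the measurement above, assuming it was run in bash: `time` is a reserved word that prefixes a single pipeline, and `&&` joins two pipelines. As a result, `time cat ... > /tmp/var.log && grep ...` times only the `cat` step, so the 9.557s figure excludes the grep pass over /tmp/var.log. Grouping both commands in braces times them together. Here is a self-contained sketch that uses a temporary directory of hypothetical sample logs in place of /var/log:

```bash
#!/bin/bash
# Sketch: time "grep each file" against "cat into one file, then grep",
# with `time { ...; }` covering both steps of the second approach.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT          # clean up the scratch directory on exit

# Hypothetical sample logs: each has one line matching "cygwin".
for i in 1 2 3; do
  printf 'setup: installing cygwin package %s\nother line\n' "$i" > "$tmp/app$i.log"
done

# Direct: grep opens and reads each log file itself.
time grep -h -e cygwin -- "$tmp"/*.log > "$tmp/grep.out"

# Concatenate, then grep; the brace group makes `time` include both commands.
time { cat -- "$tmp"/*.log > "$tmp/all.txt" && grep -h -e cygwin -- "$tmp/all.txt" > "$tmp/cat-grep.out"; }

# Both approaches should find the same matching lines.
cmp -- "$tmp/grep.out" "$tmp/cat-grep.out" && echo "same matches"
```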

--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
