On 2024-08-31 13:24, Takashi Yano via Cygwin wrote:
> On Sat, 31 Aug 2024 09:59:11 -0600 Jim Reisert AD1C wrote:
>> Something has changed in the last month or two. I have a very large
>> file I am trying to grep (465 MB):
>>
>> -rwxrw----+ 1 jjrei jjrei 465092052 Aug 31 09:39 all_spots.txt
>>
>> If I grep for something near the end of the file, the results return right away:
>>
>> # time grep -n N0FUL all_spots.txt
>> 17027336:N0FUL,20240615,20240615,1
>> 17027337:N0FUL,20240629,20240629,1
>>
>> real 0m0.190s
>> user 0m0.078s
>> sys 0m0.078s
>>
>> If I pipe the file through cat, grep takes much longer:
>>
>> # time cat all_spots.txt | grep -n N0FUL
>> 17027336:N0FUL,20240615,20240615,1
>> 17027337:N0FUL,20240629,20240629,1
>>
>> real 1m4.934s
>> user 0m0.031s
>> sys 0m0.124s
>
> Thanks for the report. This seems to be a regression of cygwin 3.5.4.
> I'll submit a patch for this issue shortly.

Remember that many Unix-derived utilities mmap their input files when they
can, so that the paging system handles the file I/O and reads, writes, and
searches become fast memory operations. That only works when the input is a
real file; a pipe cannot be memory-mapped.
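
To make the file-versus-pipe distinction concrete, here is a minimal sketch
that reports what kind of object a command sees on stdin. The helper name
stdin_kind and the temporary file are illustrative only; the sketch assumes a
system that provides /dev/stdin (Linux and Cygwin do; bash also handles that
name itself inside test).

```shell
# Report what kind of object stdin is (helper name is illustrative).
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo "pipe"
    elif [ -f /dev/stdin ]; then
        echo "regular file"
    else
        echo "other"
    fi
}

f=$(mktemp)                # throwaway sample file
echo N0FUL > "$f"

stdin_kind < "$f"          # stdin redirected straight from the file
cat "$f" | stdin_kind      # stdin fed through cat
```

With the redirect, the reader holds the file itself and is free to seek in it
or map it; behind cat it only has a pipe and must read sequentially.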
It would be worth your while to time grepping all the files directly versus
concatenating them into one file with cat and grepping that.
In either case, it will usually be faster to operate directly on files than
to read them through a pipe:
$ ls -1gloU /var/log/*.log | awk '{t+=$3};END{print int(NR/1024+0.5) "k files",int(t/1024/1024+0.5) "MB"}'
26k files 59MB
$ time grep -h -e cygwin -- /var/log/*.log > /tmp/grep.log
real 0m8.996s
user 0m1.015s
sys 0m7.983s
$ time cat -- /var/log/*.log > /tmp/var.log && grep -h -e cygwin -- /tmp/var.log > /tmp/cat-grep.log
real 0m9.557s
user 0m0.953s
sys 0m8.609s
$ wc -lc -- /tmp/var.log /tmp/*grep.log
708552 61905630 /tmp/var.log
35481 5652354 /tmp/cat-grep.log
35481 5652354 /tmp/grep.log
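
One caveat on the cat timing above: in bash, time is a keyword that times a
single pipeline, so in "time cat ... && grep ..." only the cat step is
measured. Below is a minimal sketch that times both steps together by grouping
them in braces. It assumes bash, and it uses a small throwaway directory from
mktemp with made-up file names rather than /var/log, so the timings it prints
say nothing about the numbers above.

```shell
#!/bin/bash
# Sketch: build a tiny throwaway data set, then time both approaches in full.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

for i in 1 2 3; do
    printf 'line %s cygwin\nline %s other\n' "$i" "$i" > "$tmp/$i.log"
done

# Direct: grep opens and reads the files itself.
time grep -h -e cygwin -- "$tmp"/*.log > "$tmp/grep.out"

# Concatenate first, then grep; the braces make `time` cover both steps.
time { cat -- "$tmp"/*.log > "$tmp/all.txt" &&
       grep -h -e cygwin -- "$tmp/all.txt" > "$tmp/cat-grep.out"; }

# Both approaches should select exactly the same lines.
cmp -s "$tmp/grep.out" "$tmp/cat-grep.out" && echo "outputs match"
```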
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry