On Thu, Aug 24, 2023 at 08:58, Daniel Green wrote:
> Re Perl's read speed, it's faster when not doing the line number check for
> every line. So `perl -ne 'print if (/pattern/)'` is only ~2.60s, compared
> to ~3.28s for `perl -ne 'print if ($. == 1 || /pattern/)'`. Doing nothing
> in Perl, i.e.
Re Perl's read speed, it's faster when not doing the line number check for
every line. So `perl -ne 'print if (/pattern/)'` is only ~2.60s, compared
to ~3.28s for `perl -ne 'print if ($. == 1 || /pattern/)'`. Doing nothing
in Perl, i.e., `perl -ne ''` is only ~1.38s.
Dan
On Wed, Aug 23, 2023 at 6
Ah - those times show another reason why one might
be motivated to keep requesting more options be added
to grep.
From those timings, and from looking at the source, it's clear
that the FSF rewrote grep from scratch, sometime back in the
late 1980s or early 1990s, to have fast reads, whereas se
On the original test machine I timed the sed solution, as well as `(grep
-m1 . 'file' && grep 'pattern' 'file')` and `(mapfile -n1 <'file' && printf
'%s' "${MAPFILE[0]}" && grep 'pattern' 'file')` and `(head -n1 'file' && grep
'pattern' 'file')`. Full table of speeds:
grep (v2.20): ~1.15s
perl (
oops - grep slower than awk, not the other way around,
on these _highly_ inconclusive timings.
--
Paul Jackson
p...@usa.net
sed and awk can also do this (1st line plus any matching lines)
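As a rough sketch of what such sed and awk one-liners could look like (the sample data and the pattern `carenas` are assumptions for illustration, not commands quoted from this message):

```shell
# Sample CSV; the file name and contents are made up for illustration.
tmp=$(mktemp)
printf '%s\n' 'USER,TIP' 'john,0' 'jane,10' 'carenas,100' > "$tmp"

# awk: print record 1, or any record matching the pattern.
awk_out=$(awk 'NR == 1 || /carenas/' "$tmp")

# sed: on line 1, branch to the end of the script so it is auto-printed;
# on every other line, delete it unless it matches the pattern.
sed_out=$(sed -e '1b' -e '/carenas/!d' "$tmp")

printf '%s\n' "$awk_out"
rm -f "$tmp"
```

Both forms print the first line only once, even when the first line itself matches the pattern.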
Following transcript from zsh session on my fast Ryzen:
$ <<-'@@' time sh -c "grep -m1 USER && grep carenas"
USER,TIP
john,0
jane,10
carenas,100
@@
USER,TIP
carenas,100
sh -c "grep -m1 USER && grep carenas" 0.00s user 0.00s system
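The transcript above relies on both greps sharing one standard input. A minimal sketch of the same idea against a file on disk, assuming GNU grep (which documents that, after `-m` stops on a regular file, standard input is left positioned just after the last matching line); other grep implementations may not behave this way. The sample file is made up for illustration.

```shell
# Sample data; the file name and contents are illustrative only.
tmp=$(mktemp)
printf '%s\n' 'USER,TIP' 'john,0' 'jane,10' 'carenas,100' > "$tmp"

# Both greps read the same stdin, redirected from a regular (seekable) file.
# The first grep prints the first non-empty line and stops; with GNU grep the
# file offset is then just past that line, so the second grep starts at line 2.
shared_out=$( { grep -m1 . && grep 'carenas'; } < "$tmp" )

printf '%s\n' "$shared_out"
rm -f "$tmp"
```

Because the file is opened once and read once, this avoids the second pass that `grep -m1 . 'file' && grep 'pattern' 'file'` makes.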
I don't have access to a newer gawk where I did the initial timings, but I
ran an almost identical test on my home machine.
grep (v3.11): ~0.60s
perl (v5.38.0): ~3.21s
gawk (v4.0.2 built from source
> Daniel Green wrote:
>
> > I've never looked at the grep source code
> > before, but could be tempted to try implementing it myself if there was any
> > chance of the patch being accepted.
A slightly more complicated perl script would be my first choice if
coding is the solution, but grep already
I can't speak for the grep guys, but at least I was correct that
current gawk is much faster than gawk 4.0.2.
Arnold
Daniel Green wrote:
> I don't have access to a newer gawk where I did the initial timings, but I
> ran an almost identical test on my home machine.
>
> grep (v3.11):
That works, as well as the Perl version I've been using:
perl -ne 'print if ($. == 1 || /pattern/)'
But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
show the problem:
grep (v2.20): ~1.15s
perl (v5.36.1): ~4.48s
awk (v4.0.2): ~10.81s
Admittedly grep
On 8/21/23 13:37, arn...@skeeve.com wrote:
> it solves your problem NOW,
> instead of waiting for a feature that the grep developers
> aren't likely to add.
Yes, Grep already has a lot of features that in hindsight would have
been better addressed by saying "Use Awk".
Gawk 4.0.2 is 11 years old. Try timing the current version,
I'll bet it's faster. And it solves your problem NOW,
instead of waiting for a feature that the grep developers
aren't likely to add.
My two cents of course.
Arnold
Daniel Green wrote:
> That works, as well as the Perl version I've b
Daniel Green wrote:
> I'm frequently searching CSV files with 20-30 columns, and when there's a
> hit it can be hard to know what the columns are. An option to also print
> the first line of a file (either always, or only if that file had a match
> to the pattern) in addition to any hits would be
I'm frequently searching CSV files with 20-30 columns, and when there's a
hit it can be hard to know what the columns are. An option to also print
the first line of a file (either always, or only if that file had a match
to the pattern) in addition to any hits would be nice.
Thanks,
Dan
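Until such an option exists, the "only if that file had a match" behavior can be approximated with a small wrapper. This is a sketch only: `grep_header` is a hypothetical helper name, not part of grep, and it may read each file more than once, so it will not match the single-pass speed of plain grep.

```shell
# Hypothetical helper: for each file that contains a match, print its first
# line (the CSV header) followed by the matching lines.
# Usage: grep_header PATTERN FILE...
grep_header() {
    pattern=$1
    shift
    for f in "$@"; do
        # grep -q stops at the first match and prints nothing.
        if grep -q -e "$pattern" -- "$f"; then
            head -n 1 -- "$f"
            # Search from line 2 on, so a matching header is not printed twice.
            # A header-only match is not treated as an error.
            tail -n +2 -- "$f" | grep -e "$pattern" || true
        fi
    done
}

# Example with made-up data: one file with a hit, one without.
hit=$(mktemp)
miss=$(mktemp)
printf '%s\n' 'USER,TIP' 'john,0' 'carenas,100' > "$hit"
printf '%s\n' 'NAME,CITY' 'jane,Paris' > "$miss"

header_out=$(grep_header 'carenas' "$hit" "$miss")
printf '%s\n' "$header_out"
rm -f "$hit" "$miss"
```

To print the header always rather than only on a match, the `if` test could be dropped so that `head` runs for every file.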