Poor regex performance?

Julian via Digitalmars-d-learn Thu, 04 Apr 2019 02:56:04 -0700

The following code, which just runs a regex against a large exim log to report on top senders, is 140 times slower than similar C code using PCRE when compiled with just -O. With a bunch of other flags I got it down to only 13x slower than C code that's using libc regcomp/regexec.

  import std.stdio, std.string, std.regex, std.array, std.algorithm;


  T min(T)(T a, T b) {
          if (a < b) return a;
          return b;
  }

  void main() {
          ulong[string] emailcounts;
          auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^@]+@(\S+))");

          foreach (line; File("exim_mainlog").byLine()) {
                  auto m = line.match(re);
                  if (m) {
                          ++emailcounts[m.front[1].idup];
                  }
          }

          string[] senders = emailcounts.keys;
          sort!((a, b) { return emailcounts[a] > emailcounts[b]; })(senders);

          foreach (i; 0 .. min(senders.length, 5)) {
                  writefln("%5s %s", emailcounts[senders[i]], senders[i]);
          }
  }
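
To illustrate what the pattern extracts, here is a minimal sketch that runs the same regex against a single made-up log line. The sample line and the helper name senderOf are hypothetical and not taken from the production log or the linked repository, and matchFirst is used here instead of match only to keep the example short.

```d
import std.regex : ctRegex, matchFirst;
import std.stdio : writeln;

// Hypothetical helper: returns the sender address captured by group 1,
// or null when the line does not look like an exim "<=" arrival line.
string senderOf(string line)
{
    // Same pattern as in the program above.
    auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^@]+@(\S+))");
    auto m = line.matchFirst(re);
    // Group 1 is the full address; group 2 would be the domain part.
    return m.empty ? null : m[1];
}

void main()
{
    // Made-up line in the general shape of an exim main log entry.
    auto line = "2019-04-04 02:56:04 1hBzAb-0001Xy-Ab <= alice@example.com H=mail.example.com S=1234";
    writeln(senderOf(line)); // prints "alice@example.com"
}
```

The three leading fields (date, time, message id) are consumed by the non-capturing group, so only the address after "<= " ends up as the key in emailcounts.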

Other code's available at https://github.com/jrfondren/topsender-bench

I get D down to 1.2x slower with PCRE and getline()

I wrote this part of the way through chapter 1 of "The D Programming Language", so my question is mainly: is this a fair result? Is std.regex very slow, and should I reach for PCRE if regex speed matters? Or is this code severely flawed somehow? I'm using a random production log; not trying to make things difficult.

Relatedly, how can I add custom compiler flags to rdmd, in a D script?

For example, -L-lpcre
