The following code, which just runs a regex against a large exim log to report the top senders, is 140 times slower than similar C code using PCRE when compiled with just -O. With a bunch of other flags I got it down to only 13x slower than C code that uses libc regcomp/regexec.

import std.stdio, std.string, std.regex, std.array, std.algorithm;

  T min(T)(T a, T b) {
          if (a < b) return a;
          return b;
  }

  void main() {
          ulong[string] emailcounts;
          auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^@]+@(\S+))");

          foreach (line; File("exim_mainlog").byLine()) {
                  auto m = line.match(re);
                  if (m) {
                          ++emailcounts[m.front[1].idup];
                  }
          }

          string[] senders = emailcounts.keys;
          sort!((a, b) { return emailcounts[a] > emailcounts[b]; })(senders);
          foreach (i; 0 .. min(senders.length, 5)) {
                  writefln("%5s %s", emailcounts[senders[i]], senders[i]);
          }
  }

The other code is available at https://github.com/jrfondren/topsender-bench
With PCRE and getline(), I get D down to 1.2x slower.

I wrote this partway through chapter 1 of "The D Programming Language", so my question is mainly: is this a fair result? Is std.regex really this slow, so that I should reach for PCRE whenever regex speed matters? Or is this code severely flawed somehow? I'm using a random production log; I'm not trying to make things difficult.

Relatedly, how can I pass custom compiler flags (for example, -L-lpcre) to rdmd from within a D script?
