https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472

--- Comment #16 from Tim Shen <timshen at gcc dot gnu.org> ---
(In reply to Hans Ã…berg from comment #15)
> (In reply to Tim Shen from comment #14)
> > How fast does your prototype run on the following example?
> >   ((10)|(11)|(12)|...|(98)|(99))* matching "10111213141516...9899"
> > how about this:
> >   ((100)|(101)|(102)|...|(998)|(999))* matching "100101102...998999"
> > 
> > where "..." are the skipped strings that follow the surrounding pattern.
> 
> My program does not choke on limit cases, like the regex '(a?)^k a^k' on the
> string string "a"^k, mentioned at [1]; for k = 100, it takes 0.393 s on the
> NFA and 0.168 on the DFA on a not so powerful computer. Otherwise, it is not
> optimized and fast in absolute terms: I use standard C++ containers such as
> unordered_set for DFA state lookups, and not a number.
> 
> 1. https://swtch.com/~rsc/regexp/regexp1.html

Sorry, I meant to observe whether the example triggers a O(k^2) behavior. It
should be less than quadratic. To measure this, you don't have to compare the
time of your prototype against libstdc++.

I set (kStart, kEnd) to be (1000, 2000), (1000, 3000), ..., (1000, 9000) to see
how the time grows. I changed my pattern a bit to make it k lg k. I'm more
curious about the time complexity your program takes, as I was not able to
understand your algorithm description.

Here are my numbers for end ranging 2000 to 9000, in ms:
[79,150,262,427,620,868,1078,1310].

The difference array: [71, 112, 165, 193, 248, 210, 232].

I think it's vaguely k lg k, but I'm not quite sure yet.

Here is the updated example, to re-arrange the expression to a tree, not a
chain of "|" operators.

#include <regex>
#include <string>
#include <iostream>
#include <cstdlib>
#include <cassert>

void gen_pattern(int start, int end, std::string& acc) {
  if (!(start < end)) {
    std::abort();
  }
  if (start + 1 == end) {
    acc += '(';
    acc += std::to_string(start);
    acc += ')';
    return;
  }
  int mid = (start + end) / 2;
  if (start + 1 == mid) {
    gen_pattern(start, mid, acc);
  } else {
    acc += "(?:";
    gen_pattern(start, mid, acc);
    acc += ")";
  }

  acc += "|";

  if (mid + 1 == end) {
    gen_pattern(mid, end, acc);
  } else {
    acc += "(?:";
    gen_pattern(mid, end, acc);
    acc += ")";
  }
}

int main(int argc, char** argv) {
  if (argc < 3) {
    std::abort();
  }
  int start = atoi(argv[1]);
  int end = atoi(argv[2]);

  std::regex re;
  {
    std::string pattern;
    gen_pattern(start, end, pattern);
    re.assign("(" + pattern + ")*");
  }
  std::string s;
  for (int i = start; i < end; i++) {
    s += std::to_string(i);
  }
  std::smatch m;
  std::cout << std::regex_match(s, m, re) << "\n";
  //for (const auto p : m) {
  //  std::cout << p << " ";
  //}
  //std::cout << "\n";
  return 0;
}

Reply via email to