https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
--- Comment #16 from Tim Shen <timshen at gcc dot gnu.org> --- (In reply to Hans Ã…berg from comment #15) > (In reply to Tim Shen from comment #14) > > How fast does your prototype run on the following example? > > ((10)|(11)|(12)|...|(98)|(99))* matching "10111213141516...9899" > > how about this: > > ((100)|(101)|(102)|...|(998)|(999))* matching "100101102...998999" > > > > where "..." are the skipped strings that follow the surrounding pattern. > > My program does not choke on limit cases, like the regex '(a?)^k a^k' on the > string string "a"^k, mentioned at [1]; for k = 100, it takes 0.393 s on the > NFA and 0.168 on the DFA on a not so powerful computer. Otherwise, it is not > optimized and fast in absolute terms: I use standard C++ containers such as > unordered_set for DFA state lookups, and not a number. > > 1. https://swtch.com/~rsc/regexp/regexp1.html Sorry, I meant to observe whether the example triggers a O(k^2) behavior. It should be less than quadratic. To measure this, you don't have to compare the time of your prototype against libstdc++. I set (kStart, kEnd) to be (1000, 2000), (1000, 3000), ..., (1000, 9000) to see how the time grows. I changed my pattern a bit to make it k lg k. I'm more curious about the time complexity your program takes, as I was not able to understand your algorithm description. Here are my numbers for end ranging 2000 to 9000, in ms: [79,150,262,427,620,868,1078,1310]. The difference array: [71, 112, 165, 193, 248, 210, 232]. I think it's vaguely k lg k, but I'm not quite sure yet. Here is the updated example, to re-arrange the expression to a tree, not a chain of "|" operators. #include <regex> #include <string> #include <iostream> #include <cstdlib> #include <cassert> void gen_pattern(int start, int end, std::string& acc) { if (!(start < end)) { std::abort(); } if (start + 1 == end) { acc += '('; acc += std::to_string(start); acc += ')'; return; } int mid = (start + end) / 2; if (start + 1 == mid) { gen_pattern(start, mid, acc); } else { acc += "(?:"; gen_pattern(start, mid, acc); acc += ")"; } acc += "|"; if (mid + 1 == end) { gen_pattern(mid, end, acc); } else { acc += "(?:"; gen_pattern(mid, end, acc); acc += ")"; } } int main(int argc, char** argv) { if (argc < 3) { std::abort(); } int start = atoi(argv[1]); int end = atoi(argv[2]); std::regex re; { std::string pattern; gen_pattern(start, end, pattern); re.assign("(" + pattern + ")*"); } std::string s; for (int i = start; i < end; i++) { s += std::to_string(i); } std::smatch m; std::cout << std::regex_match(s, m, re) << "\n"; //for (const auto p : m) { // std::cout << p << " "; //} //std::cout << "\n"; return 0; }