On Sun, 22 Dec 2019 16:57:12 -0800 Paul Eggert <egg...@cs.ucla.edu> wrote:
> On 11/3/18 9:25 PM, Norihiro Tanaka wrote: > > > $ seq -f '%040g' 0 9999 | sed '1s/$/\\(0\\)\\1/' >pat > > Thanks for the test case and sorry about the delay. And thanks for spotting > the > speedup opportunity. I found a few problems with the proposed patch, though: > > > + if (keys[1] == '\\') > > + keys++; > > This mishandles the case where the input byte sequence contains '\', '\', '1' > where the first '\' is the last byte of a multibyte character. Such a byte > sequence can contain backreferences but the function will say it doesn't. > > > + if (backref && prev < p) > > + { > > + len = p - prev; > > + buf = xrealloc (buf, (buflen + len) * sizeof *buf); > > + memcpy (buf + buflen, p, len); > > + buflen += len; > > + } > > This seems to have three problems. First, the memcpy copies from P, but it > should copy from PREV. Second, this code assigns to LEN, which breaks a later > use of LEN. Third, if there are many patterns with backreferences, repeated > use > of realloc will have O(N**2) behavior. > > > + for (size_t i = 0; i < dc->pcount; i++) > > + dc->patterns[i + 1] = dc->patterns[i]; > > This copies dc->patterns[0] to the later values in that array, when a memmove > was intended. > > So, after installing the patch, I immediately installed the attached patch, > which should address the abovementioned issues. > > Thanks again. You did the hard work - I merely proofread it. Sorry and thanks for the review and fixing. possible_backrefs_in_pattern is greatful! I can't find any solution better than it.