On Oct 17, 2015 12:53 AM, "Tom Lane" <t...@sss.pgh.pa.us> wrote: > > I've occasionally heard complaints that our regex engine only has > lookahead constraints not lookbehind constraints, while Perl's for example > does have those. While I was fooling around with the other issues in that > code, I learned enough to realize that it would not be that hard to > implement such things, and the attached proof-of-concept patch does so > (using Perl-compatible syntax, in case you're wondering). > > However, I don't feel like this is done to the point of being committable, > because its performance leaves something to be desired in a lot of cases. > The difficulty is that because the engine can only match in a forward > direction, in the worst case you have to check a lookbehind constraint by > trying to match starting from the string start to see if a match exists > ending at the current-location where you're testing the constraint. That > makes it, worst case, O(N^2) to search a string of length N. Now, you can > improve on that greatly if you can determine that possible matches don't > extend all the way back to the string start. In the attached patch I use > the "cold start pointer" returned by shortest() to decide that characters > before the coldstart point no longer have to be rechecked. That helps > some, but it's not good enough because there are too many cases where the > engine doesn't move the coldstart point forward very aggressively. > > It might be that that behavior itself can be improved on, which would > be nice because there are also non-lookbehind-related scenarios where > smarter coldstart detection would help performance. But I have no very > good ideas about how to do it. > > Another idea I've been toying with is to add logic that tries to determine > the maximum possible match length for a regex. If we can determine that > for the lookbehind sub-regex, then we'd know we have to back up at most > that far before applying the match check. This seems promising because > a lot of other engines don't even support variable-length lookbehind > patterns at all, and so we'd be assured of good performance in all the > cases that competitors can handle at all. > > Anyway, I'm not planning to do much more work on this right now, but > I thought I'd throw it out there just to let people know that this seems > within reach. I'm curious to know how many people care, and how much.
+1 I'm one of those who wanted lookbehind, and it would give us complete lookaround so very much welcome. Thom