On Oct 17, 2015 12:53 AM, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
>
> I've occasionally heard complaints that our regex engine only has
> lookahead constraints not lookbehind constraints, while Perl's for example
> does have those.  While I was fooling around with the other issues in that
> code, I learned enough to realize that it would not be that hard to
> implement such things, and the attached proof-of-concept patch does so
> (using Perl-compatible syntax, in case you're wondering).
>
> However, I don't feel like this is done to the point of being committable,
> because its performance leaves something to be desired in a lot of cases.
> The difficulty is that because the engine can only match in a forward
> direction, in the worst case you have to check a lookbehind constraint by
> trying to match starting from the string start to see if a match exists
> ending at the current-location where you're testing the constraint.  That
> makes it, worst case, O(N^2) to search a string of length N.  Now, you can
> improve on that greatly if you can determine that possible matches don't
> extend all the way back to the string start.  In the attached patch I use
> the "cold start pointer" returned by shortest() to decide that characters
> before the coldstart point no longer have to be rechecked.  That helps
> some, but it's not good enough because there are too many cases where the
> engine doesn't move the coldstart point forward very aggressively.
>
> It might be that that behavior itself can be improved on, which would
> be nice because there are also non-lookbehind-related scenarios where
> smarter coldstart detection would help performance.  But I have no very
> good ideas about how to do it.
>
> Another idea I've been toying with is to add logic that tries to determine
> the maximum possible match length for a regex.  If we can determine that
> for the lookbehind sub-regex, then we'd know we have to back up at most
> that far before applying the match check.  This seems promising because
> a lot of other engines don't even support variable-length lookbehind
> patterns at all, and so we'd be assured of good performance in all the
> cases that competitors can handle at all.
>
> Anyway, I'm not planning to do much more work on this right now, but
> I thought I'd throw it out there just to let people know that this seems
> within reach.  I'm curious to know how many people care, and how much.

+1

I'm one of those who wanted lookbehind, and it would give us complete
lookaround so very much welcome.

Thom

Reply via email to