They don't have to be *arbitrarily* long. They actually can be quite short.
2^n is large, even for relatively small n.
Again, I refer to the graphs in the blog post above. We are talking about
input length of less than 100 characters to get CPU churn of *minutes*.
That's not a huge limit for a search-box.

Yes, the compile-time is negligible - but it's negligible *because it's
linear in the input size*. And match-times are only small, because RE2 made
the conscious decision (for *exactly* the reasons I outline here) to forego
features that were very common and even expected of regexp-engines, so it
could guarantee a linear match time. That was novel at the time - almost no
engine did that and the most commonly used most certainly didn't. When
people in this thread say "regular expressions are great because they give
linear match time on the matched text size", they forget that most "regular
expression" engines don't actually give you that. They only give that, if
you specifically design with that in mind.

I also explained why "it is trivial to limit the input size to something a
user could input" is simply wrong, in practice. If you have exponential
growth, the difference between 20 and 30 characters (again, we are talking
search-terms here, it is not at all unreasonable to look for a 30 character
identifier) is three orders of magnitude. If the curve is too steep, you
can't reasonably separate between sensible input lengths from attacks. If
you get a nice, linear curve, that becomes trivial. Just multiply any
sensible limit you could think of by 100 and you're still going to end up
well within spec.

And even if we ignore the DoS-potential - just considering what that does
to your load-patterns, that some user's queries will be quasi free, while
others will be a million times or more as expensive is a good argument to
pay attention to this. Just from an operational POV.

Seriously - you should *absolutely* consider the algorithmic complexity of
functions you are calling with user-provided input.

On Mon, Jun 8, 2020 at 11:22 PM Robert Engels <reng...@ix.netcom.com> wrote:

> You can’t say it arbitrary regex inputted by a user then claim they can be
> arbitrarily long - these are incompatible. For any reasonable user input
> the compile time is negligible, and it is trivial to limit the input size
> to something a user could input.
>
> On Jun 8, 2020, at 2:27 PM, Axel Wagner <axel.wagner...@googlemail.com>
> wrote:
>
> 
> And that exact logic is what wouldn't work, if compilation or matching
> would be exponential, for example.
> Being able to say "as long as the inputs don't get astronomically long,
> the running time of the algorithm will be reasonable" is *exactly* the
> conclusion that a small complexity class gives you.
>
> On Mon, Jun 8, 2020 at 7:17 PM Robert Engels <reng...@ix.netcom.com>
> wrote:
>
>> If the input regex string is bounded by a typical user input box - the
>> compilation time will be negligible when compared to the runtimes.
>>
>> On Jun 8, 2020, at 11:07 AM, Axel Wagner <axel.wagner...@googlemail.com>
>> wrote:
>>
>> 
>> Oh, true, I overlooked that detail. But FTR, they clarify their question
>> subsequently and specifically ask whether compilation is linear in
>> expression size. No Must* in that clarification.
>>
>> On Mon, Jun 8, 2020 at 6:05 PM Thomas Bushnell, BSG <tbushn...@google.com>
>> wrote:
>>
>>> On Mon, Jun 8, 2020 at 12:02 PM Axel Wagner <
>>> axel.wagner...@googlemail.com> wrote:
>>>
>>>> On Mon, Jun 8, 2020 at 5:41 PM Thomas Bushnell, BSG <
>>>> tbushn...@google.com> wrote:
>>>>
>>>>> The OP was about MustCompile, so I think it's clear they are not using
>>>>> patterns passed in by external requests.
>>>>>
>>>>
>>>> I don't think that's clear at all. How do you assume patterns from
>>>> external sources can be matched, if not by compiling them?
>>>>
>>>
>>> "MustCompile is like Compile but panics if the expression cannot be
>>> parsed."
>>>
>>> Thomas
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAEkBMfHxYgCfPtLZoZFppmsZYmYZMts9%3Dk4Q4pKv7ygAiR5SnA%40mail.gmail.com.

Reply via email to