-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Chris,

On 2/28/20 13:25, Chris Cheshire wrote:
> On Fri, Feb 28, 2020 at 12:51 PM Christopher Schultz
> <ch...@christopherschultz.net> wrote:
>>
>> Chris and Mark,
>>
>> On 2/28/20 11:51, Mark Thomas wrote:
>>> On 28/02/2020 14:51, Chris Cheshire wrote:
>>>> (9.0.31)
>>>>
>>>> What is the reason why the pattern isn't compiled with the
>>>> case insensitive flag? Is it due to performance?
>>>
>>> I wrote that Valve. At least the first iteration anyway.
>>> Others improved it along the way.
>>>
>>> I honestly can't remember why I opted for [bB]ot rather than
>>> using CASE_INSENSITIVE.
>>>
>>> I do remember that the focus was on fixing an issue we (the
>>> ASF) were having with our public Jira instance at the time in
>>> that bots were generating huge numbers of sessions and, in
>>> turn, using up large amounts of memory.
>>>
>>> Looking at it with the benefit of hindsight I'd worry about:
>>> - performance
>>> - avoiding false positives
>>>
>>> There probably isn't much in it but I'd expect the current
>>> solution is the right one for both of those. Unless you have a
>>> very different UA pattern, in which case CASE_INSENSITIVE might
>>> help. But I am guessing about the performance which really
>>> isn't the done thing.
>>>
>>> If someone were to demonstrate a measurable performance
>>> benefit from using CASE_INSENSITIVE with some realistic
>>> patterns, then I'd support an enhancement to add an
>>> attribute to specify the flags to use when compiling the
>>> pattern.
>>
>
> More of a curiosity. I am doing some crawler checking in my webapp.
> I have a grossly repetitive regex and I was looking at this valve
> as an example to optimize things a bit. I figured if it was a CI
> check then it would negate the need for patterns like [bB].
>
> There are a couple of common patterns that it is leaving out,
> though:
>
> .*[sS]p[iy]der.*
> .*facebookexternalhit.*
> .*(Mediapartners|Feedfetcher)-[gG]oogle.*
>
> (last one is adding 'mediapartners' to the subpattern already in
> your default regex)
>
>> You can always use the (?i) flag-enabler if you want to use
>> case-insensitive matches without changing the code.
>>
>
> +1 Did not know about this! If the flags can be specified in the
> pattern itself, then there probably isn't much need for adding
> extra attributes to the valve to achieve it.
>
> Java regex tutorial[1] does say there is a slight performance hit
> for a CI check, but it's not quantified. With processing speed
> increases, my guess is it is completely negligible per request.

Case-insensitive checks are probably pretty quick unless you start to
get into Unicode casing and locale-specific casing. Converting [A-Z] ->
[a-z] is a simple range check on the byte value followed by adding
('a' - 'A'). If you need to be able to convert Б to lowercase before
performing that comparison, it requires ... some additional effort.
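
To make the ASCII case concrete, here is a minimal sketch. The class
and method names are made up for this example and are not taken from
Tomcat or the JDK's regex engine; it only illustrates why the ASCII
path is cheap and why a non-ASCII letter needs a real Unicode lookup.

```java
class AsciiLowerSketch {
    // Lowercase one ASCII letter: a range check plus a constant offset.
    // Anything outside A-Z is returned unchanged.
    static char toLowerAscii(char c) {
        if (c >= 'A' && c <= 'Z') {
            return (char) (c + ('a' - 'A')); // the offset is 32
        }
        return c;
    }

    public static void main(String[] args) {
        char upperBe = '\u0411'; // Cyrillic capital letter BE

        // The ASCII shortcut handles 'B' ...
        System.out.println(toLowerAscii('B') == 'b');

        // ... but leaves the Cyrillic letter untouched ...
        System.out.println(toLowerAscii(upperBe) == upperBe);

        // ... whereas Character.toLowerCase consults Unicode case data.
        System.out.println(Character.toLowerCase(upperBe) == '\u0431');
    }
}
```

All three comparisons should print true.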

See the documentation for the (?u) or UNICODE_CASE flag.
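
Something like the following shows the difference between the flags.
It is only a sketch: the class name, field names, and User-Agent
string are invented for illustration, and the patterns are not the
valve's defaults. As far as I know, (?i) on its own only folds
US-ASCII letters, and adding (?u) extends that to Unicode case
folding.

```java
import java.util.regex.Pattern;

class CaseFlagSketch {
    // Inline flag: no code change needed, the pattern string carries it.
    static final Pattern INLINE = Pattern.compile("(?i).*bot.*");

    // The same thing expressed through the compile-time flag.
    static final Pattern FLAGGED =
            Pattern.compile(".*bot.*", Pattern.CASE_INSENSITIVE);

    // Case-insensitive, but US-ASCII only: Cyrillic case is not folded.
    static final Pattern ASCII_ONLY = Pattern.compile("(?i)\u0431");

    // Case-insensitive with Unicode-aware folding.
    static final Pattern UNICODE = Pattern.compile("(?iu)\u0431");

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (compatible; ExampleBOT/1.0)"; // hypothetical UA

        System.out.println(INLINE.matcher(ua).matches());   // true
        System.out.println(FLAGGED.matcher(ua).matches());  // true

        String upperBe = "\u0411"; // Cyrillic capital letter BE
        System.out.println(ASCII_ONLY.matcher(upperBe).matches()); // false
        System.out.println(UNICODE.matcher(upperBe).matches());    // true
    }
}
```

For a User-Agent check, which is almost always plain ASCII, (?i) by
itself should be enough.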

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl5ZkmwACgkQHPApP6U8
pFiKmRAAs2hm7q/kE1OU9IU5bMUVX5gjI3YAVjvn0NhNglYlYp3p11mIQ0YAB4wl
kS+kONvTXKhI0pZiEovwPizUnOkTDnAK3S6cC5NFWeK1JJfOII6MrUW4hXBQimUf
t4kNXnjXuf1/2k7kOcvAcxqx+ORCwyRemA/+U7T9IGMYeodfyIKQps3sZk6ClZvP
hC+GO2tGbR825e64IXK4ZPouoTTparsBo1j6CNe3ZlrAFLzUeqAmqwFRY0EPk7cW
AmGvX7X54AilZhzD+xrXFOUY0+V1B05qLoQYsm9j8UwSHt1a8dEMWv/Bauzwulbu
gRYnEIJeTueSSfTOO6vneAexHf5WqfG+sFgbPMsvCKvjgUywDtiiadzYE/TN1/z5
ZwkS6uvv8dpahVlzE3z12HlEGhQ0vc0Y/p+p5cwDNWTNVFS434Zxu8OplpfVREaB
fybKmCAD32ENR8KcH5fY7C25hLRPo8d1TZK9VuTSj+fhJHrStE6o0Opln6JOhjsX
rL/KPj6dma623PaH5RKSDWGtyYzI49rjLQVEA2Qw/eIh8h32Gts3DtE6uYDLDApS
6t7ELYxI0w8JCYUiFeRstmmGUYOBk0kdH0yfkP9Wc0CWInDCZvI7kI1AYRxtMdVK
Y7mrPKxOthPyyYE9LjfhnHV+LLwRk7AknxWtrRwTzzUGJhOkR24=
=cprm
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
