Chris,

On 2/28/20 13:25, Chris Cheshire wrote:
> On Fri, Feb 28, 2020 at 12:51 PM Christopher Schultz
> <ch...@christopherschultz.net> wrote:
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Chris and Mark,
>>
>> On 2/28/20 11:51, Mark Thomas wrote:
>>> On 28/02/2020 14:51, Chris Cheshire wrote:
>>>> (9.0.31)
>>>>
>>>> What is the reason why the pattern isn't compiled with the
>>>> case insensitive flag? Is it due to performance?
>>>
>>> I wrote that Valve. At least the first iteration anyway.
>>> Others improved it along the way.
>>>
>>> I honestly can't remember why I opted for [bB]bot rather than
>>> using CASE_INSENSITIVE.
>>>
>>> I do remember that the focus was on fixing an issue we (the
>>> ASF) were having with our public Jira instance at the time in
>>> that bots were generating huge numbers of sessions and, in
>>> turn, using up large amounts of memory.
>>>
>>> Looking at it with the benefit of hindsight I'd worry about:
>>> - performance
>>> - avoiding false positives
>>>
>>> There probably isn't much in it but I'd expect the current
>>> solution is the right one for both of those. Unless you have a
>>> very different UA pattern, in which case CASE_INSENSITIVE might
>>> help. But I am guessing about the performance which really
>>> isn't the done thing.
>>>
>>> If someone was to demonstrate that there was a measurable
>>> performance benefit to some realistic patterns to using
>>> CASE_INSENSITIVE then I'd support an enhancement to add an
>>> attribute to specify the flags to use when compiling the
>>> pattern.
>>
>
> More of a curiosity. I am doing some crawler checking in my webapp.
> I have an grossly repetitive regex and I was looking at this valve
> as an example to optimize things a bit. I figured if it was a CI
> check then it would negate the need for patterns like [bB].
>
> There are a couple of common patterns that it is leaving out
> though:
>
> .*[sS]p[iy]der.*
> .*facebookexternalhit.*
> .*(Mediapartners|Feedfetcher)-[gG]oogle.*
>
> (last one is adding 'mediapartners' to the subpattern already in
> your default regex)
>
>> You can always use the (?i) flag-enabler if you want to use
>> case-insensitive matches without changing the code.
>>
>
> +1 Did not know about this! If the flags can be specified in the
> pattern itself, then there probably isn't much need for adding
> extra attributes to the valve to achieve it.
>
> Java regex tutorial[1] does say there is a slight performance hit
> for a CI check, but it's not quantified. With processing speed
> increases, my guess is it is completely negligible per request.

Case-insensitive checks are probably pretty quick unless you start to
get into Unicode casing and locale-specific casing. Converting [A-Z]
-> [a-z] is a simple range check on the byte value followed by adding
the value of ('a' - 'A'). If you need to be able to convert Б to
lowercase before performing that comparison, it requires ... some
additional effort. See the documentation for the (?u) or UNICODE_CASE
flag.
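
For what it's worth, here is a minimal sketch of the difference, using only java.util.regex. It is not the valve's code: the class name, the helper methods, and the sample User-Agent string are made up for illustration. It compares an explicit character class against the inline (?i) flag, and (?i) alone against (?iu) for a non-ASCII letter.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; none of these names come from Tomcat.
class CaseFlagDemo {
    // Explicit character classes: only the listed variants match.
    static final Pattern EXPLICIT = Pattern.compile(".*[sS]p[iy]der.*");
    // Inline (?i): every letter in the pattern matches either ASCII case.
    static final Pattern INLINE_CI = Pattern.compile("(?i).*sp[iy]der.*");
    // (?i) alone folds case for US-ASCII only; U+0411 is Cyrillic capital BE.
    static final Pattern ASCII_CI = Pattern.compile("(?i)\u0411");
    // (?iu) adds UNICODE_CASE, so Unicode case folding applies as well.
    static final Pattern UNICODE_CI = Pattern.compile("(?iu)\u0411");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // The ASCII-only conversion described above: range check, then add 32.
    static char toLowerAscii(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
    }

    public static void main(String[] args) {
        String ua = "Mozilla/5.0 (compatible; ExampleSPIDER/1.0)"; // made-up UA
        System.out.println(matches(EXPLICIT, ua));         // false
        System.out.println(matches(INLINE_CI, ua));        // true
        System.out.println(matches(ASCII_CI, "\u0431"));   // false
        System.out.println(matches(UNICODE_CI, "\u0431")); // true
        System.out.println(toLowerAscii('B'));             // b
    }
}
```

Note that (?i) widens every literal in the pattern, not just the letters you had in brackets, which is where the false-positive concern comes from. Whether either form is measurably slower would need an actual benchmark; this only shows the matching behaviour.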
-chris