On Fri, Feb 28, 2020 at 12:51 PM Christopher Schultz <ch...@christopherschultz.net> wrote:
>
> Chris and Mark,
>
> On 2/28/20 11:51, Mark Thomas wrote:
> > On 28/02/2020 14:51, Chris Cheshire wrote:
> >> (9.0.31)
> >>
> >> What is the reason why the pattern isn't compiled with the case
> >> insensitive flag? Is it due to performance?
> >
> > I wrote that Valve. At least the first iteration anyway. Others
> > improved it along the way.
> >
> > I honestly can't remember why I opted for [bB]ot rather than
> > using CASE_INSENSITIVE.
> >
> > I do remember that the focus was on fixing an issue we (the ASF)
> > were having with our public Jira instance at the time in that bots
> > were generating huge numbers of sessions and, in turn, using up
> > large amounts of memory.
> >
> > Looking at it with the benefit of hindsight I'd worry about:
> > - performance
> > - avoiding false positives
> >
> > There probably isn't much in it but I'd expect the current solution
> > is the right one for both of those. Unless you have a very
> > different UA pattern, in which case CASE_INSENSITIVE might help.
> > But I am guessing about the performance which really isn't the done
> > thing.
> >
> > If someone was to demonstrate that there was a measurable
> > performance benefit to some realistic patterns to using
> > CASE_INSENSITIVE then I'd support an enhancement to add an
> > attribute to specify the flags to use when compiling the pattern.
>
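The trade-off being discussed, a `[bB]ot` character class versus compiling with `Pattern.CASE_INSENSITIVE`, can be seen with a small standalone sketch. This is not the valve's code: the class and method names are hypothetical and the User-Agent strings are made-up samples. Both patterns agree on typical crawler strings, but the flagged one also matches all-caps variants such as `ROBOT`, which is one way a case-insensitive pattern widens what counts as a crawler.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Tomcat.
public class BotPatternComparison {
    // Character-class style, as in the [bB]ot part of the valve's default pattern.
    static final Pattern CHAR_CLASS = Pattern.compile(".*[bB]ot.*");
    // The alternative discussed: a lower-case literal plus the CASE_INSENSITIVE flag.
    static final Pattern FLAGGED = Pattern.compile(".*bot.*", Pattern.CASE_INSENSITIVE);

    public static boolean charClassMatches(String ua) {
        return CHAR_CLASS.matcher(ua).matches();
    }

    public static boolean flaggedMatches(String ua) {
        return FLAGGED.matcher(ua).matches();
    }

    public static void main(String[] args) {
        String[] agents = {
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
            "Mozilla/5.0 (compatible; bingbot/2.0)",
            "SOME-ROBOT/1.0", // only the case-insensitive pattern matches "BOT"
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/73.0"
        };
        // Columns: [bB]ot result, CASE_INSENSITIVE result, User-Agent.
        for (String ua : agents) {
            System.out.printf("%-5b %-5b %s%n", charClassMatches(ua), flaggedMatches(ua), ua);
        }
    }
}
```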
More of a curiosity. I am doing some crawler checking in my webapp. I have a grossly repetitive regex, and I was looking at this valve as an example of how to optimize it a bit. I figured that if the valve did a case-insensitive check, it would remove the need for patterns like [bB].

There are a few common patterns that it leaves out, though:

.*[sS]p[iy]der.*
.*facebookexternalhit.*
.*(Mediapartners|Feedfetcher)-[gG]oogle.*

(The last one adds "Mediapartners" to the Feedfetcher-Google subpattern already in your default regex.)

> You can always use the (?i) flag-enabler if you want to use
> case-insensitive matches without changing the code.
>

+1, I did not know about this! If the flags can be specified in the pattern itself, then there probably isn't much need to add extra attributes to the valve to achieve it.

The Java regex tutorial[1] does say there is a slight performance penalty for a case-insensitive check, but it doesn't quantify it. Given current processing speeds, my guess is that the cost per request is negligible.

Chris

[1] https://docs.oracle.com/javase/tutorial/essential/regex/pattern.html
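To illustrate the (?i) suggestion, here is a minimal sketch in plain java.util.regex (not the valve itself). The combined pattern and all names are hypothetical: it joins the [bB]ot style with the three additions listed above. The sketch compiles a case-sensitive version, a lower-case version prefixed with (?i), and the same lower-case version compiled with Pattern.CASE_INSENSITIVE. Each alternative is wrapped in `.*` because `Matcher.matches()` must consume the whole string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Tomcat.
public class InlineFlagDemo {
    // Hypothetical combined crawler pattern, written case-sensitively
    // with [xX]-style character classes.
    static final String CASE_SENSITIVE =
        ".*[bB]ot.*|.*[sS]p[iy]der.*|.*facebookexternalhit.*"
        + "|.*(Mediapartners|Feedfetcher)-[gG]oogle.*";

    // The same alternatives in lower case, without the character classes.
    static final String LOWER_CASE =
        ".*bot.*|.*sp[iy]der.*|.*facebookexternalhit.*"
        + "|.*(mediapartners|feedfetcher)-google.*";

    static final Pattern SENSITIVE = Pattern.compile(CASE_SENSITIVE);
    // A leading (?i) turns on case-insensitive matching for the rest of the
    // expression, including every alternative after the top-level '|'.
    static final Pattern INLINE = Pattern.compile("(?i)" + LOWER_CASE);
    // Equivalent: pass the flag to compile() instead of embedding it.
    static final Pattern FLAG_ARG = Pattern.compile(LOWER_CASE, Pattern.CASE_INSENSITIVE);

    public static boolean sensitive(String ua) { return SENSITIVE.matcher(ua).matches(); }
    public static boolean inline(String ua) { return INLINE.matcher(ua).matches(); }
    public static boolean flagArg(String ua) { return FLAG_ARG.matcher(ua).matches(); }

    public static void main(String[] args) {
        String[] agents = {
            "Mozilla/5.0 (compatible; YandexBot/3.0)",
            "Baiduspider/2.0",
            "facebookexternalhit/1.1",
            "Mediapartners-Google",
            "MEDIAPARTNERS-GOOGLE", // only the case-insensitive versions match
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0"
        };
        // Columns: case-sensitive, (?i) inline flag, CASE_INSENSITIVE argument, User-Agent.
        for (String ua : agents) {
            System.out.printf("%-6b %-6b %-6b %s%n", sensitive(ua), inline(ua), flagArg(ua), ua);
        }
    }
}
```

The two case-insensitive versions should agree on every input; the inline form is what lets you change behaviour through configuration alone. Note that by default CASE_INSENSITIVE folds only US-ASCII characters, which is usually enough for User-Agent strings.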