On Sat, Feb 13, 2021 at 06:19:34PM +0100, Joel Jacobson wrote:
> To test the correctness of the patches,
> I thought it would be nice with some real-life regexes,
> and just as important, some real-life text strings,
> to which the real-life regexes are applied to.
>
> I therefore patched Chromium's v8 regexes engine,
> to log the actual regexes that get compiled when
> visiting websites, and also the text strings that
> are the regexes are applied to during run-time
> when the regexes are executed.
>
> I logged the regex and text strings as base64 encoded
> strings to STDOUT, to make it easy to grep out the data,
> so it could be imported into PostgreSQL for analytics.
>
> In total, I scraped the first-page of some ~50k websites,
> which produced 45M test rows to import,
> which when GROUP BY pattern and flags was reduced
> down to 235k different regex patterns,
> and 1.5M different text string subjects.

It's great to see this kind of testing. Thanks for doing it.
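
For anyone who wants to try the same reduction on a smaller sample before loading it into PostgreSQL, here is a minimal sketch of the grouping step. The log line layout (flags, base64 pattern, base64 subject, separated by single spaces) and the helper names `parse_line` and `summarize` are my assumptions for illustration only; the actual output of the patched engine may well look different.

```python
import base64
from collections import Counter


def parse_line(line):
    """Decode one log line.

    Assumed (hypothetical) layout: '<flags> <base64 pattern> <base64 subject>'.
    """
    flags, pattern_b64, subject_b64 = line.rstrip("\n").split(" ")
    pattern = base64.b64decode(pattern_b64).decode("utf-8", errors="replace")
    subject = base64.b64decode(subject_b64).decode("utf-8", errors="replace")
    return flags, pattern, subject


def summarize(lines):
    """Count rows per (pattern, flags) pair and collect distinct subjects.

    This mirrors a GROUP BY on pattern and flags, done in memory.
    """
    pattern_counts = Counter()
    subjects = set()
    for line in lines:
        flags, pattern, subject = parse_line(line)
        pattern_counts[(pattern, flags)] += 1
        subjects.add(subject)
    return pattern_counts, subjects
```

Feeding it an iterable of log lines returns the per-pattern row counts and the set of distinct subjects. At 45M rows, doing the grouping in the database is the more sensible choice; this is only meant for spot checks.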