On Sat, Feb 13, 2021 at 06:19:34PM +0100, Joel Jacobson wrote:
> To test the correctness of the patches,
> I thought it would be nice to have some real-life regexes,
> and just as important, some real-life text strings
> to which the real-life regexes are applied.
> 
> I therefore patched Chromium's V8 regex engine
> to log the actual regexes that get compiled when
> visiting websites, and also the text strings that
> the regexes are applied to at run-time
> when the regexes are executed.
> 
> I logged the regex and text strings as base64 encoded
> strings to STDOUT, to make it easy to grep out the data,
> so it could be imported into PostgreSQL for analytics.
> 
> In total, I scraped the front page of some ~50k websites,
> which produced 45M test rows to import,
> which, when grouped by pattern and flags, reduced
> to 235k different regex patterns
> and 1.5M different text string subjects.

It's great to see this kind of testing.  Thanks for doing it.
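
For anyone following along, here is a minimal sketch of the decode-and-group
step described in the quoted text. The log line format is an assumption for
illustration only (a hypothetical `REGEXLOG` prefix followed by tab-separated
base64 pattern, flags, and base64 subject); the actual patch may log things
differently, and the real analysis was done in PostgreSQL rather than Python.

```python
import base64
from collections import Counter

# Hypothetical log format (an assumption, not the format of the actual patch):
#   REGEXLOG<TAB>base64(pattern)<TAB>flags<TAB>base64(subject)
PREFIX = "REGEXLOG"


def parse_line(line):
    """Return (pattern, flags, subject), or None for unrelated output lines."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4 or parts[0] != PREFIX:
        return None
    _, b64_pattern, flags, b64_subject = parts
    # Base64 keeps arbitrary pattern/subject bytes on one greppable line.
    pattern = base64.b64decode(b64_pattern).decode("utf-8", errors="replace")
    subject = base64.b64decode(b64_subject).decode("utf-8", errors="replace")
    return pattern, flags, subject


def summarize(lines):
    """Count rows per (pattern, flags) pair and collect distinct subjects."""
    per_pattern = Counter()
    subjects = set()
    for line in lines:
        row = parse_line(line)
        if row is None:
            continue  # skip anything that is not a log record
        pattern, flags, subject = row
        per_pattern[(pattern, flags)] += 1
        subjects.add(subject)
    return per_pattern, subjects
```

`len(per_pattern)` then corresponds to the number of distinct (pattern, flags)
pairs, and `len(subjects)` to the number of distinct subject strings.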
