Re: [go-nuts] RegEx/string performance benchmarks

[email protected] Sun, 30 Nov 2025 04:21:54 -0800

Thanks! :)
On Sunday, 30 November 2025 at 10:00:27 UTC+3 Robert Engels wrote:


> Very cool. 
>
> On Nov 30, 2025, at 12:19 AM, [email protected] <[email protected]> wrote:
>
> Hi all,
>
>
>
> I know this thread is 10+ years old, but I wanted to follow up since
> the regexp performance discussion is still highly relevant today.
>
> TL;DR: The situation has improved slightly over the years, but the
> fundamental performance characteristics haven't changed dramatically.
> So I built coregex - an alternative regex engine for Go that addresses
> the performance issues discussed here.
>
>
> What's Changed in Go stdlib (2013-2025)
> ========================================
>
> The good:
>   - Bug fixes and stability improvements
>   - Better Unicode handling
>   - Minor optimizations here and there
>
> The unchanged:
>   - Still based on Thompson-style NFA simulation, with no lazy DFA
>   - No SIMD optimizations
>   - No prefilter strategies
>   - Same single-engine architecture
>
> Go's regexp prioritizes correctness and simplicity over raw performance.
> That's a valid design choice - it guarantees O(n) time complexity and
> prevents ReDoS attacks. But for regex-heavy workloads, the performance
> gap vs other languages remains significant.
>
>
> The Performance Gap Today (2025)
> =================================
>
> Benchmarking against Rust's regex crate on patterns like 
> .*error.*connection.*:
>
>   - Go stdlib: 12.6ms (250KB input)
>   - Rust regex: ~20µs (same input)
>   - Gap: Go stdlib is ~600x slower
>
> This isn't a criticism of Go - it's a different set of trade-offs.
> But it shows the problem hasn't gone away.
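
For anyone who wants to check the stdlib side of that comparison on their own machine, here is a minimal sketch using testing.Benchmark from a plain main package. The input layout (filler log lines with one matching line at the end) and the buildInput helper are assumptions for illustration, not the data behind the numbers quoted above, so expect different absolute timings.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// buildInput returns at least size bytes of non-matching filler lines
// followed by one matching line. The layout is an assumption; the
// original benchmark input is not described in the thread.
func buildInput(size int) string {
	const filler = "2025-11-30 INFO request served in 12ms\n"
	var sb strings.Builder
	for sb.Len() < size {
		sb.WriteString(filler)
	}
	sb.WriteString("2025-11-30 error: lost connection to db\n")
	return sb.String()
}

// pattern is the expression quoted in the message, compiled with the
// standard library engine.
var pattern = regexp.MustCompile(`.*error.*connection.*`)

func main() {
	input := buildInput(250 * 1024)

	// testing.Benchmark runs the function with increasing b.N and
	// reports ns/op; SetBytes additionally yields MB/s.
	res := testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(len(input)))
		for i := 0; i < b.N; i++ {
			if !pattern.MatchString(input) {
				b.Fatal("expected a match")
			}
		}
	})
	fmt.Println(res)
}
```

Swapping in another engine for comparison only requires changing how pattern is constructed, provided that engine exposes a MatchString method.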
>
>
> What I Built: coregex
> =====================
>
> After hitting regex bottlenecks in production, I spent 6 months building
> coregex - a drop-in replacement for Go's regexp.
>
> GitHub: https://github.com/coregx/coregex
>
> Architecture:
>   - Multi-engine strategy selection (DFA/NFA/specialized engines)
>   - SIMD-accelerated prefilters (AVX2 assembly)
>   - Bidirectional search for patterns like .*keyword.*
>   - Zero allocations in hot paths
>
> Performance (vs stdlib):
>   - 3-3000x faster depending on pattern
>   - Maintains O(n) guarantees (no backtracking)
>   - Drop-in API compatibility
>
> Real benchmarks:
>
>   Pattern              Input   stdlib    coregex   Speedup
>   -------------------------------------------------------
>   .*\.txt$            1MB     27ms      21µs      1,314x
>   .*error.*           250KB   12.6ms    4µs       3,154x
>   (?i)error           32KB    1.23ms    4.7µs     263x
>   \w+@\w+\.\w+        1KB     688ns     196ns     3.5x
>
> Status: v0.8.0 released, MIT licensed, 88% test coverage
>
>
> Could This Go Into stdlib?
> ===========================
>
> That's the interesting question. I've been thinking about this from
> several angles:
>
> Challenges:
>   1. Complexity - Multi-engine architecture is significantly more
>      complex than the current implementation
>   2. Maintenance burden - SIMD assembly needs platform-specific
>      variants (AVX2, NEON, etc.)
>   3. Binary size - Multiple engines increase compiled binary size
>   4. API stability - stdlib changes need extreme care
>
> Opportunities:
>   1. Incremental adoption - Could start with just SIMD primitives
>      (internal/bytealg improvements)
>   2. Opt-in optimizations - Keep current implementation as default,
>      offer regexp/fast package
>   3. Strategy selection - Add smart path selection without breaking
>      existing code
>   4. Knowledge transfer - Techniques from coregex could inform stdlib
>      improvements
>
>
> What I'm Proposing
> ==================
>
> Rather than a direct "merge coregex into stdlib" proposal, I'm suggesting:
>
>   1. Short term: Community uses coregex for performance-critical workloads
>   2. Medium term: Discuss which techniques could benefit stdlib
>      (SIMD byte search, prefilters)
>   3. Long term: Potential collaboration on stdlib improvements
>      (if there's interest)
>
> I'd be happy to:
>   - Help with stdlib patches for incremental improvements
>   - Share implementation learnings and benchmarks
>   - Discuss compatibility considerations
>
>
> For Those Interested
> ====================
>
> Try it:
>   go get github.com/coregx/coregex@v0.8.0
>
> Read more:
>   - Dev.to article:
>     https://dev.to/kolkov/gos-regexp-is-slow-so-i-built-my-own-3000x-faster-3i6h
>   - GitHub repo:
>     https://github.com/coregx/coregex
>   - v0.8.0 release:
>     https://github.com/coregx/coregex/releases/tag/v0.8.0
>
> Feedback welcome on:
>   - API compatibility issues
>   - Performance on your specific patterns
>   - Ideas for stdlib integration
>
>
> The Bottom Line
> ===============
>
> The regexp performance discussion from 10+ years ago was valid then and
> remains valid now. The good news: we have options today. The better news:
> maybe some of these ideas will make their way into stdlib eventually.
>
> In the meantime, coregex is production-ready and MIT-licensed. Use it if
> it helps.
>
> Cheers,
> Andrey Kolkov
> GitHub: https://github.com/kolkov
> CoreGX (Production Go Libraries): https://github.com/coregx
>
> On Thursday, 28 April 2011 at 18:13:21 UTC+4 Russ Cox wrote:
>
>> > In some areas Go can keep up with Java but when it comes to string
>> > operations ("regex-dna" benchmark), Go is even much slower than Ruby
>> > or Python. Is the status quo going to improve anytime soon? And why is
>> > Go so terribly slow when it comes to string/RegEx operations?
>>
>> You assume the benchmark is worth something.
>>
>> First of all, Ruby and Python are using C implementations
>> of the regexp search, so Go is being beat by C, not by Ruby.
>>
>> Second, Go is using a different algorithm for regexp matching
>> than the C implementations in those other languages.
>> The algorithm Go uses guarantees to complete in time that is
>> linear in the length of the input. The algorithm that Ruby/Python/etc
>> are using can take time exponential in the length of the input,
>> although on trivial cases it typically runs quite fast.
>> In order to guarantee the linear time bound, Go's algorithm's
>> best case speed is a little slower than the optimistic Ruby/Python/etc
>> algorithm. On the other hand, there are inputs for which Go will
>> return quickly and Ruby/Python/etc need more time than is left
>> before the heat death of the universe. It's a decent tradeoff.
>>
>> http://swtch.com/~rsc/regexp/regexp1.html
>>
>> Russ
>>
>> -- 
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion visit 
> https://groups.google.com/d/msgid/golang-nuts/ba9bb686-3db1-4d5c-b92a-d5cdd9f6814cn%40googlegroups.com
>
>

