Hi all,

I know this thread is 10+ years old, but I wanted to follow up since
the regexp performance discussion is still highly relevant today.

TL;DR: The situation has improved slightly over the years, but the
fundamental performance characteristics haven't changed dramatically.
So I built coregex - an alternative regex engine for Go that addresses
the performance issues discussed here.


What's Changed in Go stdlib (2013-2025)
========================================

The good:
  - Bug fixes and stability improvements
  - Better Unicode handling
  - Minor optimizations here and there

The unchanged:
  - Still uses the Thompson NFA simulation (no lazy DFA, unlike RE2)
  - No SIMD optimizations
  - No prefilter strategies
  - Same single-engine architecture

Go's regexp prioritizes correctness and simplicity over raw performance.
That's a valid design choice - it guarantees O(n) time complexity and
prevents ReDoS attacks. But for regex-heavy workloads, the performance
gap vs other languages remains significant.
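
To make that trade-off concrete: the classic ReDoS pattern (a+)+$,
which can hang a backtracking engine for hours, completes instantly
under stdlib. A minimal, self-contained demo:

  package main

  import (
      "fmt"
      "regexp"
      "strings"
      "time"
  )

  func main() {
      // (a+)+$ triggers catastrophic backtracking in PCRE-style
      // engines on inputs that almost match but ultimately fail.
      re := regexp.MustCompile(`(a+)+$`)
      input := strings.Repeat("a", 100000) + "b"

      start := time.Now()
      matched := re.MatchString(input)
      // Go's NFA finishes in time linear in the input length.
      fmt.Printf("matched=%v in %v\n", matched, time.Since(start))
  }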


The Performance Gap Today (2025)
=================================

Benchmarking against Rust's regex crate on patterns like 
.*error.*connection.*:

  - Go stdlib: 12.6ms (250KB input)
  - Rust regex: ~20µs (same input)
  - Gap: ~600x slower

This isn't a criticism of Go - it's a different set of trade-offs.
But it shows the problem hasn't gone away.
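
If you want to reproduce the stdlib side of that number, a minimal
version of the measurement looks like this (the input generation
below is a simplified stand-in, not the full harness behind the
figures above):

  package regexbench

  import (
      "bytes"
      "regexp"
      "testing"
  )

  var re = regexp.MustCompile(`.*error.*connection.*`)

  // makeInput builds ~250KB of log-like text with the matching line
  // near the end, so the engine has to scan most of the input.
  func makeInput() []byte {
      var b bytes.Buffer
      for b.Len() < 250<<10 {
          b.WriteString("info: request handled ok\n")
      }
      b.WriteString("error: connection reset by peer\n")
      return b.Bytes()
  }

  func BenchmarkStdlibMatch(b *testing.B) {
      input := makeInput()
      b.SetBytes(int64(len(input)))
      b.ResetTimer()
      for i := 0; i < b.N; i++ {
          re.Match(input)
      }
  }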


What I Built: coregex
=====================

After hitting regex bottlenecks in production, I spent 6 months building
coregex - a drop-in replacement for Go's regexp.

GitHub: https://github.com/coregx/coregex

Architecture:
  - Multi-engine strategy selection (DFA/NFA/specialized engines)
  - SIMD-accelerated prefilters (AVX2 assembly; idea sketched below)
  - Bidirectional search for patterns like .*keyword.*
  - Zero allocations in hot paths
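
Here is the prefilter idea in stdlib terms - a simplified
illustration of the technique, not the actual coregex
implementation. If the pattern requires a literal to appear in any
match, verify the literal with a fast byte search first;
bytes.Contains is backed by assembly in internal/bytealg, so the
common non-matching case never touches the regex engine:

  package prefilter

  import (
      "bytes"
      "regexp"
  )

  // matchWithPrefilter handles a pattern known to require the
  // literal "error" somewhere in any match. If the literal is
  // absent, the pattern cannot match, so we skip the engine.
  func matchWithPrefilter(re *regexp.Regexp, input []byte) bool {
      if !bytes.Contains(input, []byte("error")) {
          return false
      }
      return re.Match(input)
  }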

Performance (vs stdlib):
  - 3-3000x faster depending on pattern
  - Maintains O(n) guarantees (no backtracking)
  - Drop-in API compatibility

Real benchmarks:

  Pattern              Input    stdlib    coregex   Speedup
  ---------------------------------------------------------
  .*\.txt$             1MB      27ms      21µs      1,314x
  .*error.*            250KB    12.6ms    4µs       3,154x
  (?i)error            32KB     1.23ms    4.7µs     263x
  \w+@\w+\.\w+         1KB      688ns     196ns     3.5x

Status: v0.8.0 released, MIT licensed, 88% test coverage


Could This Go Into stdlib?
===========================

That's the interesting question. I've been thinking about this from
several angles:

Challenges:
  1. Complexity - Multi-engine architecture is significantly more
     complex than the current implementation
  2. Maintenance burden - SIMD assembly needs platform-specific
     variants (AVX2, NEON, etc.)
  3. Binary size - Multiple engines increase compiled binary size
  4. API stability - stdlib changes need extreme care

Opportunities:
  1. Incremental adoption - Could start with just SIMD primitives
     (internal/bytealg improvements)
  2. Opt-in optimizations - Keep current implementation as default,
     offer regexp/fast package
  3. Strategy selection - Add smart path selection without breaking
     existing code (hypothetical sketch after this list)
  4. Knowledge transfer - Techniques from coregex could inform stdlib
     improvements
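
To make the strategy-selection idea concrete, here is a hypothetical
sketch built only on the public regexp/syntax package. The engine
names and the selection policy are simplified for illustration -
this is neither coregex's internals nor a stdlib proposal:

  package strategy

  import "regexp/syntax"

  // Engine names are hypothetical, for illustration only.
  type Engine int

  const (
      EngineLiteral      Engine = iota // pure literal: substring search
      EnginePrefilterNFA               // literal prefix: prefilter + NFA
      EngineNFA                        // fallback: plain NFA, O(n)
  )

  func chooseEngine(pattern string) (Engine, error) {
      re, err := syntax.Parse(pattern, syntax.Perl)
      if err != nil {
          return EngineNFA, err
      }
      re = re.Simplify()
      if re.Op == syntax.OpLiteral {
          return EngineLiteral, nil
      }
      prog, err := syntax.Compile(re)
      if err != nil {
          return EngineNFA, err
      }
      // Prog.Prefix reports a literal every match must start with.
      if prefix, _ := prog.Prefix(); prefix != "" {
          return EnginePrefilterNFA, nil
      }
      return EngineNFA, nil
  }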


What I'm Proposing
==================

Rather than a direct "merge coregex into stdlib" proposal, I'm suggesting:

  1. Short term: Community uses coregex for performance-critical workloads
  2. Medium term: Discuss which techniques could benefit stdlib
     (SIMD byte search, prefilters)
  3. Long term: Potential collaboration on stdlib improvements
     (if there's interest)

I'd be happy to:
  - Help with stdlib patches for incremental improvements
  - Share implementation learnings and benchmarks
  - Discuss compatibility considerations


For Those Interested
====================

Try it:
  go get github.com/coregx/[email protected]
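
Since the API is drop-in compatible with stdlib regexp, switching
should be an import swap. A sketch (names mirror stdlib, per the
drop-in claim):

  package main

  import (
      "fmt"

      regexp "github.com/coregx/coregex" // was: "regexp"
  )

  func main() {
      // Same call surface as stdlib, per the drop-in claim.
      re := regexp.MustCompile(`.*error.*connection.*`)
      fmt.Println(re.MatchString("error: connection refused by host"))
  }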

Read more:
  - Dev.to article:
    https://dev.to/kolkov/gos-regexp-is-slow-so-i-built-my-own-3000x-faster-3i6h
  - GitHub repo:
    https://github.com/coregx/coregex
  - v0.8.0 release:
    https://github.com/coregx/coregex/releases/tag/v0.8.0

Feedback welcome on:
  - API compatibility issues
  - Performance on your specific patterns
  - Ideas for stdlib integration


The Bottom Line
===============

The regexp performance discussion from 10+ years ago was valid then and
remains valid now. The good news: we have options today. The better news:
maybe some of these ideas will make their way into stdlib eventually.

In the meantime, coregex is production-ready and MIT-licensed. Use it if
it helps.

Cheers,
Andrey Kolkov
GitHub: https://github.com/kolkov
CoreGX (Production Go Libraries): https://github.com/coregx


On Thursday, 28 April 2011 at 18:13:21 UTC+4 Russ Cox wrote:

> > In some areas Go can keep up with Java but when it comes to string
> > operations ("regex-dna" benchmark), Go is even much slower than Ruby
> > or Python. Is the status quo going to improve anytime soon? And why is
> > Go so terribly slow when it comes to string/RegEx operations?
>
> You assume the benchmark is worth something.
>
> First of all, Ruby and Python are using C implementations
> of the regexp search, so Go is being beat by C, not by Ruby.
>
> Second, Go is using a different algorithm for regexp matching
> than the C implementations in those other languages.
> The algorithm Go uses guarantees to complete in time that is
> linear in the length of the input. The algorithm that Ruby/Python/etc
> are using can take time exponential in the length of the input,
> although on trivial cases it typically runs quite fast.
> In order to guarantee the linear time bound, Go's algorithm's
> best case speed is a little slower than the optimistic Ruby/Python/etc
> algorithm. On the other hand, there are inputs for which Go will
> return quickly and Ruby/Python/etc need more time than is left
> before the heat death of the universe. It's a decent tradeoff.
>
> http://swtch.com/~rsc/regexp/regexp1.html
>
> Russ
>
