Thanks! )

On Sunday, 30 November 2025 at 10:00:27 UTC+3 Robert Engels wrote:
> Very cool.
>
> On Nov 30, 2025, at 12:19 AM, [email protected] <[email protected]> wrote:
>
> Hi all,
>
> I know this thread is 10+ years old, but I wanted to follow up since
> the regexp performance discussion is still highly relevant today.
>
> TL;DR: The situation has improved slightly over the years, but the
> fundamental performance characteristics haven't changed dramatically.
> So I built coregex - an alternative regex engine for Go that addresses
> the performance issues discussed here.
>
>
> What's Changed in Go stdlib (2013-2025)
> ========================================
>
> The good:
> - Bug fixes and stability improvements
> - Better Unicode handling
> - Minor optimizations here and there
>
> The unchanged:
> - Still uses Thompson's NFA exclusively
> - No SIMD optimizations
> - No prefilter strategies
> - Same single-engine architecture
>
> Go's regexp prioritizes correctness and simplicity over raw performance.
> That's a valid design choice - it guarantees O(n) time complexity and
> prevents ReDoS attacks. But for regex-heavy workloads, the performance
> gap vs other languages remains significant.
>
>
> The Performance Gap Today (2025)
> =================================
>
> Benchmarking against Rust's regex crate on patterns like
> .*error.*connection.*:
>
> - Go stdlib: 12.6ms (250KB input)
> - Rust regex: ~20µs (same input)
> - Gap: ~600x slower
>
> This isn't a criticism of Go - it's a different set of trade-offs.
> But it shows the problem hasn't gone away.
>
>
> What I Built: coregex
> =====================
>
> After hitting regex bottlenecks in production, I spent 6 months building
> coregex - a drop-in replacement for Go's regexp.
>
> GitHub: https://github.com/coregx/coregex
>
> Architecture:
> - Multi-engine strategy selection (DFA/NFA/specialized engines)
> - SIMD-accelerated prefilters (AVX2 assembly)
> - Bidirectional search for patterns like .*keyword.*
> - Zero allocations in hot paths
>
> Performance (vs stdlib):
> - 3-3000x faster depending on pattern
> - Maintains O(n) guarantees (no backtracking)
> - Drop-in API compatibility
>
> Real benchmarks:
>
> Pattern         Input    stdlib    coregex   Speedup
> -------------------------------------------------------
> .*\.txt$        1MB      27ms      21µs      1,314x
> .*error.*       250KB    12.6ms    4µs       3,154x
> (?i)error       32KB     1.23ms    4.7µs     263x
> \w+@\w+\.\w+    1KB      688ns     196ns     3.5x
>
> Status: v0.8.0 released, MIT licensed, 88% test coverage
>
>
> Could This Go Into stdlib?
> ===========================
>
> That's the interesting question. I've been thinking about this from
> several angles:
>
> Challenges:
> 1. Complexity - Multi-engine architecture is significantly more
>    complex than current implementation
> 2. Maintenance burden - SIMD assembly needs platform-specific
>    variants (AVX2, NEON, etc.)
> 3. Binary size - Multiple engines increase compiled binary size
> 4. API stability - stdlib changes need extreme care
>
> Opportunities:
> 1. Incremental adoption - Could start with just SIMD primitives
>    (internal/bytealg improvements)
> 2. Opt-in optimizations - Keep current implementation as default,
>    offer regexp/fast package
> 3. Strategy selection - Add smart path selection without breaking
>    existing code
> 4. Knowledge transfer - Techniques from coregex could inform stdlib
>    improvements
>
>
> What I'm Proposing
> ==================
>
> Rather than a direct "merge coregex into stdlib" proposal, I'm suggesting:
>
> 1. Short term: Community uses coregex for performance-critical workloads
> 2. Medium term: Discuss which techniques could benefit stdlib
>    (SIMD byte search, prefilters)
> 3. Long term: Potential collaboration on stdlib improvements
>    (if there's interest)
>
> I'd be happy to:
> - Help with stdlib patches for incremental improvements
> - Share implementation learnings and benchmarks
> - Discuss compatibility considerations
>
>
> For Those Interested
> ====================
>
> Try it:
>   go get github.com/coregx/coregex@v0.8.0
>
> Read more:
> - Dev.to article:
>   https://dev.to/kolkov/gos-regexp-is-slow-so-i-built-my-own-3000x-faster-3i6h
> - GitHub repo:
>   https://github.com/coregx/coregex
> - v0.8.0 release:
>   https://github.com/coregx/coregex/releases/tag/v0.8.0
>
> Feedback welcome on:
> - API compatibility issues
> - Performance on your specific patterns
> - Ideas for stdlib integration
>
>
> The Bottom Line
> ===============
>
> The regexp performance discussion from 10+ years ago was valid then and
> remains valid now. The good news: we have options today. The better news:
> maybe some of these ideas will make their way into stdlib eventually.
>
> In the meantime, coregex is production-ready and MIT-licensed. Use it if
> it helps.
>
> Cheers,
> Andrey Kolkov
> GitHub: https://github.com/kolkov
> CoreGX (Production Go Libraries): https://github.com/coregx
>
> On Thursday, 28 April 2011 at 18:13:21 UTC+4 Russ Cox wrote:
>
>> > In some areas Go can keep up with Java but when it comes to string
>> > operations ("regex-dna" benchmark), Go is even much slower than Ruby
>> > or Python. Is the status quo going to improve anytime soon? And why is
>> > Go so terribly slow when it comes to string/RegEx operations?
>>
>> You assume the benchmark is worth something.
>>
>> First of all, Ruby and Python are using C implementations
>> of the regexp search, so Go is being beat by C, not by Ruby.
>>
>> Second, Go is using a different algorithm for regexp matching
>> than the C implementations in those other languages.
>> The algorithm Go uses guarantees to complete in time that is
>> linear in the length of the input. The algorithm that Ruby/Python/etc
>> are using can take time exponential in the length of the input,
>> although on trivial cases it typically runs quite fast.
>> In order to guarantee the linear time bound, Go's algorithm's
>> best case speed is a little slower than the optimistic Ruby/Python/etc
>> algorithm. On the other hand, there are inputs for which Go will
>> return quickly and Ruby/Python/etc need more time than is left
>> before the heat death of the universe. It's a decent tradeoff.
>>
>> http://swtch.com/~rsc/regexp/regexp1.html
>>
>> Russ
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/golang-nuts/ba9bb686-3db1-4d5c-b92a-d5cdd9f6814cn%40googlegroups.com.
>

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/92c72fb2-afba-4648-a37d-5f68d4d142edn%40googlegroups.com.
