Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 34b0b047bb64f93ccd1b003d410e0f8b4c9d681b
      
https://github.com/WebKit/WebKit/commit/34b0b047bb64f93ccd1b003d410e0f8b4c9d681b
  Author: David Degazio <d_dega...@apple.com>
  Date:   2024-06-27 (Thu, 27 Jun 2024)

  Changed paths:
    A JSTests/microbenchmarks/regexp-match-alphanumeric.js
    A JSTests/microbenchmarks/regexp-match-multiple-single-chars.js
    A JSTests/microbenchmarks/regexp-match-separators.js
    M Source/JavaScriptCore/assembler/MacroAssembler.h
    M Source/JavaScriptCore/assembler/MacroAssemblerARM64.h
    M Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h
    M Source/JavaScriptCore/assembler/MacroAssemblerRISCV64.h
    M Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h
    M Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h
    M Source/JavaScriptCore/yarr/YarrJIT.cpp

  Log Message:
  -----------
  [JSC] Use immediate bit-vectors for character class matching in YarrJIT
https://bugs.webkit.org/show_bug.cgi?id=275279
rdar://129419939

Reviewed by Michael Saboff.

Changes how YarrJIT handles character class matches via the following:

 1. Optimize single-range checks from two branches into subtract + branch.

 2. Use a bit-vector test to quickly match a set of individual characters,
    as opposed to the current strategy of O(n) sequential equality checks.

 3. Make the logic of matchCharacterClassRange more recursive. We use the
    optimized single-range test if there is only a single range, and use
    the new bit-vector test if the whole set of ranges and character matches
    fits within a small-enough range. Moreover, the binary search is now
    fully recursive, so these specialized checks also apply to the
    sub-ranges visited within the binary search, whereas previously the
    binary search was all-or-nothing.

 4. A few small optimizations are removed - YarrJIT no longer special-cases
    ASCII letters in character class matches, since character set matching
    is now faster. Turning adjacent character matches into length-two ranges
    is also removed during CharacterClass construction since this doesn't
    really do anything other than make the binary search do extra work (I'd
    be really surprised if this was ever particularly profitable).
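
The checks in items 1 and 2 can be illustrated in ordinary C++ rather than
JIT-emitted machine code. This is a minimal sketch, not the code in
YarrJIT.cpp: the helper names (inRange, makeMask, inSet) and the 64-bit mask
width are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper for item 1: one subtraction plus one unsigned
// comparison replaces the two comparisons (ch >= lo && ch <= hi).
inline bool inRange(uint32_t ch, uint32_t lo, uint32_t hi)
{
    // If ch < lo, the unsigned subtraction wraps around to a very large
    // value, so the single comparison rejects it as well.
    return (ch - lo) <= (hi - lo);
}

// Hypothetical helper for item 2: build a mask with bit (c - base) set for
// every character in the set. Assumes all characters lie in
// [base, base + 63], i.e. the set fits in one 64-bit word.
inline uint64_t makeMask(uint32_t base, std::initializer_list<uint32_t> chars)
{
    uint64_t mask = 0;
    for (uint32_t c : chars) {
        assert(c - base < 64);
        mask |= uint64_t { 1 } << (c - base);
    }
    return mask;
}

// Membership test: a subtract, a bounds check, and a single bit test,
// regardless of how many characters are in the set.
inline bool inSet(uint32_t ch, uint32_t base, uint64_t mask)
{
    uint32_t index = ch - base; // wraps to a large value if ch < base
    if (index >= 64)
        return false;
    return (mask >> index) & 1;
}
```

For example, with mask = makeMask('\t', { '\t', ' ', ',', ';' }), the call
inSet(',', '\t', mask) is true and inSet('x', '\t', mask) is false, at the
same cost as for a set of one character.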

Overall, this seems to be a modest but appreciable perf win on
microbenchmarks. On the added ASCII alphanumeric test I'm seeing about 10%
improvement with this new approach, and on the single-chars test I'm seeing
closer to 20% improvement. I've added a test for a set of separator chars
too, where the improvement on my machine is only roughly 2%. That is small
and hopefully improvable, but let's have the microbenchmark in the tree
anyway.

* JSTests/microbenchmarks/regexp-match-alphanumeric.js: Added.
* JSTests/microbenchmarks/regexp-match-multiple-single-chars.js: Added.
* JSTests/microbenchmarks/regexp-match-separators.js: Added.
(let.src):
(dot):
(test):
(i.let.re):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::CharacterClassConstructor::addSorted):

Canonical link: https://commits.webkit.org/280425@main


