[Perl/perl5] 7e81f1: Define MAX_UNICODE_UTF8_BYTES

Karl Williamson via perl5-changes Tue, 30 Sep 2025 10:01:44 -0700

  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 7e81f1903c03c7e53fbb9d1a221d1865ac370f88
      
https://github.com/Perl/perl5/commit/7e81f1903c03c7e53fbb9d1a221d1865ac370f88
  Author: Karl Williamson <[email protected]>
  Date:   2025-09-30 (Tue, 30 Sep 2025)


  Changed paths:
    M utf8.h

  Log Message:
  -----------
  Define MAX_UNICODE_UTF8_BYTES

This value is the maximum number of bytes required to represent, in UTF-8,
any code point in the legal Unicode range of 0 .. 0x10FFFF.
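
As a rough illustration of where such a value comes from, here is a minimal
sketch in C. It assumes standard UTF-8 on an ASCII platform and does not
reproduce the actual definition in utf8.h, which this message does not show.
The names prefixed with sketch_ or SKETCH_ are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the macro; the real one lives in utf8.h. */
#define SKETCH_MAX_UNICODE_UTF8_BYTES 4

/* Bytes standard UTF-8 needs for cp, or 0 if cp is above 0x10FFFF.
 * Surrogates are not treated specially in this sketch. */
static size_t sketch_utf8_len(uint32_t cp)
{
    if (cp <= 0x7F)     return 1;
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Largest encoded length over the whole range 0 .. 0x10FFFF. */
static size_t sketch_max_len(void)
{
    size_t max = 0;
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        size_t n = sketch_utf8_len(cp);
        assert(n != 0);          /* every code point in range is encodable */
        if (n > max) max = n;
    }
    return max;
}
```

Under these assumptions sketch_max_len() agrees with
SKETCH_MAX_UNICODE_UTF8_BYTES, since 0x10FFFF falls in the 4-byte range.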


  Commit: 8785c114b5d618cb4ac8a6ffd5f41eff9c585810
      
https://github.com/Perl/perl5/commit/8785c114b5d618cb4ac8a6ffd5f41eff9c585810
  Author: Karl Williamson <[email protected]>
  Date:   2025-09-30 (Tue, 30 Sep 2025)

  Changed paths:
    M parser.h
    M t/comp/parser.t

  Log Message:
  -----------
  parser.h: Allow up to 256 characters in a token

This is already the claimed allowed length, but until this commit that
claim was false.  The buffer has been 256 bytes long, which means we
could have 256 1-byte characters in an identifier, but only 128 2-byte
ones, etc.  Unicode can have 4-byte identifier characters, so our limit
has really been just 64 for those.
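
The arithmetic above can be sketched in C as follows. This is only an
illustration: the helper name is hypothetical, and it ignores any space the
real buffer may reserve for a terminator.

```c
#include <stddef.h>

/* How many characters of a given encoded width fit in a byte buffer.
 * Integer division rounds down, so a partial character does not count. */
static size_t sketch_chars_that_fit(size_t buffer_bytes, size_t bytes_per_char)
{
    return bytes_per_char == 0 ? 0 : buffer_bytes / bytes_per_char;
}
```

With a 256-byte buffer, widths of 1, 2, 3 and 4 bytes give 256, 128, 85 and
64 characters respectively.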

The direction perl is supposed to be going, according to perldiag, is to
eliminate any identifier length limit.  I don't feel the urge to do that
now, but simply increasing the buffer size to accommodate any 256
Unicode identifier characters causes us to meet our claim.
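
One way to size such a buffer is to multiply the character limit by the
worst-case encoded width. The sketch below assumes 4 bytes per character
(standard UTF-8) and uses hypothetical names; it is not the declaration in
parser.h, and it leaves out any extra room for a terminator.

```c
#include <stddef.h>

#define SKETCH_MAX_TOKEN_CHARS    256  /* the claimed character limit */
#define SKETCH_MAX_BYTES_PER_CHAR 4    /* assumption: standard UTF-8 */

/* Enough bytes for the worst case: every character at maximum width. */
#define SKETCH_TOKEN_BUF_BYTES \
    (SKETCH_MAX_TOKEN_CHARS * SKETCH_MAX_BYTES_PER_CHAR)

/* Hypothetical parser state holding the token buffer. */
struct sketch_parser {
    char tokenbuf[SKETCH_TOKEN_BUF_BYTES];
};
```

Sizing by the worst case costs some memory, but it makes the limit hold in
characters regardless of which characters the identifier uses.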

The trickiest part of this by far was to get parser.t to pass, which,
contrary to perldiag, tests very specifically for identifiers just shy
of 256.

One thing it does is to create a long string.  I just replaced every
character in it by 4 repeats, and then split into shorter lines.


Compare: https://github.com/Perl/perl5/compare/62e15056aec6...8785c114b5d6

