Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: b12a54f2b7ed3b3b3439cb7c295b30a65beedae6
      
https://github.com/Perl/perl5/commit/b12a54f2b7ed3b3b3439cb7c295b30a65beedae6
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8.c: Move declaration to first use


  Commit: 2dad945fd8f79a43c6fadb221a8e330587a9c846
      
https://github.com/Perl/perl5/commit/2dad945fd8f79a43c6fadb221a8e330587a9c846
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8.c: White-space only

Outdent and reflow some comments and code in preparation for moving them
out of the loop.


  Commit: 47b98c3bb058803d892f867c8a50376677f21af7
      
https://github.com/Perl/perl5/commit/47b98c3bb058803d892f867c8a50376677f21af7
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8_to_bytes() Move failure code out of loop

This is for clarity.  All this very-unlikely-to-be-used code was in the
middle of what is really going on, creating a distraction.


  Commit: 9d310cca8256503836076bddf1c091088c9b2f8f
      
https://github.com/Perl/perl5/commit/9d310cca8256503836076bddf1c091088c9b2f8f
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8_to_bytes: Refactor loop

The previous version did not guarantee in all cases that it would not
read beyond the end of the buffer.  In addition, the first pass through
the input string has already ruled out most problems, so the full
generality of the macro UTF8_IS_DOWNGRADEABLE_START is not needed here,
which simplifies things.
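
The two-pass idea described above can be sketched roughly as follows.
This is not the code in utf8.c; it is a minimal standalone illustration
that assumes an ASCII platform (where code points U+0080..U+00FF are
encoded as start byte 0xC2 or 0xC3 plus one continuation byte), and the
names is_downgradeable() and downgrade_in_place() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pass 1 (hypothetical helper): true if every character fits in one
 * byte.  The bounds check ensures a trailing start byte never causes a
 * read past the end of the buffer. */
static bool is_downgradeable(const uint8_t *s, size_t len)
{
    const uint8_t *e = s + len;

    while (s < e) {
        if (*s < 0x80) {            /* ASCII: one byte either way */
            s++;
            continue;
        }
        if ((*s != 0xC2 && *s != 0xC3)      /* not U+0080..U+00FF */
            || e - s < 2                    /* truncated sequence */
            || (s[1] & 0xC0) != 0x80)       /* not a continuation */
        {
            return false;
        }
        s += 2;
    }
    return true;
}

/* Pass 2 (hypothetical helper): convert in place.  Because pass 1
 * already validated the input, the loop needs no further checks. */
static bool downgrade_in_place(uint8_t *s, size_t *lenp)
{
    if (!is_downgradeable(s, *lenp))
        return false;               /* input left untouched */

    const uint8_t *src = s;
    const uint8_t *e = s + *lenp;
    uint8_t *d = s;

    while (src < e) {
        uint8_t c = *src++;
        if (c >= 0x80)              /* must be 0xC2 or 0xC3 here */
            c = (uint8_t)(((c & 0x03) << 6) | (*src++ & 0x3F));
        *d++ = c;
    }
    *lenp = (size_t)(d - s);
    return true;
}
```

The point of the split is that all the failure handling lives in the
first pass, so the conversion loop itself stays short.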


  Commit: 774e86d896575f50bab84de2128818ef8b824f29
      
https://github.com/Perl/perl5/commit/774e86d896575f50bab84de2128818ef8b824f29
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8_to_bytes: Update and fix comments.

These were misleading.  On ASCII platforms, many calls to this function
won't use the per-word algorithm.  That's only done for long-enough
strings.
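
As a rough illustration of what "per-word" means here, the sketch below
scans eight bytes at a time for any byte with its high bit set, and
falls back to a per-byte loop for short strings and for the tail.  It is
a simplified stand-in, not the algorithm in utf8.c: the function name
has_high_bit_byte() and the cutoff WORD_THRESHOLD are assumptions made
for this example only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff: shorter strings skip the per-word loop. */
#define WORD_THRESHOLD (2 * sizeof(uint64_t))

/* Returns true if any byte in s[0..len) has its high bit set, i.e. the
 * string contains something other than ASCII. */
static bool has_high_bit_byte(const uint8_t *s, size_t len)
{
    const uint8_t *e = s + len;

    if (len >= WORD_THRESHOLD) {
        while ((size_t)(e - s) >= sizeof(uint64_t)) {
            uint64_t w;
            memcpy(&w, s, sizeof w);    /* avoids unaligned access */
            if (w & UINT64_C(0x8080808080808080))
                return true;
            s += sizeof w;
        }
    }

    /* Short strings, and the leftover tail of long ones. */
    while (s < e) {
        if (*s++ & 0x80)
            return true;
    }
    return false;
}
```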


  Commit: 5137fe156b0947fa453abb5707ee4850249fac0f
      
https://github.com/Perl/perl5/commit/5137fe156b0947fa453abb5707ee4850249fac0f
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8_to_bytes: Rename variable

The new name, s0, is used in more places for this meaning, and is more
descriptive.


  Commit: 6d61748c249cb3da09deeb41e8b1af9e9a118c91
      
https://github.com/Perl/perl5/commit/6d61748c249cb3da09deeb41e8b1af9e9a118c91
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M embed.fnc
    M embed.h
    M proto.h
    M utf8.c

  Log Message:
  -----------
  Add preliminary utf8_to_bytes_()

This is an internal function, designed to be an extension of
utf8_to_bytes(), with a slightly different API.  This commit only adds
it and calls it solely from utf8_to_bytes().

Future commits will extend this API.


  Commit: a284efe05f92058a0ebda2a195a9c690b2681587
      
https://github.com/Perl/perl5/commit/a284efe05f92058a0ebda2a195a9c690b2681587
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8_to_bytes_: Add const

This variable should not be changed by the function.


  Commit: 96ac6902bb5218b9da29347ba47c773e0a44815f
      
https://github.com/Perl/perl5/commit/96ac6902bb5218b9da29347ba47c773e0a44815f
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M embed.fnc
    M embed.h
    M proto.h
    M utf8.c
    M utf8.h

  Log Message:
  -----------
  utf8_to_bytes_: Add argument, macro

The argument is currently unused.  The macro is a public-facing API that
calls this function with the correct argument.


  Commit: 5ebf77131512dc98d2c5f2245ad2c0be4cbfd056
      
https://github.com/Perl/perl5/commit/5ebf77131512dc98d2c5f2245ad2c0be4cbfd056
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8_to_bytes_: Slight refactor

This makes the next commit smaller.


  Commit: 0a5edc8f161a24ea882cfef4df4f0e209dd12b57
      
https://github.com/Perl/perl5/commit/0a5edc8f161a24ea882cfef4df4f0e209dd12b57
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M embed.fnc
    M embed.h
    M proto.h
    M utf8.c
    M utf8.h

  Log Message:
  -----------
  utf8_to_bytes_: Add non-destructive write option

This lets the function either overwrite the input or instead allocate
new memory for the result.  It changes bytes_from_utf8() to use this new
capability instead of being a near duplication of the core code of this
function.

Prior to this commit, bytes_from_utf8() just allocated memory the size
of the original string, and started copying into it.  When it came to a
sequence that wasn't convertible, it stopped, and freed up the copy.
The new behavior checks that the string is convertible before the
malloc.  The advantage is that no malloc happens unless it is sure to
be useful; the disadvantage is an extra pass through the input string,
though that pass is per-word.

The next commit will introduce another advantage.

Thanks to Tony Cook for the 'free_me' idea


  Commit: 8c15ff3a55aeaaf00accf47271f9cbb8b0ea6167
      
https://github.com/Perl/perl5/commit/8c15ff3a55aeaaf00accf47271f9cbb8b0ea6167
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M utf8.c

  Log Message:
  -----------
  utf8_to_bytes_: Calculate needed malloc size

Prior to this commit, the size malloced was just the same as the length
of the input string, which is a worst case scenario.  This commit
changes that, so that the new pass through the input (introduced in the
previous commit) also calculates the needed length.

The additional cost of doing this is minimal.  It has advantages on a
very long string with lots of sequences that are convertible.
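
A minimal sketch of this "validate and count first, then allocate
exactly" approach is shown below.  It is an illustration only, written
for an ASCII platform; downgraded_length() and downgrade_copy() are
hypothetical names, and plain malloc() stands in for whatever allocator
the real code uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Single pre-pass (hypothetical helper): checks convertibility and, at
 * the same time, counts how many bytes the result needs. */
static bool downgraded_length(const uint8_t *s, size_t len,
                              size_t *out_len)
{
    const uint8_t *e = s + len;
    size_t n = 0;

    while (s < e) {
        if (*s < 0x80) {
            s++;
        }
        else if ((*s == 0xC2 || *s == 0xC3)
                 && e - s >= 2
                 && (s[1] & 0xC0) == 0x80)
        {
            s += 2;             /* two input bytes, one output byte */
        }
        else {
            return false;       /* not representable in one byte */
        }
        n++;
    }
    *out_len = n;
    return true;
}

/* Non-destructive conversion: the input is never modified, and nothing
 * is allocated unless the conversion is known to succeed.  Returns
 * NULL if the input is not convertible or if malloc() fails. */
static uint8_t *downgrade_copy(const uint8_t *s, size_t len,
                               size_t *out_len)
{
    size_t n;

    if (!downgraded_length(s, len, &n))
        return NULL;

    uint8_t *buf = malloc(n + 1);   /* exact size plus trailing NUL */
    if (buf == NULL)
        return NULL;

    const uint8_t *e = s + len;
    uint8_t *d = buf;
    while (s < e) {
        uint8_t c = *s++;
        if (c >= 0x80)
            c = (uint8_t)(((c & 0x03) << 6) | (*s++ & 0x3F));
        *d++ = c;
    }
    *d = '\0';
    *out_len = n;
    return buf;
}
```

For an input made up entirely of two-byte sequences, the exact count
halves the allocation compared with sizing it to the input length.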


  Commit: 1b767c1fc0d77429e09ed03ef9fc386c9e85927e
      
https://github.com/Perl/perl5/commit/1b767c1fc0d77429e09ed03ef9fc386c9e85927e
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M embed.fnc
    M embed.h
    M proto.h
    M utf8.c
    M utf8.h

  Log Message:
  -----------
  utf8_to_bytes_: Add ability to return a mortalized pv

This is a non-destructive conversion of the input into native bytes,
with any newly allocated memory scheduled to be freed via SAVEFREEPV.
The caller therefore does not need to be concerned at all with whether
memory was allocated.

A new macro is created that calls this internal function with the
correct parameter to force this behavior.


  Commit: d126053ef36bfd2300b93c21167ef60a38456b73
      
https://github.com/Perl/perl5/commit/d126053ef36bfd2300b93c21167ef60a38456b73
  Author: Karl Williamson <[email protected]>
  Date:   2024-11-28 (Thu, 28 Nov 2024)

  Changed paths:
    M pod/perldelta.pod
    M utf8.c

  Log Message:
  -----------
  Document new utf8_to_bytes() variants


Compare: https://github.com/Perl/perl5/compare/6366047f7f43...d126053ef36b
