Hello everyone,

Over the past few weeks I have been exploring a common pattern that
frequently appears in PHP applications: masking sensitive parts of
strings such as credit card numbers, email addresses, phone numbers,
and personal identifiers.

In many real-world codebases, developers implement masking with
combinations of substr(), strlen(), str_repeat(), substr_replace(), or
their multibyte equivalents. These approaches work, but they tend to
produce repetitive, error-prone, and sometimes inefficient user-land
code. Edge cases make the snippets more complex than they need to be,
especially negative offsets, omitted lengths, and Unicode text.
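
To make the pattern concrete, here is a minimal sketch of the kind of
byte-based helper I have in mind. The name mask_bytes() and its
parameters are made up for illustration and are not taken from any
particular codebase. Because strlen() and substr_replace() count bytes,
this version is only safe for single-byte input; multibyte text can be
cut in the middle of a character.

```php
<?php
// Hypothetical helper: a typical byte-based user-land mask.
// Keeps $visibleStart bytes at the start and $visibleEnd bytes at the
// end, and replaces everything in between with $mask.
function mask_bytes(string $value, int $visibleStart, int $visibleEnd, string $mask = '*'): string
{
    $length = strlen($value);                      // byte count, not characters
    $hidden = $length - $visibleStart - $visibleEnd;

    if ($hidden <= 0) {
        // Nothing left to hide; a real helper has to pick a policy here.
        return $value;
    }

    return substr_replace($value, str_repeat($mask, $hidden), $visibleStart, $hidden);
}

echo mask_bytes('4111111111111111', 4, 4), "\n"; // 4111********1111
```

Even this small helper needs its own length arithmetic and an explicit
decision for inputs that are too short, which is the sort of repetition
described above.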

While thinking about this problem, I designed a function concept
called grapheme_mask(). Its goal is to provide a clear, native, and
Unicode-safe way to mask sections of a string.

The key idea is that the function operates on grapheme clusters rather
than raw bytes or individual code points. This lets it handle modern
Unicode text correctly, including composed characters and emoji
sequences, without breaking them apart.

Conceptually, the function replaces a range of grapheme clusters with
a masking string.

Example:
grapheme_mask("[email protected]", "*", 2, -12);
// result: se****@example.com
--------------------------------------------
Example with emoji sequences:
grapheme_mask("👨🏽‍👩‍👧‍👦 family", "*", 0, 1);
// result: * family
-----------------------------------------
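
For discussion, the intended behavior can be approximated in user-land
with the existing intl functions grapheme_strlen() and
grapheme_substr(). The sketch below is not the proposed implementation.
The name grapheme_mask_sketch() is hypothetical, and it assumes that
offset and length follow the substr() conventions (negative values count
from the end, a null length means "to the end") and that the mask is
repeated once per masked grapheme cluster. Both assumptions are inferred
from the examples above and would need to be confirmed in an RFC. The
sample address is invented.

```php
<?php
// Hypothetical user-land approximation; requires ext-intl.
// Assumes substr()-style offset/length and one mask copy per cluster.
function grapheme_mask_sketch(string $string, string $mask, int $offset = 0, ?int $length = null): string
{
    $total = grapheme_strlen($string);
    if (!is_int($total)) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // Resolve the start position, clamped to [0, $total].
    $start = $offset < 0 ? max(0, $total + $offset) : min($offset, $total);

    // Resolve the end position (exclusive), clamped to [$start, $total].
    if ($length === null) {
        $end = $total;
    } elseif ($length < 0) {
        $end = max($start, $total + $length);
    } else {
        $end = min($total, $start + $length);
    }

    $count = $end - $start;
    if ($count === 0) {
        return $string; // empty range: nothing to mask
    }

    // Slice by grapheme cluster so sequences are never split.
    $head = $start > 0 ? grapheme_substr($string, 0, $start) : '';
    $tail = $end < $total ? grapheme_substr($string, $end) : '';

    return $head . str_repeat($mask, $count) . $tail;
}

echo grapheme_mask_sketch('johndoe@example.com', '*', 2, -12), "\n"; // jo*****@example.com
```

How an emoji sequence is segmented depends on the ICU version that intl
is built against, so results for inputs like the family emoji above may
differ between installations.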

The intention is not to replace existing string functions, but to
provide a dedicated and expressive helper for a task that developers
routinely implement themselves.

If there is interest from the community, I would be happy to draft a
formal RFC describing the proposed behavior, edge cases, and potential
implementation details.

I would greatly appreciate any feedback, thoughts, or suggestions.

Best regards,
Sepehr