Hi everyone,

I'm Sepehr, the author of this proposal. I'm glad to see the interest in
grapheme_mask().

I have already developed a working prototype in C (based on ICU ubrk) along
with several PHPT test cases covering Unicode and emoji clusters. I believe
this addition will significantly improve how developers handle sensitive
data masking in modern PHP applications.

I have requested a Wiki account to start the formal RFC process and share
the implementation details.
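
Until the implementation is available, the intended semantics can be
approximated in userland. The following is only a rough sketch, not the C
prototype: it assumes the intl extension, substr()-style handling of
negative offsets and lengths, and one copy of the mask per grapheme
cluster. The name grapheme_mask_sketch() is made up for illustration.

```php
<?php
// Userland approximation only; requires the intl extension.
// grapheme_mask_sketch() is a hypothetical name, not the proposed native function.
function grapheme_mask_sketch(
    string $string,
    string $mask,
    int $offset = 0,
    ?int $length = null
): string {
    // Count grapheme clusters, not bytes or code points.
    $total = (int) grapheme_strlen($string);

    // Assumption: a negative offset counts from the end, as in substr().
    $start = $offset < 0 ? max(0, $total + $offset) : min($offset, $total);

    // Assumption: null means "to the end"; a negative length stops that
    // many clusters before the end, as in substr().
    if ($length === null) {
        $end = $total;
    } elseif ($length < 0) {
        $end = max($start, $total + $length);
    } else {
        $end = min($total, $start + $length);
    }

    $count = $end - $start;
    if ($count <= 0) {
        return $string; // empty range: nothing to mask
    }

    $head = $start > 0 ? grapheme_substr($string, 0, $start) : '';
    $tail = $end < $total ? grapheme_substr($string, $end) : '';

    // One copy of the mask per masked grapheme cluster.
    return $head . str_repeat($mask, $count) . $tail;
}

// "alice@example.com" is a made-up address used only for illustration.
echo grapheme_mask_sketch('alice@example.com', '*', 2, -12), "\n"; // al***@example.com

// "a" + U+0308 (combining diaeresis) is one grapheme cluster.
echo grapheme_mask_sketch("a\u{0308}bc", '*', 0, 1), "\n"; // *bc
```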

Looking forward to your feedback.

Best regards,
Sepehr


On Friday, 19 June 2026 at 18:17, youkidearitai <[email protected]>
wrote:

> On Friday, 19 June 2026 at 19:54, سپهر محمودی <[email protected]> wrote:
> >
> > Hello everyone,
> > Over the past few weeks I have been exploring a common pattern that
> > frequently appears in PHP applications: masking sensitive parts of
> > strings such as credit card numbers, email addresses, phone numbers,
> > and personal identifiers.
> >
> > In many real-world codebases, developers typically implement masking
> > using combinations of functions like substr(), strlen(), str_repeat(),
> > substr_replace(), or their multibyte equivalents. While these
> > approaches work, they often lead to repetitive, error‑prone, and
> > sometimes inefficient user‑land implementations. Handling edge
> > cases—especially when offsets are negative, lengths are omitted, or
> > when working with Unicode text—can make these snippets unnecessarily
> > complex.
> >
> > While thinking about this problem, I designed a function concept
> > called grapheme_mask(). The goal of this function is to provide a
> > clear, native, and Unicode‑safe way to mask sections of a string.
> >
> > The key idea is that the function operates on grapheme clusters,
> > rather than raw bytes or individual code points. This allows it to
> > correctly handle modern Unicode text, including composed characters
> > and emoji sequences, without breaking them apart.
> >
> > Conceptually, the function replaces a range of grapheme clusters with
> > a masking string.
> >
> > Example:
> >
> > grapheme_mask("[email protected]", "*", 2, -12);
> > // result: se****@example.com
> >
> > Example with emoji sequences:
> >
> > grapheme_mask("👨🏽‍👩‍👧‍👦 family", "*", 0, 1);
> > // result: * family
> >
> > The intention is not to replace existing string functions, but to
> > provide a dedicated and expressive helper for a task that developers
> > routinely implement themselves.
> >
> > If there is interest from the community, I would be happy to draft a
> > formal RFC describing the proposed behavior, edge cases, and potential
> > implementation details.
> >
> > I would greatly appreciate any feedback, thoughts, or suggestions.
> >
> > Best regards,
> >
> > Sepehr
>
> Hi Sepehr and Internals,
>
> Thank you for bringing up this discussion.
> It looks good to me.
>
> One more point in favor of adding this function:
> a character with a diacritical mark can be encoded either as a single
> precomposed code point or as a base character followed by a separate
> combining code point.
> Examples include the umlaut (ä, a + ¨) and the dakuten (が, か + ゛),
> among others.
> grapheme_mask() needs to support these characters.
> For that reason, I would like to have this function.
>
> Regards
> Yuya
>
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------
>
