On Sun, Oct 6, 2024 at 14:45, youkidearitai <youkideari...@gmail.com> wrote:
>
> On Sat, Oct 5, 2024 at 1:20, Tim Düsterhus <t...@bastelstu.be> wrote:
> >
> > Hi
> >
> > On 2024-09-25 09:21, youkidearitai wrote:
> > > I tried implementing an mb_levenshtein function and created an RFC.
> > > https://wiki.php.net/rfc/mb_levenshtein
> > > https://github.com/php/php-src/pull/16043
> > >
> > > I would like to discuss it, so feel free to comment.
> >
> > Thank you for your RFC. I share the concern raised by cmb in the PR
> > discussion:
> > https://github.com/php/php-src/pull/16043#issuecomment-2374574538
> >
> > Generally working with codepoints is going to be confusing for a user,
> > but sometimes it is necessary when dealing with external systems that
> > themselves work with codepoints (MySQL comes to my mind). However
> > calculating the Levenshtein distance is most certainly something that
> > purely is "user-facing" and not constrained by external systems.
> > Calculating the distance of codepoints is going to be extremely
> > confusing when dealing with things like Emoji. It would probably be best to
> > either only offer a `grapheme_*` function here or to leave this fully to
> > userland.
> >
> > Best regards
> > Tim Düsterhus
>
> Hi, Tim
>
> Thank you for your response.
> I have been thinking about what users expect from a Levenshtein distance.
> I agree that Levenshtein distance should be measured in terms of
> grapheme clusters.
>
> Most userland code is based on UTF-8, so moving this to a grapheme
> function makes sense.
> I will think more about the use cases of levenshtein, but I will probably
> go with a grapheme function.
>
> Thanks
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------

Hi, internals
I have been thinking more about the use cases of mb_levenshtein.
I added a test case for mb_levenshtein that compares emoji per code point.
https://github.com/php/php-src/pull/16043/files#diff-d6aca000d2b0ac5982f9f9a0fe0425246cfd8411fdfb8645cdfe6f786d526597R86

It shows that comparing by Unicode code point can make sense.
I think we need mb_levenshtein, and we also need grapheme_levenshtein.
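
To illustrate how the unit of comparison changes the result, here is a
rough sketch in Python. It is not the code from the PR and does not call
any PHP function. The helper name `levenshtein` is hypothetical, and the
grapheme clusters are segmented by hand because the Python standard
library has no grapheme segmentation. It assumes the family emoji ZWJ
sequence counts as a single grapheme cluster.

```python
def levenshtein(a, b):
    """Two-row dynamic-programming edit distance over any two sequences.

    The unit of comparison is whatever the sequences contain:
    bytes, code points, or (pre-segmented) grapheme clusters.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[len(b)]


# Family emoji: man, ZWJ, woman, ZWJ, girl, ZWJ, boy (7 code points).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
woman = "\U0001F469"

# Bytes of the UTF-8 encoding (byte-oriented comparison).
by_bytes = levenshtein(family.encode("utf-8"), woman.encode("utf-8"))
# Code points (a Python str is a sequence of code points).
by_code_points = levenshtein(family, woman)
# Grapheme clusters, segmented by hand: one cluster on each side.
by_graphemes = levenshtein([family], [woman])

print(by_bytes, by_code_points, by_graphemes)  # 21 6 1
```

A user sees one emoji replaced by another, which matches the grapheme
result of 1, while the code point result of 6 reflects how the sequence
is stored.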

What do you think?

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
