On Sun, Oct 6, 2024 at 14:45, youkidearitai <youkideari...@gmail.com> wrote:
>
> On Sat, Oct 5, 2024 at 1:20, Tim Düsterhus <t...@bastelstu.be> wrote:
> >
> > Hi
> >
> > On 2024-09-25 09:21, youkidearitai wrote:
> > > I tried implementing an mb_levenshtein function and created an RFC.
> > > https://wiki.php.net/rfc/mb_levenshtein
> > > https://github.com/php/php-src/pull/16043
> > >
> > > I would like to discuss it, so feel free to comment.
> >
> > Thank you for your RFC. I share the concern raised by cmb in the PR
> > discussion:
> > https://github.com/php/php-src/pull/16043#issuecomment-2374574538
> >
> > Generally, working with codepoints is going to be confusing for a user,
> > but sometimes it is necessary when dealing with external systems that
> > themselves work with codepoints (MySQL comes to mind). However,
> > calculating the Levenshtein distance is most certainly something that
> > is purely "user-facing" and not constrained by external systems.
> > Calculating the distance of codepoints is going to be extremely
> > confusing when dealing with things like Emoji. It would probably be best
> > to either only offer a `grapheme_*` function here or to leave this fully
> > to userland.
> >
> > Best regards
> > Tim Düsterhus
>
> Hi, Tim
>
> Thank you for your response.
> I have been thinking about what users expect the Levenshtein distance
> to be.
> I agree that the Levenshtein distance should be measured in terms of
> grapheme clusters.
>
> Most userland code is based on UTF-8, so moving to a grapheme function
> seems to make sense.
> I will think more about the use cases for Levenshtein. I will probably
> go with a grapheme function.
>
> Thanks
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------

Hi, internals

I have been thinking more about the use cases for mb_levenshtein.
I added a test case for mb_levenshtein that compares emoji per code point:
https://github.com/php/php-src/pull/16043/files#diff-d6aca000d2b0ac5982f9f9a0fe0425246cfd8411fdfb8645cdfe6f786d526597R86

I think it shows that comparing by Unicode code point can make sense.
So I think we need mb_levenshtein, and we also need grapheme_levenshtein.

What do you think?

Regards
Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
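
To illustrate the point under discussion, here is a rough sketch of how the same pair of strings can yield three different Levenshtein distances depending on the unit being compared (bytes, code points, or grapheme clusters). It is written in Python for illustration only and is not the implementation proposed in the RFC or the PR. The names `levenshtein` and `rough_clusters` are hypothetical, and `rough_clusters` is a crude approximation of grapheme segmentation (it only handles combining marks, ZWJ, variation selectors, and skin tone modifiers), not the full Unicode UAX #29 rules that a real grapheme_* function would follow.

```python
import unicodedata


def levenshtein(a, b):
    """Classic two-row Levenshtein distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (x != y),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def rough_clusters(s):
    """Crude grapheme-cluster approximation (assumption: NOT UAX #29)."""
    zwj = "\u200d"
    clusters = []
    join_next = False  # True right after a ZWJ: next char joins the cluster
    for ch in s:
        extends = (
            unicodedata.combining(ch) != 0          # combining marks
            or ch == zwj                            # zero width joiner
            or "\ufe00" <= ch <= "\ufe0f"           # variation selectors
            or "\U0001F3FB" <= ch <= "\U0001F3FF"   # skin tone modifiers
        )
        if clusters and (extends or join_next):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = (ch == zwj)
    return clusters


# Family emoji: MAN + ZWJ + WOMAN + ZWJ + GIRL (5 code points, 1 cluster)
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
woman = "\U0001F469"

# Distance per UTF-8 byte
print(levenshtein(family.encode("utf-8"), woman.encode("utf-8")))  # 14
# Distance per code point
print(levenshtein(family, woman))                                  # 4
# Distance per (approximate) grapheme cluster
print(levenshtein(rough_clusters(family), rough_clusters(woman)))  # 1
```

A user who sees one emoji replaced by another would probably expect a distance of 1, which is what the cluster-based comparison gives here. The code point result (4) reflects how the emoji is composed, which may be what a caller wants when the surrounding system also works in code points.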