### Description

In Unicode, some CJK characters such as 化 have a single codepoint but are rendered differently in Simplified Chinese (<span lang="zh-Hans">化</span>), Traditional Chinese (<span lang="zh-Hant">化</span>), and Japanese (<span lang="ja">化</span>). This issue is known as [Han unification](https://en.wikipedia.org/wiki/Han_unification), and it has appeared over the years [in many software projects](https://issues.chromium.org/issues/41315603). On the frontend, we can display names correctly using an HTML attribute such as `lang="zh-Hant"`.
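As a rough illustration of the `lang` approach, the Ruby sketch below wraps a name in a `<span>` carrying a language tag. `name_with_lang` is a hypothetical helper, not code from this PR. It escapes both the name and the tag with the standard library's `ERB::Util.html_escape`.

```ruby
require "erb"

# Hypothetical helper, not code from this PR: wrap a place name in a
# <span> whose lang attribute tells the browser which regional glyph
# forms to use for unified Han characters.
def name_with_lang(name, lang)
  escaped = ERB::Util.html_escape(name)
  # Without a tag, return the bare (escaped) name; the browser then
  # renders CJK characters using its default language.
  return escaped if lang.nil?

  "<span lang=\"#{ERB::Util.html_escape(lang)}\">#{escaped}</span>"
end

puts name_with_lang("彰化", "zh-Hant") # prints <span lang="zh-Hant">彰化</span>
```

In a Rails view, ActionView's `tag.span(name, lang: lang)` produces equivalent markup, with escaping built in, when a tag is present.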
Han unification was addressed in iD (https://github.com/openstreetmap/iD/pull/10716) and has been a long-running discussion in openstreetmap-carto. If we add `&addressdetails=1` to Nominatim queries, we can read each result's `country_code` and set the most appropriate `lang` attribute for results in mainland China, Hong Kong, Japan, or Taiwan.

### How has this been tested?

This can be tricky to test, as **many names do not change**, and the `display_name` will be in your browser's language if it's available.

- Search results will have a lang tag, such as `lang="zh-HK"` or `lang="ja"`, regardless of the language of `display_name`.
- In Taiwan, a search result for <span lang="zh-Hant">彰化</span> should show a horizontal bar in <span lang="zh-Hant">化</span>.
- In mainland China, a search result for <span lang="zh-Hans">玉门 expressway</span> should render the second character, <span lang="zh-Hans">门</span>, as a split frame, not as 门 with a +.

### Notes

As an alternative to adding `&addressdetails=1` to queries, could we instead parse `display_name` (which varies with the browser language) or use geographic bounding boxes?

Matching languages by country is imperfect, but without a language tag, the browser always falls back to its default language for every CJK character. It would be difficult to make exceptions (for example, Japanese restaurants in these countries) without a name regex, a language tag, or access to other tags.

This change does not affect Chinese names in other countries.

I have heard that Cyrillic glyphs also vary in [Bulgaria](https://en.wikipedia.org/wiki/Bulgarian_alphabet) and [Serbia](https://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet#Differences_from_other_Cyrillic_alphabets), particularly in italics, but I don't know how universal this is.
[Additional info](https://commons.wikimedia.org/wiki/File:Special_Cyrillics_BGDPT.svg)

You can view, comment on, or merge this pull request online at: https://github.com/openstreetmap/openstreetmap-website/pull/6079

-- Commit Summary --

* add lang attribute to results from CJK countries, plus Cyrillic
* remove Bulgaria/Serbia for now
* fix HK subregion

-- File Changes --

M app/controllers/concerns/nominatim_methods.rb (2)
M app/controllers/searches/nominatim_queries_controller.rb (7)
M app/helpers/geocoder_helper.rb (2)

-- Patch Links --

https://github.com/openstreetmap/openstreetmap-website/pull/6079.patch
https://github.com/openstreetmap/openstreetmap-website/pull/6079.diff
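The approach in the description (request `addressdetails=1` from Nominatim, then map each result's `country_code` to a language tag) could look roughly like the Ruby sketch below. It is not the PR's code. The helper names, the use of `format=jsonv2`, and the tags for mainland China (`zh-Hans`) and Taiwan (`zh-Hant`) are assumptions; the description only confirms that results carry tags such as `zh-HK` and `ja`. How Hong Kong is identified is also left open, since the commit list includes a separate "fix HK subregion" change, so the `"hk"` entry is a placeholder.

```ruby
require "json"
require "uri"

# ASSUMED mapping from Nominatim's lowercase country_code to a BCP 47 tag.
# "zh-HK" and "ja" match tags mentioned in the PR; "zh-Hans" and "zh-Hant"
# are assumptions. The "hk" entry is a placeholder: Hong Kong may need
# extra handling (see the "fix HK subregion" commit).
CJK_LANG_BY_COUNTRY = {
  "cn" => "zh-Hans",
  "hk" => "zh-HK",
  "jp" => "ja",
  "tw" => "zh-Hant"
}.freeze

# Hypothetical helper: build a Nominatim search URL that asks for
# addressdetails=1, so each result includes an "address" hash.
def nominatim_search_url(query)
  params = { q: query, format: "jsonv2", addressdetails: 1 }
  "https://nominatim.openstreetmap.org/search?#{URI.encode_www_form(params)}"
end

# Hypothetical helper: choose a lang tag for one parsed JSON result.
# Returns nil when the country is not in the map, so callers can omit
# the attribute and let the browser use its default language.
def lang_for_result(result)
  code = result.dig("address", "country_code")
  code && CJK_LANG_BY_COUNTRY[code.downcase]
end

# Hand-written, abbreviated sample shaped like a Nominatim JSON response.
sample = JSON.parse(<<~JSON)
  [
    {"display_name": "彰化", "address": {"country_code": "tw"}},
    {"display_name": "Paris", "address": {"country_code": "fr"}}
  ]
JSON

puts nominatim_search_url("彰化")
sample.each do |result|
  puts "#{result['display_name']}: #{lang_for_result(result).inspect}"
end
# The loop prints:
# 彰化: "zh-Hant"
# Paris: nil
```

Returning `nil` for unmapped countries keeps the existing behaviour for everything outside the listed regions, which matches the note that Chinese names in other countries are unaffected.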
_______________________________________________
rails-dev mailing list
rails-dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/rails-dev