On 2020-Jan-23, Tom Lane wrote:

> That particular case could be improved by stopping at a dash ... but
> since this code is also used to match strings like "A.M.", we can't
> just exclude punctuation in general.  Breaking at whitespace seems
> like a reasonable compromise.

Yeah, and there are cases where dashes are used in names -- browsing
through glibc for example I quickly found Akan, for which the month
names are:

mon           "Sanda-<U0186>p<U025B>p<U0254>n";/
              "Kwakwar-<U0186>gyefuo";/
              "Eb<U0254>w-<U0186>benem";/

and so on.  Even whitespace is problematic for some languages, such as
Afan,

mon           "Qunxa Garablu";/
              "Naharsi Kudo";/
              "Ciggilta Kudo";/

(etc) but I think whitespace-splitting is going to be more
comprehensible in the vast majority of cases, even if not perfect.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
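
As a rough illustration of the trade-off discussed above, here is a small
Python sketch (not the actual PostgreSQL code; the function name
first_token and its stop_at_dash flag are made up for this example). It
picks the leading chunk of an input string that an error message might
quote, breaking either at whitespace only or at whitespace and dashes.
The sample inputs, including the trailing " 2020", are assumptions built
from the month names quoted above.

```python
import re


def first_token(text: str, stop_at_dash: bool = False) -> str:
    """Return the leading chunk of `text`, cut at the first separator.

    Hypothetical helper: separators are whitespace, plus "-" when
    stop_at_dash is true.
    """
    pattern = r"[\s-]" if stop_at_dash else r"\s"
    # Ignore leading whitespace, then keep everything before the first separator.
    return re.split(pattern, text.lstrip(), maxsplit=1)[0]


# Akan month name containing a dash, followed by an assumed year field.
akan = "Sanda-\u0186p\u025Bp\u0254n 2020"
# Month name containing a space, as in the second locale quoted above.
spaced = "Qunxa Garablu 2020"

print(first_token(akan))                     # whole month name is kept
print(first_token(akan, stop_at_dash=True))  # name is cut at the dash
print(first_token(spaced))                   # name is cut at the space
print(first_token("A.M. 10:30"))             # dots are not separators
```

Under these assumptions, breaking at dashes truncates the Akan name,
while breaking at whitespace keeps it whole but still truncates names
that contain a space, which is the imperfection mentioned above.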