On 24/11/2019 19:42, Joseph Wright wrote:
> This has of course come up before, and I'd like to add to the expl3 case changers. However, I've not been able to track down any formal statement on the case mappings: are they in the UCD, some official publication, ...?
>
> Joseph

Found the appropriate .xml files in the CLDR: see attached.

I plan to make some revisions to the expl3 case changer over the next month or two: I'll likely incorporate this information.

Joseph
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE supplementalData SYSTEM "../../common/dtd/ldmlSupplemental.dtd">
<!--
Copyright © 1991-2013 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<supplementalData>
	<version number="$Revision$"/>
	<transforms>
		<transform source="el" target="Title" direction="forward" alias="el-t-d0-title">
			<tRule><![CDATA[
::NFD();
# Remove \0301 following Greek, with possible intervening 0308 marks.
# [[:Greek:] & [:Ll:]] [\u0308]? { \u0301 → ;
# Make any string of letters after a cased letter be lower, with rules for sigma
[:cased:] [:case-ignorable:]* { Σ } [:case-ignorable:]* [:cased:] → σ;
[:cased:] [:case-ignorable:]* { Σ → ς;
[:cased:] [:case-ignorable:]* { (.) → &Any-Lower($1) ;
# Otherwise all lowercase go to upper (titlecase stay as is)
([:Lowercase:]) → &Any-Title($1) ;
::NFC();
			]]></tRule>
		</transform>
	</transforms>
</supplementalData>
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE supplementalData SYSTEM "../../common/dtd/ldmlSupplemental.dtd">
<!--
Copyright © 1991-2013 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<supplementalData>
	<version number="$Revision$"/>
	<transforms>
		<transform source="el" target="Upper" direction="forward" alias="el-t-d0-upper">
			<tRule>
# Copyright (C) 2011-2013, Apple Inc. and others. All Rights Reserved.
# Remove \0301 following Greek, with possible intervening 0308 marks.
::NFD();
# For uppercasing (not titlecasing!) remove all greek accents from greek letters.
# This is done in two groups, to account for canonical ordering.
[:Greek:] [^[:ccc=Not_Reordered:][:ccc=Above:]]*? { [\u0313\u0314\u0301\u0300\u0306\u0342\u0308\u0304] → ;
[:Greek:] [^[:ccc=Not_Reordered:][:ccc=Iota_Subscript:]]*? { \u0345 → ;
::NFC();
::Any-Upper();
			</tRule>
		</transform>
	</transforms>
</supplementalData>
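
For readers who do not have a transliterator to hand, the el-Upper rules above can be sketched in Python. This is only an approximation under stated assumptions: the function names are hypothetical, and `[:Greek:]` is approximated by looking for "GREEK" in the character name, since the standard library does not expose the Script property.

```python
import unicodedata

# Marks deleted by the first rule of the el-Upper transform (all have ccc=230).
GREEK_ACCENTS = set("\u0313\u0314\u0301\u0300\u0306\u0342\u0308\u0304")
YPOGEGRAMMENI = "\u0345"  # deleted by the second rule (ccc=240)


def is_greek(ch):
    # Assumption: stand-in for [:Greek:], based on the character name.
    return "GREEK" in unicodedata.name(ch, "")


def greek_upper(text):
    out = []
    greek_base = False  # was the most recent starter (ccc=0) a Greek character?
    above_kept = False  # has a ccc=230 mark been kept since that starter?
    for ch in unicodedata.normalize("NFD", text):
        ccc = unicodedata.combining(ch)
        if ccc == 0:
            greek_base = is_greek(ch)
            above_kept = False
            out.append(ch)
            continue
        # Rule 1: accent after a Greek letter, with no ccc=230 mark in between.
        if greek_base and not above_kept and ch in GREEK_ACCENTS:
            continue
        # Rule 2: iota subscript after a Greek letter.
        if greek_base and ch == YPOGEGRAMMENI:
            continue
        if ccc == 230:
            above_kept = True
        out.append(ch)
    # ::NFC(); ::Any-Upper(); (str.upper() stands in for Any-Upper)
    return unicodedata.normalize("NFC", "".join(out)).upper()


print(greek_upper("Άλφα"))  # ΑΛΦΑ
```

Non-Greek text keeps its accents, so `greek_upper("é")` gives `É`.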
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE supplementalData SYSTEM "../../common/dtd/ldmlSupplemental.dtd">
<!--
Copyright © 1991-2013 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<supplementalData>
	<version number="$Revision$"/>
	<transforms>
		<transform source="el" target="Lower" direction="forward" alias="el-t-d0-lower">
			<tRule>
# Special case for final form of sigma.
::NFD();
# C is preceded by a sequence consisting of a cased letter and then zero or more case-ignorable characters,
# and C is not followed by a sequence consisting of zero or more case-ignorable characters and then a cased letter.
# 03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
# With translit rules, easiest is to handle the negative condition first, mapping in that case to the regular sigma.
Σ } [:case-ignorable:]* [:cased:] → σ;
[:cased:] [:case-ignorable:]* { Σ → ς;
::Any-Lower;
::NFC();
			</tRule>
		</transform>
	</transforms>
</supplementalData>
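
The two sigma rules in the el-Lower transform above can be sketched in Python as follows. The helper names are hypothetical, and the `is_cased` and `is_case_ignorable` tests are rough approximations of `[:cased:]` and `[:case-ignorable:]`, since the standard library does not expose those properties directly.

```python
import unicodedata


def is_cased(ch):
    # Approximation of [:cased:]: lowercase, uppercase or titlecase letters.
    return ch.isupper() or ch.islower() or unicodedata.category(ch) == "Lt"


def is_case_ignorable(ch):
    # Approximation of [:case-ignorable:]: selected general categories plus
    # a few mid-word punctuation marks (assumed, not the full property).
    return (unicodedata.category(ch) in {"Mn", "Me", "Cf", "Lm", "Sk"}
            or ch in "'.:\u00b7\u2019")


def cased_neighbour(text, start, step):
    # Skip case-ignorable characters in direction `step` (+1 or -1) and
    # report whether a cased character is found.
    i = start
    while 0 <= i < len(text):
        if is_cased(text[i]):
            return True
        if not is_case_ignorable(text[i]):
            return False
        i += step
    return False


def greek_lower(text):
    text = unicodedata.normalize("NFD", text)
    out = []
    for i, ch in enumerate(text):
        if ch != "Σ":
            out.append(ch.lower())          # ::Any-Lower
        elif cased_neighbour(text, i + 1, 1):
            out.append("σ")                 # negative condition handled first
        elif cased_neighbour(text, i - 1, -1):
            out.append("ς")                 # Final_Sigma
        else:
            out.append("σ")                 # isolated sigma
    return unicodedata.normalize("NFC", "".join(out))


print(greek_lower("ΟΔΟΣ ΣΟΦΟΣ"))  # οδος σοφος
```

As in the transform, the "followed by a cased letter" case is tested first, so the final form is only produced when a cased letter precedes the sigma and none follows it.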
