On Aug 29, 2009, at 3:48 PM, Ross Carter wrote:

On Aug 29, 2009, at 1:22 PM, Ken Thomases wrote:

On Aug 29, 2009, at 11:46 AM, Ross Carter wrote:

Suppose an NSAttributedString comprises the string o + umlaut in decomposed form, plus one attribute. Its length is 2, and the range of the attribute is {0, 2}. The string and its attribute are archived separately as XML data like this:
<string>ö</string>
<attrName>NSFontAttributeName</attrName>
<attrValue location='0' length='2'>Helvetica 12.0</attrValue>

If, during unarchiving, the string is represented by an NSString object in precomposed form, its length will be 1, and an attempt to apply the attribute range of {0, 2} will fail with a range exception.
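
A minimal sketch of the failure (variable names are illustrative):

#import <Foundation/Foundation.h>
#import <AppKit/AppKit.h>

int main(void) {
    @autoreleasepool {
        // o followed by COMBINING DIAERESIS (U+0308): the decomposed form of ö.
        NSString *decomposed = @"o\u0308";
        NSString *precomposed = [decomposed precomposedStringWithCanonicalMapping];
        NSLog(@"decomposed length %lu, precomposed length %lu",
              (unsigned long)[decomposed length],
              (unsigned long)[precomposed length]);   // prints 2 and 1

        // Applying the archived range {0, 2} to the precomposed string
        // raises NSRangeException: the string is only 1 unit long.
        NSMutableAttributedString *s =
            [[NSMutableAttributedString alloc] initWithString:precomposed];
        [s addAttribute:NSFontAttributeName
                  value:[NSFont fontWithName:@"Helvetica" size:12.0]
                  range:NSMakeRange(0, 2)];
    }
    return 0;
}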

But why would it change between archiving and unarchiving?

Because during unarchiving, the NSString is created by NSXMLParser, and I assume that there is no guarantee regarding the normalization form of that string. NSXMLParser might decompose the string, for example. It seems to me that to rely on NSXMLParser always to return strings in a particular form is to rely on an implementation detail.

You can't rely on it to always return strings in a particular form. You should be able to rely on it to return strings in the form in which they were written.

Admittedly I have not observed any such funny business. I just assume it is possible.

I do not. If an XML library/framework were to fail to maintain the round-trip integrity of my data, I would consider that a bug.
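
One way to sidestep the question entirely is to pin the normalization form down yourself on both sides of the round trip, so the archived ranges stay valid no matter what the parser does. A sketch, with a hypothetical helper:

#import <Foundation/Foundation.h>

// Hypothetical helper: normalize to one canonical form both before
// computing attribute ranges at archive time and after parsing at
// unarchive time. If the parser preserved the form, this is a no-op.
static NSString *CanonicalForm(NSString *s) {
    return [s precomposedStringWithCanonicalMapping];
}

int main(void) {
    @autoreleasepool {
        NSString *original = CanonicalForm(@"o\u0308");    // length 1
        // ...archive `original` plus ranges computed against it...
        NSString *parsed = @"o\u0308"; // worst case: the parser decomposed it
        NSString *restored = CanonicalForm(parsed);        // length 1 again
        NSLog(@"lengths match, archived ranges still valid: %d",
              [restored length] == [original length]);
    }
    return 0;
}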

Apple's NSXML documentation (which, admittedly, doesn't quite apply to NSXMLParser) references <http://www.w3.org/TR/xmlschema-2/>, which defines an XML string data type with this definition:

The string datatype represents character strings in XML. The ·value space· of string is the set of finite-length sequences of characters (as defined in [XML 1.0 (Second Edition)]) that ·match· the Char production from [XML 1.0 (Second Edition)]. A character is an atomic unit of communication; it is not further specified except to note that every character has a corresponding Universal Character Set code point, which is an integer.

To me, this definition prohibits an XML parser from considering a string as anything other than a sequence of characters. That is, it can't apply knowledge about Unicode canonical equivalence or decomposition, etc. You put in a sequence of characters, you get out that sequence of characters. (The schema also defines a normalizedString data type, but that uses a completely different sense of normalization than we're discussing.)
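
Rather than assuming either behavior, the round trip is also easy to test directly. A sketch using NSXMLParser (the delegate class is made up for the test):

#import <Foundation/Foundation.h>

// Accumulates character data from the parser so it can be compared,
// code unit for code unit, against what was written.
@interface RoundTripChecker : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *found;
@end

@implementation RoundTripChecker
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (!self.found) self.found = [NSMutableString string];
    [self.found appendString:string];
}
@end

int main(void) {
    @autoreleasepool {
        NSString *decomposed = @"o\u0308"; // length 2
        NSString *xml = [NSString stringWithFormat:@"<s>%@</s>", decomposed];
        NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];

        RoundTripChecker *checker = [[RoundTripChecker alloc] init];
        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
        parser.delegate = checker;
        [parser parse];

        // isEqualToString: compares code units, not canonical equivalence,
        // so this reports whether the exact sequence survived the parse.
        NSLog(@"round trip preserved the sequence: %d",
              [checker.found isEqualToString:decomposed]);
    }
    return 0;
}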

Regards,
Ken
