On Monday, October 14, 2013 2:40:14 AM UTC-5, Erik Dalén wrote:
>
> I checked this a bit further, and it seems like the policy is to always 
> have UTF-8 encoding in RPM descriptions etc, but that RPM will happily 
> build packages with text in other encodings.
>
>

Whose policy, exactly?  As far as I can tell, the RPM file format 
specifications do not define the character encoding to be used for textual 
data, therefore it is dangerous (dare I even say "wrong"?) for code that 
consumes that data to make any assumption whatever about its encoding.  
Absent a means to determine the correct encoding, the "strings" from RPM 
headers ought to be handled as byte arrays (since that's what they actually 
are).
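
To illustrate what handling the header text as bytes could look like, here is a minimal sketch in Python (chosen only for illustration; the sample value and both function names are hypothetical, not taken from any RPM binding). Comparison is done on the raw bytes, and a display form is produced without guessing an encoding:

```python
# Hypothetical raw header value; the byte 0xE9 is not valid on its own in UTF-8.
raw_description = b"Dal\xe9n's package"

def same_text(a: bytes, b: bytes) -> bool:
    # Compare the raw bytes; no encoding assumption is needed for equality.
    return a == b

def printable(data: bytes) -> str:
    # Show ASCII as-is and escape every other byte, so nothing is misdecoded.
    return data.decode("ascii", errors="backslashreplace")

shown = printable(raw_description)
```

The escaped form loses nothing: the original bytes can still be read off from it, which is not true once a wrong encoding has been applied.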

 

> But it seems like a decent workaround to force encoding here as RPM seems 
> to print the original text out to the console without any charset 
> translation to the system locale.
>
>

Forcing UTF-8 would rescue only the case where the RPM text is encoded 
specifically in that encoding (including pure, 7-bit ASCII).  If the actual 
encoding were anything else, and it contained non-ASCII characters, then 
you would again get an encoding error.  If it is important to decode the 
bytes to characters then it would be better to assume Latin-1, which admits 
no invalid code sequences.
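
A small Python sketch of the difference, assuming a sample string that was encoded as Latin-1 (the sample and the variable names are made up for illustration). Strict UTF-8 decoding rejects the bytes, whereas Latin-1 maps every one of the 256 byte values to a character and so cannot fail:

```python
# Hypothetical header bytes: "Dalén" encoded as Latin-1, i.e. b'Dal\xe9n'.
raw = "Dalén".encode("latin-1")

# Forcing UTF-8 fails, because 0xE9 followed by 'n' is not a valid sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 accepts any byte string at all.
as_latin1 = raw.decode("latin-1")
every_byte = bytes(range(256)).decode("latin-1")

# The decoding is also reversible, so the original bytes are never lost.
round_trip = as_latin1.encode("latin-1")
```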

On the other hand, if it is essential to decode header text to the logical 
character sequence from which it was encoded then there is no substitute 
for a reliable method of determining the encoding.
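
For example (again a Python sketch with a made-up sample), decoding UTF-8 bytes as Latin-1 raises no error but does not return the original characters:

```python
# Hypothetical header bytes: "Dalén" encoded as UTF-8, i.e. b'Dal\xc3\xa9n'.
utf8_bytes = "Dalén".encode("utf-8")

# Decoding with the wrong encoding succeeds but yields different characters:
# the two bytes of "é" come out as two separate Latin-1 characters.
misread = utf8_bytes.decode("latin-1")

# Only the correct encoding recovers the original text.
correct = utf8_bytes.decode("utf-8")
```

Nothing in the bytes themselves says which of the two results is the intended one.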


John
